Quick and Dirty Guide: Getting started with open source AI
(Image caption: An image created by condensing the whole of human creative output into an image generation model.)
April 8, 2025
Here’s a quick and dirty guide to getting a local coding assistant up and running using the current state-of-the-art workflow for running large language models locally.
I’m doing this on a MacBook with an M3 chip and 24GB of RAM, but any M-series Mac (M1 through M4) will do. If you’re on a PC with a GPU and solid VRAM, the same steps apply — just adjusted for your setup.
Rule of thumb: Choose a model whose parameter count (in billions) is about 20% smaller than your available RAM/VRAM (in gigabytes). With 24GB of RAM, models in the 13B-18B parameter range typically run smoothly without straining your system.
For this example, I’ll be using the recently released DeepCoder-14B-Preview model from Agentica — a compact, high-performance model built specifically for coding tasks. It’s a great fit for local inference, and it keeps up with the current frontier models while being lightweight enough to run on high-end consumer hardware.
Let’s get into it.
WHY RUN AI MODELS LOCALLY?
- Privacy: Processing data on your local machine ensures that sensitive
  information remains secure and isn't transmitted to external servers.
- Offline Access: Local models allow you to work without an internet
  connection, ensuring uninterrupted productivity.
- Performance: Leveraging your device's hardware can lead to faster
  processing times, especially with optimized configurations.
PREREQUISITES
Before proceeding, ensure your system meets the following requirements:
Hardware:
- Mac: M-series chip (M1, M2, M3, or M4) with at least 16GB of RAM.
For larger models, 32GB or more is recommended.
- PC: A GPU with substantial VRAM (8GB or more) to handle larger
models effectively.
Operating System:
- Mac: macOS 11 Big Sur or later.
- PC: Windows 10 or later, or a compatible Linux distribution.
METHOD 1: USING LM STUDIO
LM Studio is a user-friendly application that simplifies the process
of discovering, downloading, and interacting with open-source AI
models locally.
Download and Install LM Studio:
- Visit the LM Studio website and download the installer compatible
with your operating system.
- For Mac users, open the downloaded file and drag the LM Studio
application into your Applications folder.
- For PC users, run the installer and follow the on-screen
instructions.
Optimize LM Studio for Apple Silicon (Mac Users):
- Launch LM Studio.
- Navigate to the 'Chat' tab in the left sidebar.
- Click on 'Settings'.
- Change the system prompt to "Default LM Studio macOS".
- Confirm the changes by clicking "Accept New System Prompt".
This configuration ensures that LM Studio utilizes the Apple Metal GPU,
enhancing the performance of large language models.
Download and Load a Model:
- In LM Studio, click on the search icon.
- Enter the name of the model you wish to use, such as
"DeepCoder-14B-Preview".
- Click the download button next to the model.
- Once the download is complete, the model will appear in your
list of available models. Click on it to load.
Interact with the Model:
- Navigate to the 'Chat' interface within LM Studio.
- Select the loaded model from the dropdown menu.
- You can now input prompts and receive responses directly from
the model.
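If you want to call the model from your own scripts or an editor plugin rather than the chat window, LM Studio can also expose a local OpenAI-compatible server (look for the local server / developer tab). A minimal sketch, assuming the server is running on its default port 1234 and that the "model" value matches the identifier the server tab reports for the loaded model:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepcoder-14b-preview",
        "messages": [{"role": "user", "content": "Write a Python function that reverses a string."}],
        "temperature": 0.2
      }'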
METHOD 2: USING OLLAMA
Ollama is a command-line tool that makes it straightforward to run
large language models locally.
Install Ollama:
For Mac users:
- Open Terminal.
- Install Homebrew if it's not already installed:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install Ollama using Homebrew:
brew install ollama
For PC users:
- Visit the Ollama download page and download the installer
for your operating system.
- Run the installer and follow the on-screen instructions.
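Either way, it's worth a quick sanity check before pulling a model: confirm the CLI is installed and that the local server is running. Something like the following should work on both platforms (the serve step is only needed if Ollama isn't already running as a background service):
ollama --version
ollama serve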
Run a Model with Ollama:
- Open Terminal (Mac) or Command Prompt (PC).
- To run a 14B model from the Ollama library, such as DeepSeek-R1-Distill-Qwen-14B, execute:
ollama run deepseek-r1:14b
- The first time you run this command, Ollama will download the model,
which may take some time depending on your internet speed.
- Once the model is downloaded, you can interact with it directly
through the command line.
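Like LM Studio, Ollama also exposes a local HTTP API (on port 11434 by default), so editors and other tools on your machine can talk to the model programmatically. A rough sketch using the generate endpoint and the same model pulled above:
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:14b",
  "prompt": "Write a bash one-liner that counts the lines in every .py file in a repo.",
  "stream": false
}'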
PERFORMANCE CONSIDERATIONS
Model Size and Hardware Capabilities:
- Larger models require more RAM and VRAM. Ensure your system meets
the recommended specifications for the model you intend to use.
- For instance, running a quantized 14B parameter model is feasible on a Mac
  with 24GB of RAM (the setup used here) and more comfortable with 32GB, but
  performance may vary with context length and quantization level.
Quantization:
- Utilizing quantized models (e.g., 4-bit versions) can reduce
memory usage and improve performance.
- Quantized models are available for various sizes and can be
selected based on your hardware capabilities.
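As a rough back-of-the-envelope estimate (assuming GGUF-style 4-bit weights; actual figures vary with the quantization scheme and context length):
14B parameters x ~0.5 bytes per parameter ≈ 7GB of weights
+ roughly 1-3GB for context (KV cache) and runtime overhead
≈ 8-10GB total, which fits comfortably within 24GB of RAM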
GPU Acceleration:
- Leveraging GPU acceleration can significantly enhance performance.
Ensure that your system's GPU drivers are up to date and that the
software is configured to utilize GPU resources effectively.
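With Ollama, a quick way to confirm the GPU is actually being used is to check the processor column while a model is loaded; it should report GPU rather than CPU (exact output format may differ between versions):
ollama ps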
GET HELP