Quick and Dirty Guide: Getting started with open source AI
(Image caption: An image created by condensing the whole of human creative output into an image generation model.)
April 8, 2025
Here’s a quick and dirty guide to getting a local coding assistant up and running using the current state-of-the-art workflow for running large language models locally.
I’m doing this on a MacBook with an M3 chip and 24GB of RAM, but any M-series Mac (M1 through M4) will do. If you’re on a PC with a GPU and solid VRAM, the same steps apply — just adjusted for your setup.
Rule of thumb: Choose a model whose parameter count (in billions) is about 20% smaller than your available RAM/VRAM (in gigabytes). With 24GB of RAM, models in the 13B-18B parameter range typically run smoothly without straining your system.
For this example, I’ll be using the recently released DeepCoder-14B-Preview model from Agentica — a compact, high-performance model built specifically for coding tasks. It’s a great fit for local inference, and it keeps up with the current frontier models while being lightweight enough to run on high-end consumer hardware.
Let’s get into it.
WHY RUN AI MODELS LOCALLY?
- Privacy: Processing data on your local machine ensures that sensitive
  information remains secure and isn't transmitted to external servers.
- Offline Access: Local models allow you to work without an internet
  connection, ensuring uninterrupted productivity.
- Performance: Leveraging your device's hardware can lead to faster
  processing times, especially with optimized configurations.
PREREQUISITES
Before proceeding, ensure your system meets the following requirements:
Hardware:
- Mac: M-series chip (M1, M2, M3, or M4) with at least 16GB of RAM.
For larger models, 32GB or more is recommended.
- PC: A GPU with substantial VRAM (8GB or more) to handle larger
models effectively.
Operating System:
- Mac: macOS 11 Big Sur or later.
- PC: Windows 10 or later, or a compatible Linux distribution.
METHOD 1: USING LM STUDIO
LM Studio is a user-friendly application that simplifies the process
of discovering, downloading, and interacting with open-source AI
models locally.
Download and Install LM Studio:
- Visit the LM Studio website and download the installer compatible
with your operating system.
- For Mac users, open the downloaded file and drag the LM Studio
application into your Applications folder.
- For PC users, run the installer and follow the on-screen
instructions.
Optimize LM Studio for Apple Silicon (Mac Users):
- Launch LM Studio.
- Navigate to the 'Chat' tab in the left sidebar.
- Click on 'Settings'.
- Change the system prompt to "Default LM Studio macOS".
- Confirm the changes by clicking "Accept New System Prompt".
This configuration ensures that LM Studio utilizes the Apple Metal GPU,
enhancing the performance of large language models.
Download and Load a Model:
- In LM Studio, click on the search icon.
- Enter the name of the model you wish to use, such as
"DeepCoder-14B-Preview".
- Click the download button next to the model.
- Once the download is complete, the model will appear in your
list of available models. Click on it to load.
Interact with the Model:
- Navigate to the 'Chat' interface within LM Studio.
- Select the loaded model from the dropdown menu.
- You can now input prompts and receive responses directly from
the model.
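If you want to call the model from your own scripts or an editor plugin rather than the chat window, LM Studio can also expose a local OpenAI-compatible server (look for the local server / developer tab). A minimal sketch, assuming the server is running on its default port 1234 and that the "model" value matches the identifier the server tab reports for the loaded model:
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepcoder-14b-preview",
        "messages": [{"role": "user", "content": "Write a Python function that reverses a string."}],
        "temperature": 0.2
      }'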
METHOD 2: USING OLLAMA
Ollama is a command-line tool that makes it straightforward to run
large language models locally.
Install Ollama:
For Mac users:
- Open Terminal.
- Install Homebrew if it's not already installed:
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
- Install Ollama using Homebrew:
brew install ollama
For PC users:
- Visit the Ollama download page and download the installer
for your operating system.
- Run the installer and follow the on-screen instructions.
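Either way, it's worth a quick sanity check before pulling a model: confirm the CLI is installed and that the local server is running. Something like the following should work on both platforms (the serve step is only needed if Ollama isn't already running as a background service):
ollama --version
ollama serve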
Run a Model with Ollama:
- Open Terminal (Mac) or Command Prompt (PC).
- To run a 14B model from the Ollama library, such as DeepSeek-R1-Distill-Qwen-14B, execute:
ollama run deepseek-r1:14b
- The first time you run this command, Ollama will download the model,
which may take some time depending on your internet speed.
- Once the model is downloaded, you can interact with it directly
through the command line.
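Like LM Studio, Ollama also exposes a local HTTP API (on port 11434 by default), so editors and other tools on your machine can talk to the model programmatically. A rough sketch using the generate endpoint and the same model pulled above:
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:14b",
  "prompt": "Write a bash one-liner that counts the lines in every .py file in a repo.",
  "stream": false
}'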
PERFORMANCE CONSIDERATIONS
Model Size and Hardware Capabilities:
- Larger models require more RAM and VRAM. Ensure your system meets
the recommended specifications for the model you intend to use.
- For instance, running a quantized 14B parameter model is feasible on a Mac
  with 24GB of RAM (the setup used here) and more comfortable with 32GB, but
  performance may vary with context length and quantization level.
Quantization:
- Utilizing quantized models (e.g., 4-bit versions) can reduce
memory usage and improve performance.
- Quantized models are available for various sizes and can be
selected based on your hardware capabilities.
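As a rough back-of-the-envelope estimate (assuming GGUF-style 4-bit weights; actual figures vary with the quantization scheme and context length):
14B parameters x ~0.5 bytes per parameter ≈ 7GB of weights
+ roughly 1-3GB for context (KV cache) and runtime overhead
≈ 8-10GB total, which fits comfortably within 24GB of RAM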
GPU Acceleration:
- Leveraging GPU acceleration can significantly enhance performance.
Ensure that your system's GPU drivers are up to date and that the
software is configured to utilize GPU resources effectively.
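With Ollama, a quick way to confirm the GPU is actually being used is to check the processor column while a model is loaded; it should report GPU rather than CPU (exact output format may differ between versions):
ollama ps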
GET HELP