Validating OS-compatibility for locally-run LLMs using Ollama with CI/CD matrix workflows
Senior NLP Researcher

Large Language Models (LLMs) are becoming increasingly accessible, driven by the steady release of open-source models and a growing ecosystem of tools for running them locally. Compact versions can now run on consumer-grade hardware, so developers are using LLMs on personal devices such as Linux workstations, macOS laptops, and Windows machines. As this trend grows, so does the need to ensure that your LLM-powered applications run reliably across all major operating systems.
For most developers, it is easy to overlook OS compatibility until something breaks on a system you did not test. Maybe a shell command works fine on Linux but fails on Windows. Or a dependency installs easily on macOS but needs extra configuration on Fedora. Manually verifying compatibility across Linux, macOS, and Windows every time you make a change is not scalable. It would require maintaining multiple test environments, repeating installation steps, and running test cases on each system; an impractical and error-prone process for any development team.
Continuous Integration and Continuous Deployment (CI/CD) pipelines can help manage these processes. With the right setup, you can automate these OS-level checks and catch platform-specific issues before your users do. In this tutorial, you will learn how to set up an OS compatibility testing workflow using CircleCI matrix jobs. These jobs define a single test process and run it in parallel across multiple operating systems.
To make the process concrete, you will use Ollama, a popular tool for running LLMs locally via a CLI and HTTP API. You will install Ollama on each OS, start the service, and run a set of Python tests to make sure it all works as expected. Once your pipeline is in place, every change you push will automatically be tested across platforms.
Prerequisites
For this tutorial, here is what you will need:
- A Python development environment set up on your machine.
- A CircleCI account to automate the testing of your LLM application.
- A GitHub account and repository to host the project code and connect it to CircleCI.
Introduction to Ollama and the installation process
Ollama is a tool for running and managing large language models (LLMs) locally via a simple CLI and HTTP API. It enables developers to run large models on local systems without needing an internet connection or cloud resources. Ollama provides a streamlined installation process for multiple operating systems and makes it easy to pull, serve, and interact with models locally.
It is compatible with Linux, macOS, and Windows, making it a good choice for this tutorial. However, you can easily adapt the same approach to your own LLM application and codebase to test across each OS.
Rather than manually installing Ollama on each local machine, you will use CircleCI’s configuration file to automate the installation process across different operating systems. In the CircleCI configuration file (config.yml), you will specify the installation steps for Ollama on each OS (Linux, macOS, and Windows) using their respective package managers. This ensures that every time a code change is made, the installation is automatically verified on all target systems.
On Linux, install Ollama using the curl command and a simple script:
curl -fsSL https://ollama.com/install.sh | sh
For macOS, install Ollama using the Homebrew package manager:
brew install ollama
On Windows, install Ollama using Chocolatey through PowerShell:
choco install ollama
These steps will be encapsulated within CircleCI matrix jobs, running the installation and testing process in parallel across all three OS environments. This automation ensures that your LLM application’s installation and functionality are verified across platforms, making manual testing unnecessary.
Once Ollama is installed, you can begin interacting with it directly from the command line. Serving Ollama starts a server on port 11434, which exposes an API for using an LLM locally. Here is how you can pull and use a model, using smollm2 as an example.
For Unix-based systems, start the server in the background, pull the model, and send a generation request:
ollama serve &
ollama pull smollm2:135m
curl -X POST http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{"model": "smollm2:135m", "prompt": "Say hello", "stream": false}'
For Windows, perform the same steps using PowerShell syntax:
Start-Process -NoNewWindow -FilePath "ollama" -ArgumentList "serve"
ollama pull smollm2:135m
$body = @{
  model = "smollm2:135m"
  prompt = "Say hello"
  stream = $false
} | ConvertTo-Json -Compress
Invoke-RestMethod -Uri http://localhost:11434/api/generate `
  -Method Post `
  -ContentType "application/json" `
  -Body $body `
  -OutFile "ollama-test-output.json"
The model returns a generated response based on the provided prompt. This interaction is the core of how you will use Ollama for testing and inference locally. Here is a sample response from the Unix-based curl request to the local Ollama server:
[GIN] 2025/05/07 - 22:32:22 | 200 | 350.316167ms | 127.0.0.1 | POST "/api/generate"
{"model":"smollm2:135m","created_at":"2025-05-07T17:32:22.57243Z","response":"How can I assist you?","done":true,"done_reason":"stop","context":[1,9690,198,2683,359,253,5356,5646,11173,3365,3511,308,34519,28,7018,411,407,19712,8182,2,198,1,4093,198,36218,33662,2,198,1,520,9531,198,2020,416,339,4237,346,47],"total_duration":346719041,"load_duration":11733750,"prompt_eval_count":32,"prompt_eval_duration":294226000,"eval_count":7,"eval_duration":38480000}
By using Ollama’s CLI, you can easily test the functionality of the models on each operating system in your CircleCI pipeline. Whether you are on Linux, macOS, or Windows, Ollama provides a unified way to interact with LLMs locally, ensuring that your development and testing process is seamless across platforms. You can refer to the Ollama documentation for its core functionality.
Python unit-tests for verifying Ollama functionality
To verify Ollama’s functionality, you can use Python unit tests, which help ensure that your LLM application behaves as expected across different operating systems. This also allows you to automate compatibility checks in your CI/CD pipeline.
The test you will create is a template that can be adapted for your own LLM application.
Note: You can modify the model name or add OS-specific checks using Python’s platform module. For example, you could write OS-dependent tests by adding conditions based on the detected operating system, as shown in the sketch after this note. These unit tests ensure that Ollama (or any LLM application) functions as expected across platforms, and they can be integrated into your CircleCI pipeline for automated testing.
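Here is a minimal sketch of such an OS-dependent test, assuming the same localhost endpoint used throughout this tutorial. The class name, skip condition, and assertions are illustrative only and are not part of the final test file:
import platform
import unittest

import requests

OLLAMA_URL = "http://localhost:11434"  # same local endpoint used throughout this tutorial

class TestOSSpecificChecks(unittest.TestCase):
    @unittest.skipUnless(platform.system() == "Windows", "runs only on Windows")
    def test_api_reachable_on_windows(self):
        # Example of a check that only runs when the detected OS is Windows
        response = requests.get(f"{OLLAMA_URL}/api/tags", timeout=30)
        self.assertEqual(response.status_code, 200)

    def test_detected_platform_is_supported(self):
        # platform.system() returns "Linux", "Darwin" (macOS), or "Windows"
        self.assertIn(platform.system(), {"Linux", "Darwin", "Windows"})

if __name__ == "__main__":
    unittest.main()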
Start by creating a new Python project. Execute the following commands in a terminal to create a new folder for your project and navigate into it:
mkdir OllamaOSCompatiblity-MatrixJob
cd OllamaOSCompatiblity-MatrixJob
You will need the requests library to interact with the Ollama API through Python and verify that the model responds correctly. You can install it using Python’s default package manager, pip. Create a requirements.txt file and store each required package in it. Here is the sample requirements file for this project:
# File Name: requirements.txt
requests
Next, install all the packages collectively using pip. Always use a virtual environment for Python projects to avoid dependency clashes and version conflicts. Learn more about Python virtual environments.
For Unix-based systems, use pip to create a new virtual environment, activate it, and install the Python dependencies:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
Note: On Windows, the commands to create and activate a Python virtual environment are slightly different; a sample is shown below.
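For reference, the typical PowerShell equivalent looks like this (the Scripts\Activate.ps1 path is the standard layout that venv creates on Windows):
python -m venv venv
.\venv\Scripts\Activate.ps1
pip install -r requirements.txt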
Next, create a Python file and use the unittest module to test Ollama functionality. The Python unit test file starts the Ollama server in the background and ensures that it is ready to process requests by pulling the smollm2:135m model.
It includes two main tests:
- One checks if the server responds correctly when fetching available models.
- The other validates the generation of text from a sample prompt.
When the tests are complete, the server is gracefully shut down, using platform-specific commands for Windows and Unix-based systems. You can easily modify this test setup for your own LLM application by updating the model or prompt, or by adding new test cases to verify its functionality across different operating systems.
# File Name: ollama-test.py
import unittest
import requests
import subprocess
import platform
import time
import os
import signal

# Ollama API endpoint and model name (can be overridden with env var)
OLLAMA_URL = "http://localhost:11434"
MODEL_NAME = os.getenv("OLLAMA_MODEL", "smollm2:135m")

class TestOllamaServer(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Launch the Ollama server as a background subprocess
        cls.ollama_proc = subprocess.Popen(["ollama", "serve"])
        time.sleep(10)  # wait for server to boot
        # Pull the specified model before running tests
        subprocess.run(["ollama", "pull", MODEL_NAME], check=True)

    @classmethod
    def tearDownClass(cls):
        # Gracefully shut down the Ollama server
        if platform.system() == "Windows":
            cls.ollama_proc.terminate()  # Use terminate() for Windows
        else:
            cls.ollama_proc.send_signal(signal.SIGINT)  # Use SIGINT for Unix-based systems
        cls.ollama_proc.wait()

    def test_server_responds(self):
        response = requests.get(f"{OLLAMA_URL}/api/tags")
        # -- Assertions for getting a list of available models -- #
        self.assertEqual(response.status_code, 200)
        self.assertIn("models", response.json())

    def test_generation(self):
        # -- Sample dummy request -- #
        payload = {
            "model": MODEL_NAME,
            "prompt": "Say hello",
            "stream": False
        }
        response = requests.post(f"{OLLAMA_URL}/api/generate", json=payload)
        # -- Assertions for obtaining a response back from the model -- #
        self.assertEqual(response.status_code, 200)
        self.assertIn("response", response.json())

if __name__ == "__main__":
    unittest.main()
To execute the tests locally, run this command:
python ollama-test.py
This runs the tests locally and checks that Ollama works on your local system. Here is an expected sample response for successful execution of the unit tests:
[GIN] 2025/05/07 - 22:46:12 | 200 | 2.774334ms | 127.0.0.1 | HEAD "/"
pulling manifest ⠦ [GIN] 2025/05/07 - 22:46:15 | 200 | 2.72797875s | 127.0.0.1 | POST "/api/pull"
pulling manifest
---OUTPUT TRUNCATED---
[GIN] 2025/05/07 - 22:46:16 | 200 | 854.24525ms | 127.0.0.1 | POST "/api/generate"
.[GIN] 2025/05/07 - 22:46:16 | 200 | 3.928042ms | 127.0.0.1 | GET "/api/tags"
.
----------------------------------------------------------------------
Ran 2 tests in 13.662s
OK
Matrix workflows to verify functionality across multiple operating systems
In the CircleCI config YAML, matrix jobs are defined to run the tests for each operating system separately. The matrix setup lets you define a single test job, which CircleCI runs across all specified OS environments in parallel. For each OS, there are distinct commands that handle installation and setup.
- For Linux, use the install_ollama_linux command, which installs the necessary dependencies and Ollama. This command uses system package managers like apt or dnf, depending on the Linux distribution.
- For macOS, use the install_ollama_macos command, which leverages Homebrew to install Ollama.
- On Windows, use the install_ollama_windows command to install Ollama via Chocolatey.
After setting up Ollama on each OS, the CircleCI job first tests the installation using the CLI and then runs the Python test script you defined earlier. To keep the test lightweight, you will pull the smollm2:135m model because its small size (~300MB) reduces the compute requirements.
The script performs sample generations using both curl and the Python unit tests. This ensures that Ollama functions correctly on each OS by verifying that the server is running and can generate responses. The result is an automated CI pipeline that tests your code for OS-specific compatibility across Linux, macOS, and Windows.
# File Name: .circleci/config.yml
version: 2.1

# ----------- Executors define environment per OS -----------
executors:
  ubuntu-executor:
    docker:
      - image: cimg/python:3.10 # Ubuntu-based CircleCI Python convenience image
    working_directory: ~/ollama-tests
  fedora-executor:
    docker:
      - image: fedora:latest # Uses Fedora's official Docker image
    working_directory: ~/ollama-tests
  macos-executor:
    macos:
      xcode: "14.2.0" # macOS image with Xcode 14.2.0
    working_directory: ~/ollama-tests
  windows-executor:
    machine:
      image: windows-server-2019-vs2019:current # Windows Server 2019 VM image
      shell: 'powershell.exe -ExecutionPolicy Bypass'
    resource_class: 'windows.medium'

# ----------- Shared reusable commands for OS-specific setup and testing -----------
commands:
  install_ollama_linux:
    steps:
      - run:
          name: Install Python & Ollama (Linux)
          command: |
            if command -v apt-get &> /dev/null; then
              sudo apt-get update && sudo apt-get install -y python3-pip curl jq
            elif command -v dnf &> /dev/null; then
              dnf install -y python3-pip curl jq gawk
            fi
            curl -fsSL https://ollama.com/install.sh | sh
            echo 'export PATH="$HOME/.ollama/bin:$PATH"' >> $BASH_ENV
            source $BASH_ENV
  install_ollama_macos:
    steps:
      - run:
          name: Install Ollama (macOS)
          command: |
            brew update
            brew install ollama
            # Homebrew installs into Linuxbrew path; persist it in BASH_ENV
            echo 'export PATH="/home/linuxbrew/.linuxbrew/bin:$PATH"' >> $BASH_ENV
            source $BASH_ENV
            ollama --version
  install_ollama_windows:
    steps:
      - run:
          name: Install Ollama (Windows)
          command: |
            # Install using Chocolatey, then refresh shell environment
            choco install ollama
            refreshenv
            ollama --version
  serve_and_test_unix:
    steps:
      - run:
          name: Serve Ollama & Test (Unix)
          # -- Basic Ollama testing from CLI arguments and commands -- #
          command: |
            ollama serve &
            sleep 10
            ollama pull smollm2:135m
            curl -s http://localhost:11434/api/tags || (echo "Ollama is not responding!" && exit 1)
            curl -X POST http://localhost:11434/api/generate \
              -H "Content-Type: application/json" \
              -d '{"model": "smollm2:135m", "prompt": "Say hello", "stream": false}' \
              | tee ollama-test-output.json
            echo "Ollama response:"
            cat ollama-test-output.json
  serve_and_test_windows:
    steps:
      - run:
          name: Serve Ollama & Test (Windows)
          # -- Windows-specific CLI testing with PowerShell syntax -- #
          command: |
            # Start Ollama server in background
            Start-Process -NoNewWindow -FilePath "ollama" -ArgumentList "serve"
            Start-Sleep -Seconds 10
            # Pull small model and check server readiness
            ollama pull smollm2:135m
            try {
              $tags = Invoke-RestMethod -Uri http://localhost:11434/api/tags -Method Get
              Write-Host "Ollama tags:" $tags
            } catch {
              Write-Host "Ollama is not responding!"
              exit 1
            }
            # Send generate request to validate inference
            $body = @{
              model = "smollm2:135m"
              prompt = "Say hello"
              stream = $false
            } | ConvertTo-Json -Compress
            Invoke-RestMethod -Uri http://localhost:11434/api/generate `
              -Method Post `
              -ContentType "application/json" `
              -Body $body `
              -OutFile "ollama-test-output.json"
            Get-Content ollama-test-output.json

# ----------- Matrix job for each OS -----------
jobs:
  test:
    parameters:
      os:
        type: string
    executor:
      name: << parameters.os >>-executor
    steps:
      - checkout
      # Linux setup and install
      - when:
          condition:
            or:
              - equal: [<< parameters.os >>, "ubuntu"]
              - equal: [<< parameters.os >>, "fedora"]
          steps:
            - install_ollama_linux
      # macOS setup
      - when:
          condition:
            equal: [<< parameters.os >>, "macos"]
          steps:
            - install_ollama_macos
      # Windows setup
      - when:
          condition:
            equal: [<< parameters.os >>, "windows"]
          steps:
            - install_ollama_windows
      # Windows-specific Ollama serve + test
      - when:
          condition:
            equal: [<< parameters.os >>, "windows"]
          steps:
            - serve_and_test_windows
      # Unix systems serve + test
      - when:
          condition:
            or:
              - equal: [<< parameters.os >>, "ubuntu"]
              - equal: [<< parameters.os >>, "fedora"]
              - equal: [<< parameters.os >>, "macos"]
          steps:
            - serve_and_test_unix
      - run:
          name: Run Python test
          command: |
            pip install requests
            python ollama-test.py

# ----------- Matrix workflow across all target OSes -----------
workflows:
  test-matrix:
    jobs:
      - test:
          matrix:
            parameters: # Defined parameters used with the matrix job
              os: ["fedora", "ubuntu", "macos", "windows"]
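Before committing this file, you can optionally check it for syntax and schema errors using the CircleCI local CLI, if you have it installed:
circleci config validate
This catches indentation and key-name mistakes locally, before the pipeline ever runs on CircleCI.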
Setting up the project on CircleCI
The complete code for this project is available on my GitHub account.
To execute the defined workflow, connect the code repository to your CircleCI account. Go to the Projects tab on your CircleCI dashboard and create a new project. This redirects you to a page where you can set up your workflow.
If you haven’t already connected your GitHub account to CircleCI, you’ll need to do that first. Once connected, select the relevant repository for this project.
CircleCI will automatically detect the config.yml file that you’ve defined in the project. You can proceed with the configuration and set up the necessary triggers to control when your pipeline will execute. For this example, I will configure the pipeline to run on pull requests merged into the default branch, so the CI pipeline runs on every stable change.
While the pipeline will execute automatically based on the triggers you set, you can also manually trigger the pipeline with modified parameters. When triggered, you can check the pipeline’s progress and confirm successful execution. If everything is set up correctly, the process should complete as expected, and you can review the result in your CircleCI dashboard.
Conclusion
In this tutorial, you explored how to build an automated CI/CD pipeline using CircleCI matrix workflows to ensure OS-level compatibility for local LLM applications. By using Ollama as a practical example, you learned how to:
- Install the tool on Linux, macOS, and Windows using OS-specific commands.
- Verify its functionality through Python unit tests.
- Run those tests in parallel across all major platforms.
This approach helps catch OS-specific issues early, making your codebase more robust and saving hours of manual testing.
You can use the workflow demonstrated here as a template for any LLM project that needs to support users on different operating systems. You can easily extend the setup by replacing Ollama with your own CLI tool or API-based application and modifying the unit tests to reflect your project’s functionality. If you’re supporting multiple models, you might also add tests to verify their loading and inference behavior across environments.
You can extend this implementation with further improvements, such as testing against more OS variants (for example, different Linux distributions like Ubuntu, Fedora, and Alpine) or integrating GPU compatibility checks if your application depends on hardware acceleration. The same matrix testing pattern can be applied to other local LLM runtimes like vLLM, llama.cpp, or LM Studio, allowing you to build a complete test suite that ensures your tools work reliably for every end user.
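As an illustration, here is a sketch of how the workflow’s matrix could be extended to cover several models as well as several operating systems. It assumes the test job is given an additional model string parameter (not defined in this tutorial) that the pull and generate steps use instead of the hard-coded smollm2:135m; the second model tag is only an example:
workflows:
  test-matrix:
    jobs:
      - test:
          matrix:
            parameters:
              os: ["fedora", "ubuntu", "macos", "windows"]
              model: ["smollm2:135m", "qwen2.5:0.5b"] # example tags; any Ollama model works
CircleCI expands a matrix as the cross product of its parameters, so this single declaration would produce eight jobs, one for each OS and model combination.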