Enhancing AI model experimentation with multiple CI/CD pipelines

Experimentation is crucial for creating successful AI and machine learning (ML) models, but managing many experiments can be really challenging, especially as models get more complex. One way to make this easier is by using Continuous Integration and Continuous Deployment (CI/CD) pipelines. These pipelines automate tasks like training, testing, and deploying models, which helps you run experiments more quickly and with fewer mistakes.
In this article, you’ll learn how to use multiple CI/CD pipelines to speed up and automate your AI model experiments. You’ll find out how running different pipelines at the same time can help you test various versions of a model all at once, which saves time and helps you test more effectively. The steps explained here will not only make your work more efficient but also ensure that your experiments can be repeated and expanded easily.
Prerequisites
Before you start, make sure these items are in place:
- A CircleCI account
- A GitHub account
- Basic knowledge of CI/CD
- Basic knowledge of AI/ML concepts
Why multiple CI/CD pipelines for AI models?
Handling the stages of AI development (preparing data, training models, testing, and deploying) within one continuous integration/continuous deployment (CI/CD) pipeline can slow things down and create problems. Using multiple CI/CD pipelines can help solve these issues with the following benefits:
- Better efficiency: Using separate pipelines for different tasks, such as data preparation, model training, and evaluation, allows these tasks to be done at the same time, which speeds up the overall process.
- Safer testing: Teams can try out different versions of their models without interfering with each other’s work or making mistakes.
- Better use of resources: Splitting the work into focused pipelines lets you allocate resources to each task independently. This structured approach provides better control and consistency in execution.
Setting up a CI/CD pipeline for AI model experimentation
This tutorial uses a house price prediction project as an example. The project includes scripts for data preparation, training, and evaluation, along with a sample dataset and configuration files.
To get started, clone the repository:
```bash
git clone https://github.com/CIRCLECI-GWP/house-price-prediction
cd house-price-prediction
```
You’ll be in the `main` branch of the repository.
Before running the scripts, you’ll need to install the required Python libraries. Use a virtual environment to keep dependencies isolated:
```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate
```
Next, install the dependencies listed in the `requirements.txt` file:
```bash
pip install -r requirements.txt
```
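If you’re curious what’s in that file: based on the imports used by the project’s scripts, `requirements.txt` will contain at least the following packages (any version pins in the repository take precedence):
```text
pandas
scikit-learn
joblib
```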
The repository includes the following key files:
- `preprocess.py`: Cleans and prepares the dataset.
- `train.py`: Trains the model using the preprocessed data.
- `evaluate.py`: Evaluates model performance on test data.
- `house_prices.csv`: Contains the sample dataset for house price prediction.
These files will serve as the backbone of the CI/CD pipeline. The scripts will be called in different stages of the pipeline, automating the process from data preparation to model evaluation.
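The preprocessing script itself isn’t reproduced in this article. As a rough sketch, assuming the target column in `house_prices.csv` is named `price` (the actual script in the repository may clean and split the data differently), `preprocess.py` could look like this:
```python
# Sketch of a preprocessing script; the target column name is an assumption,
# and the real preprocess.py in the repository may differ.
import pandas as pd
from sklearn.model_selection import train_test_split

def preprocess():
    # Load the raw dataset
    df = pd.read_csv('house_prices.csv')

    # Simple cleaning: drop rows with missing values
    df = df.dropna()

    # Assumed target column 'price'; every other column is a feature
    X = df.drop(columns=['price'])
    y = df['price']

    # Split into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Save the splits so train.py and evaluate.py can read them
    X_train.to_csv('X_train.csv', index=False)
    X_test.to_csv('X_test.csv', index=False)
    y_train.to_csv('y_train.csv', index=False)
    y_test.to_csv('y_test.csv', index=False)

if __name__ == "__main__":
    preprocess()
```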
Training the model (train.py)
The `train.py` script loads the preprocessed data, selects a model based on the provided argument, trains it, and saves it as a `.joblib` file.
```python
import argparse
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.neighbors import KNeighborsRegressor
import joblib

def train_model(model_type):
    # Load preprocessed data
    X_train = pd.read_csv('X_train.csv')
    y_train = pd.read_csv('y_train.csv')

    # Choose model
    if model_type == "A":
        model = LinearRegression()
        model_name = "model_A.joblib"
    elif model_type == "B":
        model = RandomForestRegressor(n_estimators=100, random_state=42)
        model_name = "model_B.joblib"
    elif model_type == "C":
        model = SVR()
        model_name = "model_C.joblib"
    elif model_type == "D":
        model = KNeighborsRegressor(n_neighbors=5)
        model_name = "model_D.joblib"
    else:
        raise ValueError("Invalid model type. Choose 'A', 'B', 'C' or 'D'.")

    # Train model
    model.fit(X_train, y_train)

    # Save trained model
    joblib.dump(model, model_name)
    print(f"{model_type} trained and saved as {model_name}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, required=True, help="Choose model: A, B, C, or D")
    args = parser.parse_args()
    train_model(args.model)
```
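For example, to train the random forest variant (model B) locally, assuming `preprocess.py` has already produced `X_train.csv` and `y_train.csv` in the working directory:
```bash
python train.py --model B
# Prints: B trained and saved as model_B.joblib
```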
Evaluating the model (evaluate.py)
After training, we use `evaluate.py` to test the model’s performance. It loads the trained model, makes predictions on test data, and calculates the Mean Squared Error.
```python
import argparse
import pandas as pd
from sklearn.metrics import mean_squared_error
import joblib

def evaluate_model(model_type):
    # Load preprocessed data
    X_test = pd.read_csv('X_test.csv')
    y_test = pd.read_csv('y_test.csv')

    # Load model
    model_name = f"model_{model_type}.joblib"
    model = joblib.load(model_name)

    # Predict and evaluate
    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    print(f"Model {model_type} - Mean Squared Error: {mse}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", type=str, required=True, help="Choose model: A, B, C, or D")
    args = parser.parse_args()
    evaluate_model(args.model)
```
To run the scripts in sequence locally:
- Preprocess the data:
```bash
python preprocess.py
```
- Train the model (the `--model` argument is required by `train.py`):
```bash
python train.py --model A
```
- Evaluate the model:
```bash
python evaluate.py --model A
```
To run all four variants in one pass, see the loop after these steps.
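Here’s a quick shell loop that runs every variant locally, which can be a useful sanity check before wiring up the pipelines:
```bash
# Preprocess once, then train and evaluate each of the four model variants
python preprocess.py
for m in A B C D; do
  python train.py --model "$m"
  python evaluate.py --model "$m"
done
```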
To automate these steps and test with both single and multiple pipelines, you will use a CI/CD pipeline. This ensures that data processing, model training, and evaluation run automatically whenever there are updates.
Pushing to GitHub
You’ll need to push the local repository that you cloned to your own GitHub account to proceed with the next steps.
First, remove the current remote by running this command:
```bash
git remote rm origin
```
Then push the code to your GitHub account. The linked blog post is an excellent resource you can refer to if you’re not familiar with GitHub.
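For example, after creating an empty repository under your own account (the repository name below is illustrative):
```bash
# Point the clone at your own GitHub repository and push the main branch
git remote add origin https://github.com/<your-username>/house-price-prediction.git
git push -u origin main
```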
CircleCI configuration
Once the online repository is set up, add CircleCI configuration in your local repository. When you connect the GitHub repository to CircleCI, CircleCI can monitor your repository for changes and start pipelines when needed.
Create a `.circleci` directory in the root of your project and add a `config.yml` file. This file defines the pipeline’s structure, specifying workflows, jobs, and steps. You can start by specifying the CircleCI version and any reusable configuration you need.
Here’s what the beginning of the file looks like:
```yaml
version: 2.1
```
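As an example of reusable configuration: every job in this tutorial runs on the same Python image, so you could optionally define a shared executor here. The config in this article declares the image per job instead, so treat this as an illustration only:
```yaml
version: 2.1

# Optional reusable executor; jobs could then reference it with
# `executor: python-executor` instead of repeating the Docker image.
executors:
  python-executor:
    docker:
      - image: cimg/python:3.8
```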
CircleCI jobs are the tasks that run in your pipeline. These tasks include installing necessary tools, executing scripts, and saving the results for future use. For the data preprocessing job, you will install the required tools and run a script called `preprocess.py` to process the raw data.
```yaml
jobs:
  preprocess_data_A:
    docker:
      - image: cimg/python:3.8
    steps:
      - run:
          name: Cleanup Existing Files
          command: rm -rf /home/circleci/project && mkdir /home/circleci/project
      - checkout
      - run:
          name: Install Dependencies
          command: pip install -r requirements.txt
      - run:
          name: Preprocess Data
          command: python preprocess.py
      - persist_to_workspace:
          root: .
          paths:
            - X_train.csv
            - X_test.csv
            - y_train.csv
            - y_test.csv
```
In the initial preprocess data job, the workflow starts by using a Python Docker image to create an isolated environment for the preprocessing tasks. Preprocessing jobs for the other models are also present in the GitHub repository.
First, it retrieves the code from the repository, installs the necessary packages from the `requirements.txt` file, and then runs the `preprocess.py` script to get the data ready. This script takes the raw data and splits it into training and testing datasets: `X_train.csv`, `X_test.csv`, `y_train.csv`, and `y_test.csv`. Once the data is processed, it is saved to the workspace so that it can be used in later jobs within the pipeline.
Running training and evaluation for each model
Once the data preparation is complete, we begin training each model. For every model, we first install the required dependencies, then run the training script `train.py`. We also use `attach_workspace` to bring in the preprocessed data from the previous job for each model. This process ensures that the data is available for training.
```yaml
train_model_A:
  docker:
    - image: cimg/python:3.8
  steps:
    - run:
        name: Cleanup Existing Files
        command: rm -rf /home/circleci/project && mkdir /home/circleci/project
    - checkout
    - attach_workspace:
        at: /home/circleci/project
    - run:
        name: Debug - List Files Before Training
        command: ls -la /home/circleci/project
    - run:
        name: Check if Data Exists
        command: |
          if [ ! -f /home/circleci/project/X_train.csv ]; then
            echo "ERROR: X_train.csv not found!"
            exit 1
          fi
    - run:
        name: Install Dependencies
        command: pip install -r requirements.txt
    - run:
        name: Train Model A
        command: python train.py --model A
    - persist_to_workspace:
        root: .
        paths:
          - model_A.joblib
```
The `train_model_A` job begins by cleaning up any existing files, ensuring that the environment is fresh. It then checks out the project code and attaches the workspace, bringing in the data from the preprocessing step.
Before proceeding with training, it checks that the required data files are present. If the necessary files are found, it installs the required dependencies and runs the `train.py` script to train the model. Once the model is trained, it is saved as `model_A.joblib` and persisted to the workspace so that it can be used in the next step, model evaluation.
The next job is to evaluate the trained model. Similar to the previous jobs, we attach the workspace to access the trained model and run the evaluation script `evaluate.py`.
```yaml
evaluate_model_A:
  docker:
    - image: cimg/python:3.8
  steps:
    - run:
        name: Cleanup Existing Files
        command: rm -rf /home/circleci/project && mkdir /home/circleci/project
    - checkout
    - attach_workspace:
        at: /home/circleci/project
    - run:
        name: Debug - List Files Before Evaluation
        command: ls -la /home/circleci/project
    - run:
        name: Check if Model Exists
        command: |
          if [ ! -f /home/circleci/project/model_A.joblib ]; then
            echo "ERROR: model_A.joblib not found!"
            exit 1
          fi
    - run:
        name: Install Dependencies
        command: pip install -r requirements.txt
    - run:
        name: Evaluate Model A
        command: python evaluate.py --model A
```
A workflow in a CI/CD pipeline defines the order in which jobs are executed. When setting up a machine learning pipeline, it’s essential to organize tasks logically. For a basic setup, you may want to run data preprocessing first, followed by model training, and finally model evaluation. This is a common sequence in machine learning workflows, ensuring that the data is ready before training, and that the model is evaluated after training.
Here’s how to define the workflow:
```yaml
workflows:
  build-and-test:
    jobs:
      - preprocess_data_A
      - train_model_A:
          requires:
            - preprocess_data_A
      - evaluate_model_A:
          requires:
            - train_model_A
```
This pipeline is simple: each job depends on the previous one. The `preprocess_data_A` job runs first, followed by `train_model_A`, and finally `evaluate_model_A`. This sequential process ensures smooth execution and reduces the chance of errors. However, this approach can be slow for experimentation when working with multiple models or configurations, because each job must wait for the previous one to finish.
Configuring the pipeline
Once the online repository is updated with the CircleCI config, log in to CircleCI. Remember, having a CircleCI account was mentioned as a prerequisite.
Next:
- Once you are logged in, ensure your account is selected in the top-left corner.
- Click Projects on the left sidebar.
- On the next page, search for the name of your GitHub repository then click Set Up Project next to it.
On the modal that opens, enter the Git branch name that contains the CircleCI config file. Once CircleCI confirms with the message `.circleci/config.yml found on this branch`, click Start Building.
The build will start. Moving forward, CircleCI will automatically trigger the workflow on each commit. You can also test the pipeline locally using the CircleCI CLI, by running `circleci local execute <job-name>`, to ensure everything works smoothly.
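For instance, with the CircleCI CLI and Docker installed (note that older CLI versions take the job name via a `--job` flag rather than a positional argument):
```bash
# Check the config for syntax errors, then run one job locally
circleci config validate
circleci local execute preprocess_data_A
```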
You can observe that the build is being processed, with each job running one after the other. If everything is configured properly and there are no problems with the setup or code, the build should finish successfully.
Multiple pipelines
Running the experiment this way for many models takes much longer. And because each job depends on the one before it, a single failure blocks everything downstream.
This pipeline is pretty straightforward and gets the job done, but it misses out on the perks of parallel processing and speedy experimentation. That’s where multiple pipelines can really help. By splitting tasks into separate pipelines for each model or setup, you can run preprocessing, training, and evaluation at the same time. This speeds up the whole experimentation process, making it easier to iterate and compare results quickly.
Let’s examine a revised version of the pipeline that supports parallel processing for each model. You’ll need to:
- Copy the `preprocess_data_A` job to `preprocess_data_B`, `preprocess_data_C`, and `preprocess_data_D`.
- Copy the `train_model_A` job to `train_model_B`, `train_model_C`, and `train_model_D`. Be sure to also update the `Train Model ..` and `persist_to_workspace` entries to reflect the different models and generated files.
- Copy the `evaluate_model_A` job to `evaluate_model_B`, `evaluate_model_C`, and `evaluate_model_D`. Be sure to also update the `Check if Model Exists` and `Evaluate Model ..` entries to reflect the different generated files and models.
Then the workflow will be like this:
```yaml
workflows:
  build-and-test:
    jobs:
      - preprocess_data_A
      - preprocess_data_B
      - preprocess_data_C
      - preprocess_data_D
      - train_model_A:
          requires:
            - preprocess_data_A
      - train_model_B:
          requires:
            - preprocess_data_B
      - train_model_C:
          requires:
            - preprocess_data_C
      - train_model_D:
          requires:
            - preprocess_data_D
      - evaluate_model_A:
          requires:
            - train_model_A
      - evaluate_model_B:
          requires:
            - train_model_B
      - evaluate_model_C:
          requires:
            - train_model_C
      - evaluate_model_D:
          requires:
            - train_model_D
```
You can view the config file at this stage here.
In this updated setup, the preprocessing jobs for models A, B, C, and D all run at once. After that, each model is trained in parallel, and then evaluated in parallel. Keep in mind that each step still relies on the one before it: for instance, preprocessing for model A must finish before training model A can start, as is standard in machine learning workflows.
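As an aside, if the copy-paste bothers you, CircleCI 2.1 also supports parameterized jobs expanded with a matrix, which can express the same fan-out without duplicating job definitions. This is not what the tutorial’s config uses; it’s just a sketch of the alternative, shown here for the training jobs only:
```yaml
# Sketch only: a single parameterized training job fanned out over the four
# model types with a matrix, instead of four copied jobs.
jobs:
  train_model:
    parameters:
      model:
        type: string
    docker:
      - image: cimg/python:3.8
    steps:
      - checkout
      - attach_workspace:
          at: /home/circleci/project
      - run: pip install -r requirements.txt
      - run: python train.py --model << parameters.model >>

workflows:
  build-and-test:
    jobs:
      - train_model:
          matrix:
            parameters:
              model: ["A", "B", "C", "D"]
```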
Commit these changes and push to GitHub. Expect a build like the image below.
The image below shows that the pipeline has gone through all the necessary steps and everything is functioning correctly. This means that the integration process is finished, and your code is automatically tested and ready to be deployed. This also makes sure that any future changes are checked and deployed smoothly, without needing any manual work.
The cool part is that all these tasks run at the same time, unlike in the single pipeline, where they run one after another.
Handling dependencies and caching
To speed up your builds, you can cache Python dependencies. Caching helps avoid reinstalling the same packages every time a new job runs.
Here’s an example of how to add caching for Python dependencies. Replace each of the `Install Dependencies` steps:
```yaml
- run:
    name: Install Dependencies
    command: pip install -r requirements.txt
```
with:
```yaml
- restore_cache:
    keys:
      - python-dependencies-{{ checksum "requirements.txt" }}
- run:
    name: Install Dependencies
    command: pip install -r requirements.txt
- save_cache:
    paths:
      - ~/.cache/pip
    key: python-dependencies-{{ checksum "requirements.txt" }}
```
View the full CircleCI config here.
Commit these changes and push them to GitHub. Notice the shorter time it takes to complete the pipelines.
Analysis of results from multiple pipelines
Using multiple pipelines lets you tackle independent tasks at the same time, which really cuts down on the overall time spent running experiments. With just one pipeline, you have to finish one task before moving on to the next, but with multiple pipelines, you can run tasks like preprocessing and training different models all at once. This makes the most of your CI/CD system’s capabilities. It’s a big plus when you’re dealing with multiple models or fine-tuning hyperparameters, especially when quick iterations are key.
For instance, think about the test project where you’re running experiments for four different models. If you stick to a single pipeline, it could take longer to sequentially preprocess, train, and evaluate each one. However, if you go for multiple pipelines, all those processes can happen in parallel, drastically speeding up the experiment time. Here’s a practical demonstration in our simple project:
- Single pipeline: Running 4 models sequentially (preprocess -> train -> evaluate for each model) would take roughly 4 × 2 minutes 14 seconds, which comes to almost 9 minutes.
- Multiple pipelines: With parallel pipelines, these jobs ran simultaneously, reducing the total time to roughly 2 and a half minutes, and to under 2 minutes with caching.
Conclusion
Bringing CI/CD into your AI/ML workflows isn’t just a nice-to-have anymore — it’s essential for teams wanting to keep up and work efficiently in data-driven software development. By automating tasks like data prep, model training, testing, and deployment, CI/CD pipelines let teams focus on being creative instead of getting bogged down in repetitive work. Plus, these workflows boost speed and keep things consistent and reliable, making it easier to tweak and enhance models while cutting down on mistakes.
Whether you’re launching a simple proof-of-concept or rolling out a full-scale system, adopting CI/CD best practices will simplify your development process and ensure your models stay current, accurate, and ready for whatever comes next. Sign up for your free CircleCI account and get started building more robust and reliable AI/ML applications today!