Automating machine learning security checks using CI/CD


Machine learning (ML) pipelines are increasingly being treated like software: built, tested, deployed, and monitored using automated tooling. But while infrastructure as code and microservices have matured with security best practices, ML systems often lag behind. The truth is, your ML pipeline is part of your software supply chain, and it is vulnerable.
There are many points where a seemingly functional ML pipeline can be compromised: poisoned training data, backdoored pre-trained models, leaked API keys, dependency exploits, and so on. ML teams need security automation in place to detect or prevent these issues.
Fortunately, you do not need to rebuild your tooling from scratch. You can embed simple yet powerful security checks directly into your continuous integration and delivery (CI/CD) pipeline using tools that you likely already use, like Python, pip, and CircleCI.
In this tutorial, you will learn how to:
- Detect hard coded secrets in your ML codebase using a CI-integrated secret scanner.
- Audit your Python dependencies for known vulnerabilities with pip-audit.
- Verify model reproducibility and integrity by tracking a SHA256 hash in CI.
Prerequisites
To follow along with this tutorial, you should be comfortable working with Python projects and basic Git operations. You will need:
- Python 3.10+ installed.
- Git installed on your machine.
- A GitHub account.
- A CircleCI account.
Some understanding of how ML pipelines are structured (training, packaging, deployment) will be helpful, but you do not need deep ML expertise to complete the tutorial.
What does ML security really mean?
Security in ML is not just about protecting endpoints or setting access controls. While those are important, ML introduces an additional set of risks that traditional security practices often overlook. At its core, ML security is about ensuring that your models, data, and infrastructure are not vulnerable to tampering, misuse, or exploitation, and that you can trust what your system is learning and predicting.
These are some of the most well-known threats in ML security:
- Data poisoning: If an attacker can inject malicious examples into your training data – especially in weakly supervised or web-scraped datasets – they can influence your model’s behavior. For example, an email spam classifier trained on poisoned data might learn to ignore certain spam patterns entirely.
- Backdoored pre-trained models: Many ML teams rely on public repositories like Hugging Face or GitHub to speed up development. But pre-trained models can contain subtle backdoors or logic bombs that only activate on specific inputs – and these are extremely difficult to detect without rigorous validation.
- Malicious dependencies: The supply chain does not stop with models. Like all Python projects, ML pipelines are vulnerable to malicious dependencies – especially packages pulled from public indexes. Attackers have published lookalike libraries (like reqeusts instead of requests) that contain credential stealers or payloads triggered during install.
- Leaked secrets: It is easy to forget that notebooks, training logs, or even train.py scripts sometimes contain hardcoded API keys or passwords. These often get committed to version control, especially in personal or internal repos.
Even if no malicious actor is involved, your model can still drift or become unstable over time. If your pipeline does not enforce reproducibility, you may deploy a different model than what was tested – leading to performance issues or security gaps. ML security is about catching all of these risks early. And CI/CD is a great place to start.
CI security checks you can add today
Let’s look at what ML security checks look like in practice. This section walks you through adding lightweight, practical safeguards to your ML pipeline using tools that integrate easily into your CI/CD workflow.
Secret scanning
One of the most common and preventable mistakes in ML workflows is the accidental leak of secrets like API keys or credentials into source code.
In this section, you will set up a simple, realistic ML project that trains a small model, and simulate a secret leak in code. Then you will catch that leak automatically using a secret scanning tool, fully integrated into a CircleCI pipeline.
Before diving into the Python scripts, it’s a good practice to define your project dependencies and set up an isolated environment. This ensures that your packages don’t conflict with others on your system.
First, create a virtual environment:
python -m venv .venv
source .venv/bin/activate
Next, create a local project folder. Inside this folder, set up a simple structure that mimics an end-to-end ML pipeline:
ml-security-demo/
├── data/
│   └── iris.csv
├── model/
│   └── train.py
├── .circleci/
│   └── config.yml
├── .gitleaks.toml
└── requirements.txt
Inside the requirements.txt file, include these dependencies:
numpy==1.23.5
pandas==1.5.3
scikit-learn==1.1.3
Then install them with:
pip install -r requirements.txt
The train.py script will load a small dataset, train a model, and save it. You will introduce a fake secret here to simulate a real-world scenario where sensitive credentials end up in your codebase.
In model/train.py, add the following code:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
import pickle
import hashlib
import os

# Simulated accidental secret
API_KEY = "sk-test-1234567890abcdef"

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
DATA_PATH = os.path.join(BASE_DIR, "../data/iris.csv")
MODEL_PATH = os.path.join(BASE_DIR, "model.pkl")

df = pd.read_csv(DATA_PATH)
X = df.drop("target", axis=1)
y = df["target"]

model = RandomForestClassifier(random_state=42)
model.fit(X, y)

with open(MODEL_PATH, "wb") as f:
    pickle.dump(model, f)
This API key is obviously fake, but it mimics what a real secret might look like. It is placed directly in the script to demonstrate how a scanner would catch it during CI execution. Scanners typically use regular expressions and heuristic rules to identify patterns that resemble secrets. These patterns often include specific prefixes (e.g., sk-, AKIA) or characteristic lengths and character sets that match known secret formats.
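For intuition, here is a minimal sketch of that idea – not how Gitleaks is implemented, just a simplified illustration in Python using a hypothetical pattern:

import re

# Simplified illustration only: real scanners combine many rules, entropy checks, and allowlists
SECRET_PATTERN = re.compile(r"(?i)(api_key|secret|token)\s*=\s*[\"']?(sk-[a-z0-9-]{16,})")

line = 'API_KEY = "sk-test-1234567890abcdef"'
match = SECRET_PATTERN.search(line)
if match:
    print(f"Possible secret detected: {match.group(2)}")

Tools like Gitleaks ship with many such rules out of the box, so in practice you rarely have to write patterns yourself.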
To keep things simple, use Python to generate the Iris dataset and save it to a CSV file. Inside your data folder, create a temporary file and name it generate.py. Next, add the following lines:
from sklearn.datasets import load_iris
import pandas as pd
df = load_iris(as_frame=True).frame
df["target"] = load_iris().target
df.to_csv("iris.csv", index=False)
Run the script with:
cd data
python generate.py
Add Gitleaks configuration
Gitleaks is a lightweight, extensible tool that scans for secrets using regex-based rules. Create a .gitleaks.toml file in the root of your project:
[[rules]]
id = "generic-api-key"
description = "Generic API Key"
regex = '''(?i)(apikey|api_key|secret|token)[\s:=]+['"]?[a-z0-9-_]{8,}['"]?'''
tags = ["key", "API", "security"]
This rule is intentionally simple. In a production setting, you would use Gitleaks’ full default ruleset or customize it for your organization.
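If you have Gitleaks installed on your machine (for example, from the project's release binaries), you can sanity-check the rule locally before wiring it into CI. The command below mirrors the one the pipeline will run:

gitleaks detect --source . --config .gitleaks.toml --no-git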
Set up CircleCI configuration
Next, configure a CircleCI job that checks for secrets using Gitleaks. Create .circleci/config.yml
with the following contents:
version: 2.1

jobs:
  secret-scan:
    docker:
      - image: cimg/python:3.10
    steps:
      - checkout
      - run:
          name: Install Gitleaks (v8.18.2)
          command: |
            curl -sSL -o gitleaks.tar.gz https://github.com/gitleaks/gitleaks/releases/download/v8.18.2/gitleaks_8.18.2_linux_x64.tar.gz
            tar -xvzf gitleaks.tar.gz
            chmod +x gitleaks
            sudo mv gitleaks /usr/local/bin/
      - run:
          name: Run Gitleaks secret scan
          command: gitleaks detect --source . --config .gitleaks.toml --no-git

workflows:
  version: 2
  run-security-checks:
    jobs:
      - secret-scan
This job installs Gitleaks, then runs a scan over the project source. The --no-git flag tells Gitleaks to scan the current working directory instead of parsing Git history, which simplifies the demo.
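In a real project you will usually also want to scan commit history, because a secret that was committed and later deleted is still exposed. Dropping the --no-git flag tells Gitleaks to walk the repository's Git history instead:

gitleaks detect --source . --config .gitleaks.toml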
To run this check in CI, push your project to a GitHub repository. Next, go to CircleCI and connect your repository.
Trigger a build (it will run automatically on push to main or master).
If everything is set up correctly, the secret-scan job will fail, and the Gitleaks output will flag the fake API key in model/train.py.
By scanning your ML code for secrets as part of CI, you add a crucial early-warning system that blocks dangerous mistakes before they spread.
Dependency audit
ML environments are often built on a complex web of dependencies including frameworks, numerical libraries, and data loaders. Because these dependencies are rarely audited manually, they present an attractive target for attackers. A single vulnerable version of a widely used package, like PyYAML or numpy, can expose your entire training or inference pipeline to risks ranging from remote code execution to data leaks.
Dependency security is a problem that traditional package managers like pip do not solve on their own. That is where pip-audit comes in: it scans your installed Python packages and cross-references them with a public vulnerability database (OSV) to identify known issues. In this section, you will integrate pip-audit into your CircleCI pipeline to automatically flag risky packages before they can reach deployment.
To demonstrate a realistic issue, update your requirements.txt with a deliberately vulnerable package:
numpy==1.23.5
pandas==1.5.3
scikit-learn==1.1.3
pyyaml==5.3 # Known vulnerability: CVE-2020-14343
This version of pyyaml is affected by a critical deserialization flaw that allows arbitrary code execution when yaml.load() is used on untrusted input.
You can verify this manually by running pip-audit locally after installing the dependencies.
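For example, with the virtual environment from earlier still active:

pip install pip-audit
pip-audit                       # audits the packages installed in the active environment
# or point it at the requirements file directly:
pip-audit -r requirements.txt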
Now you will define a new job in your .circleci/config.yml file that installs dependencies and runs pip-audit. Add the following after your secret-scan job:
  dependency-audit:
    docker:
      - image: cimg/python:3.10
    steps:
      - checkout
      - run:
          name: Install pip-audit and dependencies
          command: |
            python -m pip install --upgrade pip
            pip install pip-audit
            pip install -r requirements.txt
      - run:
          name: Run pip-audit
          command: pip-audit
Then include it in the workflow at the bottom of the same file:
workflows:
  version: 2
  run-security-checks:
    jobs:
      - secret-scan
      - dependency-audit
When this job runs, CircleCI will install your project’s dependencies, scan them, and fail the build if any vulnerable packages are detected.
When pip-audit detects a vulnerability, the CircleCI job logs tell you exactly which package is affected, which advisory it maps to (by CVE or OSV ID), and which version to upgrade to. You could even configure your pipeline to auto-block merges or deployments if high-severity issues are found.
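pip-audit exits with a non-zero status when it finds vulnerabilities, which is what fails the CircleCI job. If you want more granular gating in a later step (for example, filtering by severity), one option – a sketch, assuming a recent pip-audit release with JSON output support – is to emit machine-readable results:

pip-audit -r requirements.txt --format json > audit.json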
Model hash check
In many ML pipelines, the final product is not a binary or a web service but a trained model. That model is often saved to disk as a .pkl, .pt, or .onnx file and passed downstream to deployment, evaluation, or sharing stages. But how can you be sure that the model you trained is the one being deployed? How do you detect if it has changed unexpectedly?
One simple answer is to track a cryptographic hash of your trained model file and verify it in CI. This section walks you through adding model hashing to your project and validating it automatically during your CircleCI pipeline.
You will start by updating your existing training script to save a SHA256 hash alongside the trained model. This provides a reproducible fingerprint that CI can later use for validation.
In model/train.py, add the following code at the end of the file (or update it if it already exists):
# Define where the hash will be stored, next to the model file
HASH_PATH = os.path.join(BASE_DIR, "model.sha256")

# Save SHA256 hash of the serialized model
with open(MODEL_PATH, "rb") as f:
    hash_val = hashlib.sha256(f.read()).hexdigest()

with open(HASH_PATH, "w") as f:
    f.write(hash_val)

print(f"Model SHA256: {hash_val}")
By setting a fixed random_state, the model training becomes deterministic, which is essential for generating consistent hashes across runs. Run the training script once locally and commit the generated model.sha256 file; that committed hash becomes the reference that CI compares against, as shown below.
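One way to do that (an assumption about how you organize the repo, not something CircleCI requires) is to train once locally and commit only the fingerprint:

python model/train.py
git add model/model.sha256
git commit -m "Add reference model hash"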
Create a new file at model/check_hash.py to compare the current model hash against the expected value:
import hashlib
import os

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
MODEL_PATH = os.path.join(BASE_DIR, "model.pkl")
HASH_PATH = os.path.join(BASE_DIR, "model.sha256")

with open(MODEL_PATH, "rb") as f:
    actual_hash = hashlib.sha256(f.read()).hexdigest()

with open(HASH_PATH, "r") as f:
    expected_hash = f.read().strip()

if actual_hash != expected_hash:
    print(f"Hash mismatch! Expected: {expected_hash}, Found: {actual_hash}")
    exit(1)
else:
    print(f"Model hash matches: {actual_hash}")
This script will be used in CI to verify that the model produced in the pipeline matches the committed reference hash.
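You can dry-run both scripts locally to confirm they work together before pushing:

python model/train.py       # writes model.pkl and model.sha256
python model/check_hash.py  # should report that the hash matches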
Next, extend your .circleci/config.yml file by adding a third job:
  model-hash-check:
    docker:
      - image: cimg/python:3.10
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: |
            python -m pip install --upgrade pip
            pip install -r requirements.txt
      - run:
          name: Retrain model and check hash
          command: |
            python model/train.py
            # train.py rewrites model.sha256, so restore the committed reference hash before comparing
            git checkout -- model/model.sha256
            python model/check_hash.py
And update the workflow to include this job:
workflows:
  version: 2
  run-security-checks:
    jobs:
      - secret-scan
      - dependency-audit
      - model-hash-check
If your model has not changed, check_hash.py prints a confirmation that the hash matches and the job passes. If it has changed (due to code, data, or environment differences), the job fails and alerts you that something about the model is no longer reproducible. This could be a sign of corruption or tampering.
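If the change turns out to be intentional – for example, you deliberately retrained on new data – regenerate the fingerprint and commit it so the pipeline accepts the new model:

python model/train.py
git add model/model.sha256
git commit -m "Update reference model hash after retraining"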
Conclusion
In this tutorial, you learned how to add essential security checks to your ML pipeline using CircleCI, including secret scanning with Gitleaks, dependency auditing with pip-audit, and model hash validation to ensure integrity and reproducibility. These lightweight steps address risks like leaked credentials, vulnerable packages, and model drift without requiring major changes to your workflow.
Give yourself some credit! You just leveled up your ML pipeline, so go pass that energy on to your team.
You can find the complete sample project here: ML security in CI/CD repo