Tutorials · Apr 16, 2025 · 9 min read

Creating and testing a RAG-powered AI app with Gemini and CircleCI

Bhavishya Pandit

Gen AI Engineer


Have you ever asked an AI model a question and received an outdated or completely off-base response? I’ve been there too. The problem is that most AI models rely solely on their pre-trained knowledge, which becomes stale over time. This is where retrieval-augmented generation (RAG) can help. RAG is a hybrid AI technique that combines the strengths of retrieval systems and generative models, bridging the gap by pulling in current information from external knowledge sources to improve generation quality. The result is more precise, contextually aware, and up-to-date output.

RAG improves accuracy by grounding responses in validated external data, which reduces hallucinations and increases precision. Other benefits of RAG include:

  • Increased creativity: Access to diverse datasets lets the model produce more comprehensive and original responses.
  • More control over data: By curating the retrieval sources, organizations can ensure that outputs align with their domain expertise and regulatory requirements.

Along with the benefits, there are some drawbacks as well. Building a RAG system that’s ready for production use isn’t easy. Some of the major challenges of building a RAG app include:

Challenges of RAG

  1. Data pipeline complexity: Building a manageable and reliable data retrieval pipeline.
  2. Latency issues: Keeping retrieval and generation low-latency enough to meet performance goals.
  3. Version control: Managing changes to retrieval sources and the generative model without breaking the system.
  4. Scalability: Handling increasing user demand with sufficient performance and reliability.
  5. Monitoring and debugging: Tracking and addressing issues in the retrieval and generation workflow.

With all these challenges, I started looking for a way to streamline the process. Here’s a deep dive into the solution I found.

Prerequisites

  1. Basic knowledge of the Python programming language
  2. A recent version of Python 3 installed on your system
  3. A CircleCI account
  4. A GitHub account
  5. A Gemini API Key

Note: If you don’t have one, generate it by following these steps:

  • Visit Google AI Studio
  • Select the Gemini Model and click Get API key
  • Click Create API Key
  • Copy and save the key

Cloning the starter repository

To follow along with this tutorial, clone the starter branch of this repository using the following command:

git clone -b starter https://github.com/CIRCLECI-GWP/ci-cd-rag-system.git

Once cloned, go into the project root (ci-cd-rag-system) and create a .env file with the following content:

GEMINI_API_KEY=YOUR_API_KEY

Then create a virtual environment, activate it, and install the project dependencies:

# create a venv
python3 -m venv venv

# activate venv
source venv/bin/activate

# install dependencies
pip install -r requirements.txt

And then run the project using the following command:

# run the app
python main.py

You should see the following output:

RAG Response: The main topic of the document is evaluating generative AI.  The document provides a framework for evaluating generative AI, discusses different evaluation approaches and their importance, and offers practical examples.

Additionally, go ahead and run the tests using the following command:

python test_main.py

You should see the following output:

----------------------------------------------------------------------
Ran 2 tests in 0.539s

OK

Note: If the app fails to run or the tests fail, please ensure that your API key is correct and that the dependencies are installed correctly.
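If you want to rule out key problems quickly, a minimal standalone check can help. This sketch is not part of the starter repo, and the prompt text is arbitrary; it only confirms that the key loads from .env and that Gemini responds:

import os
from dotenv import load_dotenv
from langchain_google_genai import GoogleGenerativeAI

# Load the key from .env and fail fast if it is missing
load_dotenv()
api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
    raise SystemExit("GEMINI_API_KEY is missing from .env")

# A one-off call to Gemini; any short response confirms the key works
llm = GoogleGenerativeAI(model="gemini-1.5-flash", google_api_key=api_key)
print(llm.invoke("Reply with the word: ok"))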

I will break down the code to help you understand the implementation later in the tutorial.

Automating code integration and deployment with CircleCI

CircleCI simplifies this process by automating code testing, integration, and deployment, ensuring that every change is validated before reaching production. With CircleCI, I can streamline my workflow, reduce manual effort, and catch issues early, leading to faster and more reliable releases. Whether I’m working on a small project or a large-scale system, implementing CI/CD with CircleCI helps me maintain code quality and ship updates seamlessly.

For those new to CircleCI, follow this guide to create a project and authorize CircleCI to access your GitHub, Bitbucket, or GitLab account. This will enable automatic detection and setup of a CI/CD pipeline. This tutorial uses GitHub as the version control system.

Why CI/CD matters for AI and RAG Systems

CI/CD simplifies the complicated process of testing and releasing software. Continuous Integration lets developers test and merge code changes regularly, ensuring that everything keeps running smoothly. Continuous Deployment builds on this by automating the release of updates, which speeds up delivery and improves reliability. CI/CD helps teams handle typical challenges like:

  • Faster updates to retrieval databases and generative models.
  • Seamless integration of new features or bug fixes.
  • Consistent performance through automated validation at every stage.

Integrating CI/CD into RAG workflows not only reduces maintenance overhead but also accelerates innovation, making it an essential practice for teams that want to run RAG efficiently.

Before integrating CI/CD into our application, let’s first understand the cloned RAG system and later explore how CI/CD fits into it.

Open the main.py file. It will contain this code:

import os
from dotenv import load_dotenv
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_google_genai import GoogleGenerativeAIEmbeddings, GoogleGenerativeAI
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate

# Load environment variables from the .env file
load_dotenv()

# Retrieve API Key once at the top
API_KEY = os.getenv("GEMINI_API_KEY")
if not API_KEY:
    raise ValueError("GEMINI_API_KEY is not set. Please add it to your .env file.")

# Function to Load PDF and Extract Text
def load_pdf(pdf_path):
    loader = PyPDFLoader(pdf_path)
    return loader.load()

# Function to Split Text into Chunks
def split_text(documents, chunk_size=1000, chunk_overlap=100):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return text_splitter.split_documents(documents)

# Function to Create FAISS Vector Store
def create_faiss_vector_store(chunks):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=API_KEY)
    vector_store = FAISS.from_documents(chunks, embeddings)
    return vector_store

# Function to Create and Run RAG Pipeline
def rag_pipeline(pdf_path, query):
    # Load and Split PDF
    documents = load_pdf(pdf_path)
    chunks = split_text(documents)

    # Create FAISS Vector Store
    vector_store = create_faiss_vector_store(chunks)

    # Define Retriever
    retriever = vector_store.as_retriever()

    # Define LLM (Google Gemini)
    llm = GoogleGenerativeAI(model="gemini-1.5-flash", google_api_key=API_KEY)

    # Define Custom Prompt
    prompt_template = PromptTemplate(
        input_variables=["context", "question"],
        template="Use the following context to answer the question:\n\n{context}\n\nQuestion: {question}\nAnswer:"
    )

    # Create RAG Chain
    rag_chain = RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        return_source_documents=True
    )

    # Get Response
    response = rag_chain.invoke({"query": query})
    return response["result"]

# Example Usage
if __name__ == "__main__":
    pdf_file = "https://services.google.com/fh/files/misc/evaluation_framework.pdf"
    question = "What is the main topic of the document?"
    response = rag_pipeline(pdf_file, question)
    print("RAG Response:", response)


Code walkthrough: understanding the implementation

Below is a detailed walkthrough of the code to help you understand how the RAG system works.

Extracting and preparing data

The first step in my pipeline was to load my knowledge source. Since I was working with PDFs, I used PyPDFLoader to extract text from a document. This function loads the document and makes its text available for processing.

def load_pdf(pdf_path):
    # PyPDFLoader accepts a local path or a URL and returns a list of Documents
    loader = PyPDFLoader(pdf_path)
    return loader.load()

Splitting text for efficient retrieval

Once the text was extracted, I needed to break it into smaller, meaningful chunks. This ensures that when a user queries the system, it retrieves only the most relevant sections rather than the entire document.

def split_text(documents, chunk_size=1000, chunk_overlap=100):
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)
    return text_splitter.split_documents(documents)
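To get a feel for what the splitter actually produces, here is a small self-contained illustration (the sample text is made up). Neighboring chunks share the overlap characters, which helps preserve context across chunk boundaries:

from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter

# A tiny made-up document, split into 40-character chunks with 10 characters of overlap
sample = [Document(page_content="RAG systems retrieve relevant context before generating an answer.")]
splitter = RecursiveCharacterTextSplitter(chunk_size=40, chunk_overlap=10)
for chunk in splitter.split_documents(sample):
    print(repr(chunk.page_content))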

Indexing with a vector store

To make my extracted text searchable, I converted each text chunk into embeddings (numerical representations of text) and stored them in a FAISS vector database.

def create_faiss_vector_store(chunks):
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001", google_api_key=API_KEY)
    vector_store = FAISS.from_documents(chunks, embeddings)
    return vector_store

FAISS allowed me to perform quick and efficient similarity searches, ensuring that when a query came in, the retriever could find the best-matching information.
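You can also query the index directly. LangChain’s FAISS wrapper exposes similarity_search, which returns the k chunks closest to the query (the query string here is illustrative):

# Retrieve the 3 chunks most similar to the query
results = vector_store.similarity_search("How is generative AI evaluated?", k=3)
for doc in results:
    print(doc.page_content[:120])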

Retrieving relevant information

With the vector store set up, I created a retriever to fetch the most relevant chunks of text.

retriever = vector_store.as_retriever()

Now, whenever I pass a query, the retriever finds the most contextually relevant sections.
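In recent LangChain versions, retrievers implement the Runnable interface, so you can invoke them directly to inspect what they return. A quick sketch with an illustrative query:

# Fetch the chunks the retriever considers most relevant to the question
docs = retriever.invoke("What evaluation approaches does the document cover?")
print(f"{len(docs)} documents retrieved")
print(docs[0].page_content[:120])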

Generating answers with LLM

Once I had retrieved relevant information, I needed a large language model (LLM) to generate a response. I integrated Google Gemini, which allowed me to process the retrieved text and generate an answer.

llm = GoogleGenerativeAI(model="gemini-1.5-flash", google_api_key=API_KEY)

To ensure the model used the retrieved context effectively, I structured my prompt template as follows:

prompt_template = PromptTemplate(
    input_variables=["context", "question"],
    template="Use the following context to answer the question:\n\n{context}\n\nQuestion: {question}\nAnswer:"
)
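To see exactly what the model receives, you can render the template yourself with format (the context and question values below are placeholders for illustration):

# Fill the template with sample values to preview the final prompt text
filled = prompt_template.format(
    context="Evaluation frameworks are crucial for AI models.",
    question="Why do evaluation frameworks matter?",
)
print(filled)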

With this setup, my RAG system could dynamically generate responses based on the retrieved context.

Running the RAG pipeline

Bringing everything together, I created a function to handle the entire RAG workflow.

def rag_pipeline(pdf_path, query):
    documents = load_pdf(pdf_path)
    chunks = split_text(documents)
    vector_store = create_faiss_vector_store(chunks)
    retriever = vector_store.as_retriever()
    llm = GoogleGenerativeAI(model="gemini-1.5-flash", google_api_key=API_KEY)
    prompt_template = PromptTemplate(
        input_variables=["context", "question"],
        template="Use the following context to answer the question:\n\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever, return_source_documents=True)
    response = rag_chain.invoke({"query": query})
    return response["result"]

Now, whenever I need an answer, I can simply call this function with a query. It will fetch the relevant information, process it, and generate a response.

Writing tests

As you build your RAG system, you need to ensure that it works as expected. The test_main.py file has some tests already written for you.

import unittest
from unittest.mock import patch
from langchain.schema import Document
from langchain_core.retrievers import BaseRetriever
import main

class MockRetriever(BaseRetriever):
    """Mock Retriever that simulates a real retriever's behavior."""
    def _get_relevant_documents(self, query):
        return [Document(page_content="Evaluation frameworks are crucial for AI models.")]

    async def _aget_relevant_documents(self, query):
        return self._get_relevant_documents(query)

class TestRAGPipeline(unittest.TestCase):

    @patch("main.load_pdf")
    @patch("main.create_faiss_vector_store")
    def test_rag_pipeline(self, mock_create_faiss, mock_load_pdf):
        """Test the RAG pipeline end-to-end with mocks."""

        # Mock the document loader
        mock_load_pdf.return_value = [Document(page_content="Evaluation frameworks are crucial for AI models.")]

        # Mock FAISS Vector Store and Retriever
        mock_retriever = MockRetriever()
        mock_create_faiss.return_value.as_retriever.return_value = mock_retriever

        # Run the pipeline with a mock query
        response = main.rag_pipeline("https://services.google.com/fh/files/misc/evaluation_framework.pdf", "What is the main topic?")

        # Assertions
        self.assertIsNotNone(response)
        self.assertIn("Evaluation frameworks", response)

    def test_split_text(self):
        """Test if text splitter correctly splits text into chunks."""
        documents = [Document(page_content="This is a sample document with multiple lines of text.")]
        chunks = main.split_text(documents, chunk_size=10, chunk_overlap=2)

        self.assertIsInstance(chunks, list)
        self.assertGreater(len(chunks), 0)
        self.assertTrue(all(isinstance(chunk, Document) for chunk in chunks))

if __name__ == "__main__":
    unittest.main()

This test file creates a TestRAGPipeline class that inherits from unittest.TestCase. It contains test methods to validate different components of the RAG system. The test_rag_pipeline method mocks the document loader and FAISS vector store to ensure the pipeline retrieves relevant information correctly. The test_split_text method verifies that the text splitter correctly breaks documents into chunks. Mocking is used to isolate dependencies, making tests reliable and independent of external services.
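Because these are standard unittest cases, you can also drive them through Python’s built-in test runner, which is handy for running a single test while debugging:

# Run the whole test file with verbose output
python -m unittest test_main -v

# Run just one test method
python -m unittest test_main.TestRAGPipeline.test_split_text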

Setting up CI with CircleCI

In this section, you will set up Continuous Integration (CI) with CircleCI to automate testing of your RAG system. CircleCI will run the tests every time you push code to your repository, ensuring that your system remains reliable and bug-free.

CircleCI will need a configuration file so that it knows what you want it to do. Create a folder named .circleci. Inside the folder, create a file named config.yml.

version: 2.1

jobs:
  build:
    working_directory: ~/circleci-python
    docker:
      - image: cimg/python:3.13.2
    steps:
      - checkout
      - run: pip install -r requirements.txt
      - run: python3 main.py
  test:
    working_directory: ~/circleci-python
    docker:
      - image: cimg/python:3.13.2
    steps:
      - checkout
      - run: pip install -r requirements.txt
      - run: python3 test_main.py
  deploy:
    docker:
      - image: cimg/python:3.13.2
    steps:
      - run: echo "Deploying to production"

workflows:
  build_and_test:
    jobs:
      - build
      - test:
          requires:
            - build
      - deploy:
          requires:
            - test
          filters:
            branches:
              only: main

There are three jobs you want CircleCI to run: build (which runs main.py), test (which runs the tests), and deploy. The build and test jobs run on every branch; the deploy job is filtered so that it runs only on the main branch.

Save this file and push it to your repository.

git add .
git commit -m "Add CircleCI config file"
git push

Next, log in to CircleCI with your GitHub account and search for the project you just pushed.

CircleCI pipeline dashboard

You will be prompted to enter the name of the branch housing the configuration file. In this case, that would be the main branch. Click Set Up Project to start the build process.

This will trigger the pipeline, but the build will fail because you haven’t added the CircleCI environment variable for the Gemini API key.

CircleCI build failure

To fix this, go to the project settings on CircleCI and add the GEMINI_API_KEY environment variable with the value of your API key.

CircleCI environment variables

Re-run the build, and it should pass successfully.

CircleCI build success

Congratulations! You now have a working CI/CD pipeline to build, test, and deploy your RAG apps. You can get even more out of your pipeline by:

  • Automating scheduled jobs to keep the retrieval database updated (see the sketch after this list).
  • Configuring different workflows for different branches, ensuring that only stable branches (like main or master) trigger deployments.
  • Setting up CD to a staging environment for testing before deploying updates to production.
  • Using Docker environments instead of native images for better portability and consistency.
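As an example of the first point, CircleCI workflows support cron-style schedule triggers. A minimal sketch, assuming you add a hypothetical reindex job that rebuilds the vector store:

workflows:
  nightly_reindex:
    triggers:
      - schedule:
          cron: "0 2 * * *" # every day at 02:00 UTC
          filters:
            branches:
              only: main
    jobs:
      - reindex # hypothetical job that refreshes the retrieval index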

Conclusion

In this article, I took you through the process of setting up Continuous Integration (CI) using CircleCI, demonstrating how it can streamline development workflows. While this project was beginner-friendly, having a solid base configuration makes it easy to extend to more complex scenarios. And of course, setting up the CD part is as simple as choosing a job to deploy and adding that to your workflow.

By integrating CI/CD, RAG systems become more resilient, scalable, and efficient. Automation minimizes manual intervention, allowing for continuous improvement and ensuring optimal performance. Moreover, CI/CD fosters a more robust AI ecosystem by maintaining model integrity, tracking changes, and allowing teams to experiment and refine their workflows without compromising stability.

As the demands on AI systems continue to grow, embracing CI/CD ensures that RAG applications remain adaptable, high-performing, and production-ready, capable of evolving with technological advancements and real-world challenges.
