The official CircleCI Evals Orb makes it easy to integrate LLM evaluations into a CI pipeline and to review the results without context switching. The output of evaluations run through the Evals Orb is stored in CircleCI, where it is accessible both as a job artifact and as a PR comment that CircleCI adds automatically.
Currently, the Evals Orb exposes commands to run evaluations through two popular LLMOps tools: LangSmith and Braintrust. A sketch of what a pipeline using the orb looks like is shown below. If your evals rely on a different tool, let us know at ai-feedback@circleci.com. You can also contribute directly to the official Orb by opening a PR on the public repository.
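To illustrate where the orb fits in a pipeline, here is a minimal `.circleci/config.yml` sketch. The orb version, the command name, and the parameter names below are assumptions for illustration only, not the orb's documented interface; consult the Evals Orb page in the CircleCI Orb Registry for the exact commands and parameters.

```yaml
version: 2.1

orbs:
  # Pin the official Evals Orb to a published version (x.y.z is a placeholder).
  evals: circleci/evals@x.y.z

jobs:
  run-evals:
    docker:
      - image: cimg/python:3.12
    steps:
      - checkout
      # Hypothetical command and parameter names, shown only to convey the
      # shape of an eval step targeting one of the supported platforms.
      # The real interface is listed in the orb registry.
      - evals/run:
          platform: langsmith          # or: braintrust
          eval_command: python run_evals.py

workflows:
  evaluate:
    jobs:
      - run-evals
```

On a pull request, a job like this would run your evaluation suite, store the results as a job artifact, and post a summary comment on the PR, so reviewers can inspect eval outcomes alongside the code change.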
More resources on evaluating LLM-enabled applications are available in our documentation.