CircleCI Server v3.x Metrics and Monitoring

Last updated
Tags Server v3.x Server Admin

Metrics such as CPU or memory usage and internal metrics are useful in:

  • Quickly detecting incidents and abnormal behavior.

  • Dynamically scaling compute resources.

  • Retroactively understanding infrastructure-wide issues.

Metrics Collection

Scope

Your CircleCI server installation collects a number of metrics and logs by default, which can be useful in monitoring the health of your system and debugging issues with your installation.

Data is retained for a maximum of 15 days.
Prometheus Server is not limited to scraping metrics from your CircleCI server install. By default, it scrapes metrics from your entire cluster. You may disable Prometheus from within the KOTS Admin Console config if needed.

Prometheus

Prometheus is a leading monitoring and alerting system for Kubernetes. Server 3.x ships with a basic implementation of monitoring common performance metrics.

The Prometheus feature is broken in KOTS versions 1.65.0 - 1.67.0. If you rely on this feature, we recommend that you do not upgrade your KOTS version until this has been resolved.

KOTS Admin - Metrics Graphs

By default, an instance of Prometheus is deployed with your CircleCI server install. Once deployed, you may provide the address for your Prometheus instance to the KOTS Admin Console. KOTS uses this address to generate graph data for the CPU and memory usage of containers in your cluster.

The default Prometheus address is http://prometheus-server

From the KOTS dashboard, select "configure graphs". Then enter http://prometheus-server and KOTS generates resource usage graphs.

Telegraf

Most services running on server report StatsD metrics to the Telegraf pod running in server. The configuration is fully customizable, so you can forward your metrics from Telegraf to any output supported by Telegraf through output plugins. By default, it will provide a metrics endpoint for Prometheus to scrape.

Use Telegraf to forward metrics to Datadog

The following example shows how to configure Telegraf to output metrics to Datadog:

  1. Open the management console dashboard and select Config from the menu bar.

  2. Locate the Custom Telegraf config section under Observability and monitoring. There is an editable text window where you can configure plugins for forwarding Telegraf metrics for your server installation.

  3. To forward to Datadog, add the following code, substituting my-secret-key with your Datadog API key:

    [[outputs.datadog]]
      ## Replace "my-secret-key" with Datadog API key
      apikey = "my-secret-key"

For more options, see the Influxdata docs.



Help make this document better

This guide, as well as the rest of our docs, are open source and available on GitHub. We welcome your contributions.

Need support?

Our support engineers are available to help with service issues, billing, or account related questions, and can help troubleshoot build configurations. Contact our support engineers by opening a ticket.

You can also visit our support site to find support articles, community forums, and training resources.