Deploying documentation to GitHub Pages with continuous integration
Continuous integration (CI) tools have evolved from simply running tests and reporting results into flexible, general-purpose computing environments. You can use CI to run full builds and send artifacts to external systems. If you’re already using a CI system, you might want to consider building and deploying your documentation using the same platform. It can be much more convenient than implementing another tool or service.
This tutorial provides an overview of some options available for building and deploying documentation. You will then explore the details of using CircleCI to deploy documentation to GitHub Pages. This workflow has proven to be convenient for teams already using those tools for hosting code and running automated tests.
Options for deploying documentation
API documentation is usually rendered from a codebase using a language-specific documentation tool (sphinx for Python or javadoc for Java). The documentation can be built on the developer’s local machine, in a CI environment, or in a documentation-specific hosting service.
Services for hosting documentation are usually language-specific and can be a great low-friction option for a team that writes projects in a single language. For example, Read the Docs has been the standard for the Python community. Read the Docs uses webhooks to watch commits to a hosted repository and automatically builds and renders documentation for each code update. It offers some benefits that could be difficult to replicate in your own pipeline, such as deploying multiple versions of documentation and maintaining links from rendered docs to source code.
However, Read the Docs’ limitations can pop up if your team needs to deploy docs for additional languages or if builds require uncommon system dependencies that can’t be installed via the pip or conda package managers. Using a documentation-specific service also means maintaining another set of user accounts and permissions.
The least infrastructure-dependent workflow for building documentation is for developers to build docs locally and check the results into the project repository. Most teams prefer to keep generated content out of source control to keep code reviews simpler and to lessen developer responsibility for building and committing the content, but some may enjoy seeing the revision history of documentation alongside the code. GitHub has developed support for this workflow by offering the option to render contents of a docs directory to GitHub Pages. Other setups may still need a separate deploy step for documentation in a CI system.
If a team instead decides to build documentation as part of a CI flow, content can be deployed to a variety of destinations: a locally maintained server, an object store like Amazon S3, GitHub Pages, or some other external hosting service. In most cases, the CI job will need some form of credentials to authenticate with the destination, which can be the most complex part of the flow. One of the advantages of GitHub Pages as a documentation host is the consolidation of permissions: any developer with admin access on a repository can set up deploys to GitHub Pages and provision the deploy keys needed for a CI service to commit content.
Options for deploying to GitHub Pages
GitHub offers three options for deploying a site to GitHub Pages, with different implications for workflows and credentials.
The oldest option, which you will use in this tutorial, is for pushes to a special gh-pages branch to trigger deploys. This is usually maintained as an “orphan” branch with a completely separate revision history from main, which can be a bit difficult to maintain by hand. In this tutorial, you’ll build a CircleCI workflow that builds documentation, commits changes to the gh-pages branch using a library, and then pushes the branch to GitHub using a deploy key that you will provision.
The second option is to have GitHub Pages render the main branch. This can be useful for a repository that exists only to host documentation. It doesn’t help much if your goal is to benefit from keeping code and rendered documentation close together with a single permissions model.
Finally, GitHub Pages can render a docs directory on the main branch, which supports workflows where developers are expected to generate and commit documentation as part of their local workflows. This requires no CI platform and no additional credentials, but most teams prefer not to include generated content in their main branch.
Creating a basic Python project
You will start by building a small Python package that uses standard Python ecosystem tools for tests (pytest) and documentation (sphinx). You’ll then configure CircleCI to run tests, build documentation, and finally deploy to GitHub Pages via a gh-pages branch. Full code for the project is available in CIRCLECI-GWP/docs-on-gh-pages.
In a fresh directory, create a simple package called mylib with a single hello function. mylib/__init__.py looks like:
def hello():
    return 'Hello'
You also need to create a test directory with an empty __init__.py file and test_hello.py containing:
import mylib
def test_hello():
    assert mylib.hello() == 'Hello'
To run the tests, you’ll need pytest, so specify that in a requirements.txt file. You’ll also need to request sphinx, the documentation tool you’ll be using in the next section:
sphinx
pytest
Before installing the dependencies, you need to create a virtual environment. Use the following commands to create a virtual environment and activate it:
python3 -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate`
Install the dependencies:
pip install -r requirements.txt
At this point, you can write a very simple CircleCI workflow containing a single job that will run your test. Create a .circleci/config.yml file:
version: 2.1
executors:
  python-executor:
    docker:
      - image: python:3.7
    working_directory: ~/project
jobs:
  test:
    executor: python-executor
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: pip install -r requirements.txt
      - run:
          name: Run tests
          command: pytest
workflows:
  version: 2
  build-and-deploy:
    jobs:
      - test
This code has a python-executor that uses the python:3.7 Docker image and sets the working directory to ~/project. The test job uses this executor and runs the steps to install dependencies and run tests. The workflows section specifies that the test job should be run.
Next, commit the changes and push your project to GitHub. Log into CircleCI and search for your project.
Click the Set Up Project button. CircleCI should generate an initial build for the main branch, which should come back green.
Add docs
Now that you have a basic library with tests, you can set up the documentation framework. Since you have already installed sphinx in the previous step, you can now run sphinx-quickstart to generate a skeleton documentation project. Run:
sphinx-quickstart docs/ --project 'mylib' --author 'J. Doe'
# accept defaults at all the interactive prompts
The flags you used are:
--project 'mylib': This sets the project name in the documentation, which will appear in various places, including the generated HTML.
--author 'J. Doe': This sets the author’s name in the generated documentation.
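For orientation, the conf.py that sphinx-quickstart writes into docs/ is itself a small Python module. The sketch below shows roughly what to expect when the defaults are accepted. The exact contents vary between Sphinx versions, so treat every value other than project and author as an assumption rather than guaranteed output.

```python
# docs/conf.py (abridged sketch; the generated file differs by Sphinx version)

# Set from the --project and --author flags passed to sphinx-quickstart.
project = 'mylib'
author = 'J. Doe'

# No Sphinx extensions are enabled when the defaults are accepted (assumption).
extensions = []

# Paths that Sphinx skips when looking for source files. '_build' is the
# output directory, so rendered HTML is never re-read as input.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

# Default HTML theme in recent Sphinx releases (assumption).
html_theme = 'alabaster'
```

Because the output directory is docs/_build, the rendered site ends up in docs/_build/html, which is the path the CI jobs in this tutorial refer to.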
sphinx-quickstart generates a Makefile, so building docs is as simple as calling make html from the docs/ directory. Codify that in a new job in your CircleCI flow. Add this under jobs:
  docs-build:
    executor: python-executor
    steps:
      - checkout
      - run:
          name: Install dependencies
          command: pip install -r requirements.txt
      - run:
          name: Build documentation
          command: |
            cd docs
            make html
      - persist_to_workspace:
          root: docs/_build
          paths:
            - html
Invoking make html populates a docs/_build/html directory containing the content that you want to deploy. The final persist_to_workspace step of your new docs-build job saves the contents of that directory to an intermediate location that will be accessible to later jobs in your workflow. For now, add this new job to the workflows section:
workflows:
  version: 2
  build-and-deploy:
    jobs:
      - test
      - docs-build
Then commit the results.
Even without deploying the rendered content, this job is now serving as a check on the integrity of your docs. If sphinx is unable to run successfully, this job will fail, letting you know something is wrong.
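If you want this check to be stricter, Sphinx can also treat warnings as errors through its -W option. The sketch below is one possible way to wrap that in a small Python helper. The function names and default paths are illustrative assumptions, and running the build relies on Sphinx being installed in the active environment.

```python
import subprocess
import sys


def strict_build_command(source_dir="docs", output_dir="docs/_build/html"):
    """Return a Sphinx invocation that fails on warnings.

    `python -m sphinx` is equivalent to the sphinx-build executable, and
    -W turns warnings into errors so the process exits non-zero.
    """
    return [sys.executable, "-m", "sphinx", "-b", "html", "-W",
            source_dir, output_dir]


def run_strict_build():
    """Run the build and return its exit code (0 means success)."""
    return subprocess.run(strict_build_command()).returncode
```

Calling run_strict_build() from the repository root would then fail the surrounding CI step whenever Sphinx reports a warning, such as a broken cross-reference.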
Deploying rendered docs to a gh-pages branch
We’re ready at this point to start building the final piece of your CI workflow, a job that will deploy the built documentation by pushing it to the gh-pages branch of your repository.
You want gh-pages to be an “orphan” branch that tracks only the rendered docs and has a separate timeline from the source code in main. It’s possible to create such a branch and copy content into it using bare git command-line invocations, but it can be full of edge cases and easily lead to a corrupted work environment if anything goes wrong. Pulling in a purpose-built tool is a reasonable choice in this case, and there are several available as open source projects. The most popular among these at the moment is actually a Node.js module called gh-pages that includes a command-line interface, which is what we’ll use here.
You would be completely justified in questioning why we’d choose an application requiring a JavaScript environment for deploying Python docs. It seems like added complexity at first glance, but it actually fits in your workflow fairly seamlessly since your CI environment supports Docker containers natively and you can choose independent base images for each of your jobs. You get to build the documentation inside a container with a Python runtime, then share the output with a new container with a Node.js runtime.
Now, write a first version of a docs-deploy job underneath the jobs section of your config.yml file and walk through the steps:
  docs-deploy:
    docker:
      - image: cimg/node:16.17
    steps:
      - checkout
      - attach_workspace:
          at: docs/_build
      - run:
          name: Disable Jekyll for GitHub Pages
          command: touch docs/_build/html/.nojekyll
      - run:
          name: Install gh-pages tool
          command: npm install gh-pages@3.2.3 --silent
      - run:
          name: Configure Git for deployment
          command: |
            git config --global user.email "ci-build@example.com"
            git config --global user.name "CI Build"
      - run:
          name: Deploy documentation to GitHub Pages
          command: npx gh-pages --dotfiles --message "[skip ci] Updates" --dist docs/_build/html
You are using a node base image so that the npm package manager and Node.js runtime are available. The attach_workspace step mounts the rendered documentation from the docs-build job into your container, then you call npm install to download the target module, which includes a command-line utility, gh-pages, that you’ll invoke in the final step. The git config commands are required per the module documentation. Finally, the invocation of gh-pages --dist docs/_build/html copies the contents of the html directory into the root of the gh-pages branch and pushes the results to GitHub.
Let’s add this new job to your workflow. The workflows section now looks like:
workflows:
  version: 2
  build-and-deploy:
    jobs:
      - test
      - docs-build
      - docs-deploy:
          requires:
            - test
            - docs-build
          filters:
            branches:
              only: main
You made the docs-deploy job dependent on the other two jobs, meaning that it won’t run until both of them complete successfully. This ensures you don’t accidentally publish docs for a state of the repository that doesn’t pass tests. You also set a filter to specify that the docs-deploy job should be skipped except for builds of the main branch. That way, you don’t overwrite the published docs for changes that are still in flight on other branches.
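The combined effect of requires and filters can be modeled in a few lines of Python. This is only a conceptual sketch of the scheduling rule described above, not CircleCI’s implementation; the function name and the result strings are hypothetical.

```python
def should_run_docs_deploy(branch, upstream_results, only_branch="main"):
    """Model the `requires` and `filters` settings of the docs-deploy job.

    branch: name of the branch being built.
    upstream_results: mapping of required job name -> "success" or "failed".
    """
    # filters.branches.only: the job is skipped on every other branch.
    if branch != only_branch:
        return False
    # requires: every upstream job must have finished successfully.
    required = ("test", "docs-build")
    return all(upstream_results.get(job) == "success" for job in required)


ok = {"test": "success", "docs-build": "success"}
print(should_run_docs_deploy("main", ok))                                  # True
print(should_run_docs_deploy("main", {**ok, "test": "failed"}))            # False
print(should_run_docs_deploy("feature-x", ok))                             # False
```

Only the first case, a build of main where both upstream jobs succeeded, results in a deploy.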
If you check in all these changes and let CircleCI run your job, your new job will fail:
ERROR: The key you are authenticating with has been marked as read only.
So there’s a bit more work you need to do to clean this up and make sure your CI job has the necessary credentials.
Provisioning a deploy key
As mentioned, GitHub provides a few options for giving a job access to change a repository. Generally, GitHub permissions are tied to users, so a credential must either be tied to a single human user account or a special machine user account must be provisioned. There’s a lot of flexibility there for granting access across repositories, but it can become somewhat complex.
Opt instead to provision a read/write deploy key. This is an SSH key pair specific to a single repository rather than a user. This is nice for teams, because it means access doesn’t disappear if the user who provisions the key leaves the organization or deletes their account. It also means that any user who is an administrator on the repository can follow the steps below to get the integration set up.
Follow the instructions in the CircleCI docs and apply them to this project.
Start by creating an SSH key pair on your local machine:
ssh-keygen -t rsa -b 4096 -C "ci-build@example.com"
# Accept the default of no password for the key (This is a special case!)
# Choose a destination such as 'docs_deploy_key_rsa'
You end up with a private key docs_deploy_key_rsa and a public key docs_deploy_key_rsa.pub. Give the private key to CircleCI by going to https://circleci.com/gh/CIRCLECI-GWP/docs-on-gh-pages/edit#ssh. Click Add SSH Key, enter “github.com” as the hostname, and paste in the contents of the private key file. At this point, you can delete the private key from your system, as only your CircleCI project should have access. Run:
rm docs_deploy_key_rsa
The https://app.circleci.com/settings/project/github/CIRCLECI-GWP/docs-on-gh-pages/ssh page shows the fingerprint for your key, which is a unique identifier that’s safe to expose publicly (unlike the private key itself, which could give an attacker write access to your repository). Add a step in your docs-deploy job to grant the job access to the key with this fingerprint:
      - add_ssh_keys:
          fingerprints:
            - "59:ad:fd:64:71:eb:81:01:6a:d7:1a:c9:0c:19:39:af"
Note: Your fingerprint will be different from the one shown here. Make sure to use the one shown in your CircleCI settings.
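The fingerprint is simply a hash of the public key, which is why it is safe to publish. Depending on the tool and its version, it may be displayed either as colon-separated MD5 hex pairs or as a SHA256: prefix followed by unpadded base64. The sketch below computes both forms from the contents of a .pub file using only the Python standard library. The function name is hypothetical, and which form your CircleCI settings page displays is something to check there.

```python
import base64
import hashlib


def ssh_fingerprints(public_key_line):
    """Return (md5, sha256) fingerprints for one line of an OpenSSH .pub file.

    The line looks like "ssh-rsa AAAAB3... comment"; the second field is the
    base64-encoded key blob that both fingerprints are computed from.
    """
    blob = base64.b64decode(public_key_line.split()[1])

    # Legacy form: MD5 digest shown as colon-separated hex pairs.
    md5_hex = hashlib.md5(blob).hexdigest()
    md5 = ":".join(md5_hex[i:i + 2] for i in range(0, len(md5_hex), 2))

    # Newer form: SHA-256 digest, base64-encoded with the padding removed.
    sha256 = "SHA256:" + base64.b64encode(
        hashlib.sha256(blob).digest()).decode("ascii").rstrip("=")

    return md5, sha256
```

Passing the contents of docs_deploy_key_rsa.pub to ssh_fingerprints would return both strings for the key created above, so you can match whichever form you are shown.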
Next, go to https://app.circleci.com/settings/project/github/CIRCLECI-GWP/docs-on-gh-pages/advanced and check that “Pass secrets to builds from forked pull requests” is set to its default of “Off”.
SSH keys are one of the types of secrets that you should make available only if you trust the code being run. If you allowed this key to be available to forks, an attacker could craft a pull request that prints the contents of your private key to the CircleCI logs.
Upload the public key to GitHub so that it knows to trust a connection from CircleCI initiated with your private key. Go to https://github.com/CIRCLECI-GWP/docs-on-gh-pages and open Settings > Deploy keys. Add a deploy key, enter “CircleCI write key” as the title, and paste in the contents of docs_deploy_key_rsa.pub. Check “Allow write access” so that the key can push to the repository; otherwise pushes will be rejected as read only. If you haven’t already deleted the private key, be extra careful you’re not accidentally copying from docs_deploy_key_rsa!
Some final fixups
Before you test that your CircleCI workflow can successfully push changes to GitHub, you need to address a few final details.
First, your built documentation contains directories starting with _, which have special meaning to Jekyll, the static site engine built into GitHub Pages. You don’t want Jekyll to alter your content, so you need to add a .nojekyll file and pass the --dotfiles flag to gh-pages, since that utility will otherwise ignore all dotfiles.
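To see why the marker file matters, it can help to list which entries in the build output start with an underscore. The Python sketch below does that and creates the empty .nojekyll file, mirroring what touch does in the job. The function name is hypothetical, and the default path assumes the docs/_build/html layout used in this tutorial.

```python
from pathlib import Path


def disable_jekyll(html_dir="docs/_build/html"):
    """Create an empty .nojekyll marker and report underscore-prefixed entries.

    Returns the sorted names of top-level entries (such as _static) that
    Jekyll would otherwise treat specially.
    """
    root = Path(html_dir)
    # Equivalent to `touch`: creates the file if missing, leaves it otherwise.
    (root / ".nojekyll").touch()
    return sorted(p.name for p in root.iterdir() if p.name.startswith("_"))
```

Run after make html, this returns the underscore-prefixed directories that Sphinx generated and leaves the marker file in place for the deploy step.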
Second, you need to provide a custom commit message that includes [skip ci], which instructs CircleCI not to initiate a new build when you push this content to the gh-pages branch. The gh-pages branch contains only rendered HTML content, not the source code and config.yml, so a build there would have nothing to do and would simply show up as failing in CircleCI. The full docs-deploy job is:
  docs-deploy:
    docker:
      - image: cimg/node:16.17
    steps:
      - checkout
      - attach_workspace:
          at: docs/_build
      - run:
          name: Disable Jekyll for GitHub Pages
          command: touch docs/_build/html/.nojekyll
      - run:
          name: Install gh-pages tool
          command: npm install gh-pages@3.2.3 --silent
      - add_ssh_keys:
          fingerprints:
            - "SHA256:BPNOXVtVDIPb0Bll3FkCj12Vv9K7HSsup5jNx/XJrMs"
      - run:
          name: Configure Git for deployment
          command: |
            git config --global user.email "ci-build@example.com"
            git config --global user.name "CI Build"
      - run:
          name: Deploy documentation to GitHub Pages
          command: npx gh-pages --dotfiles --message "[skip ci] Updates" --dist docs/_build/html
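The effect of the [skip ci] marker in the commit message can be illustrated with a small check. This is a simplified model, not CircleCI’s actual logic: it assumes the marker may be written as [skip ci] or [ci skip] and that only the beginning of the message is inspected. The 250-character window in particular is an assumption to verify against the CircleCI documentation.

```python
SKIP_MARKERS = ("[skip ci]", "[ci skip]")


def build_is_skipped(commit_message, window=250):
    """Return True if a skip marker appears near the start of the message."""
    head = commit_message[:window]
    return any(marker in head for marker in SKIP_MARKERS)


print(build_is_skipped("[skip ci] Updates"))   # True: the message used by docs-deploy
print(build_is_skipped("Fix typo in README"))  # False: an ordinary commit
```

Under this model, the commit that gh-pages pushes to the gh-pages branch is ignored, while ordinary commits to other branches still trigger builds.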
You’re ready to commit your updated configuration and let CircleCI run the workflow.
Once it shows green, you should notice that your repository now has a gh-pages branch and that the rendered content is now available at https://circleci-gwp.github.io/docs-on-gh-pages/.
Conclusion
There is no one obvious “best way” to build and deploy documentation. The path of least resistance for your team is going to depend on the particular mix of workflows, tools, and infrastructure that you are already familiar with. Your organizational structure is important as well, as it will have implications for who needs to be involved to provision credentials and get systems talking to one another.
The particular solution presented here is currently a good fit for the data platform team at Mozilla (see an example in practice at mozilla/python_moztelemetry) for several reasons. It is adaptable to different languages (our team also maintains projects in Java and Scala), and it minimizes the number of tools to be familiar with (we are already invested in GitHub and CircleCI). The permissions model gives our team autonomy in setting up and controlling the documentation workflow, and we haven’t seen a need for any of the more advanced features available from documentation-specific hosting providers.