Skip to content

How to Add Continuous Benchmarking to Your Projects Using GitHub Actions

Over the lifetime of a project performance, issues may arise from time to time. Lots of the time, these issues don't get detected until they get into production. Adding continuous benchmarking to your project and build pipeline can help you catch these issues before that happens.

What is Continuous Benchmarking

Benchmarking is the process of measuring the performance of an application. Continuous benchmarking builds on top of this by doing so either on a regular basis, or whenever new code is pushed so that performance regressions can be identified and found as soon as they are introduced.

Adding continuous benchmarking to your build pipeline can help you effectively catch performance issues before they ever make it to production. Much like with tests, you are still responsible for writing benchmark logic. But once that’s done, integrating it with your build pipeline can be done easily using the continuous-benchmark GitHub Action.

github-action-benchmark

github-action-benchmark allows you to easily integrate your existing benchmarks written with your benchmark framework of choice with your build pipeline, with a wide range of configuration options. This action allows you to track the performance of benchmarks against branches in your repository over the history of your project. You can also set thresholds on workflows in PRs, so performance regressions automatically prevent PRs from merging.

Benchmark results can vary from framework to framework. This action supports a few different frameworks out of the box, and if yours is not supported, then it can be extended. For your benchmark results to be consumed, they must be kept in a file named output.txt, and formatted in a way that the action will understand. Each benchmark framework will have a different format. This action supports a few of the most popular ones.

Example Benchmark in Rust

Firstly, we need a benchmark to test with, and we’re going to use Rust. I am not going to detail everything to setup Rust projects in general, but a full example can be found here. In this case, there is just a simple fibonacci number generator.

src/lib.rs
pub fn fib(u: u32) -> u32 {
    if u <= 1 {
        1
    } else {
        fib(u - 2) + fib(u - 1)
    }
}

Then, a benchmark for this function can be written like so:

benches/bench.rs
#![feature(test)]

extern crate test;

use rust_example::fib;
use test::Bencher;

#[bench]
fn bench_fib_10(b: &mut Bencher) {
    b.iter(|| {
        let _ = fib(10);
    });
}

#[bench]
fn bench_fib_20(b: &mut Bencher) {
    b.iter(|| {
        let _ = fib(20);
    });
}

In this case, we have two benchmarks that use the fib function with a different amount of iterations. The more iterations that you execute, the more accurate your results will be.

Finally, if your project is setup to compile with cargo already, running the benchmarks should be as simple as running cargo bench. Now that the benchmark itself is setup, it’s time to move to the action.

GitHub Action Setup

The most basic use-case of this action is setting it up against your main branch so it can collect performance data from every merge moving forward. GitHub actions are configured using yaml files. Let’s go over an example configuration that will run benchmarks on a rust project every time code gets pushed to main, starting with the event trigger.

name: Benchmarks
on:
  push:
    branches:
      - main

If you aren’t familiar with GitHub Actions already, the ‘on’ key allows us to specify the circumstances that this workflow will run. In our case, we want it to trigger when pushes happen against the main branch. If we want to, we can add additional triggers and branches as well. But for this example,, we’re only focusing on push for now.

jobs:
  benchmark:
    name: Run Rust Benchmark
    runs-on: ubuntu-latest
    steps:
      # Checkout the code and run the benchmarks using cargo.
      - uses: actions/checkout@v2
      - run: rustup toolchain update nightly && rustup default nightly
      - name: Run benchmark
        run: cargo bench | tee output.txt

      # Push the results using the benchmark action.
      - name: Store Benchmark Result
        uses: benchmark-action/github-action-benchmark@v1
        with:
          name: Rust Benchmark
          tool: 'cargo'
          output-file-path: output.txt
          github-token: ${{ secrets.GITHUB_TOKEN }}
          auto-push: true
          alert-threshold: '200%'
          fail-on-alert: true

The jobs portion is relatively standard. The code gets checked out from source control, the tooling needed to build the Rust project is installed, the benchmarks are run, and then the results get pushed.

For the results storing step, a GitHub API token is required. This is automatically generated when the workflow runs, and is not something that you need to add yourself. The results are then pushed to a special 'gh-pages' branch where the performance data is stored. This branch does need to exist already for this step to work.

Considerations

There are some performance considerations to be aware of when utilizing GitHub Actions to execute benchmarks. Although the specifications of machines used for different action executions are similar, the runtime performance may vary.

GitHub Actions are executed in virtual machines that are hosted on servers. The workloads of other actions on the same servers can affect the runtime performance of your benchmarks. Usually, this is not an issue at all, and results in minimal deviations.

This is just something to keep in mind if you expect the results of each of your runs to be extremely accurate. Running benchmarks with more iterations does help, but isn’t a magic bullet solution.

Here are the hardware specifications currently being used by GitHub Actions at the time of writing this article. This information comes from the GitHub Actions Documentation.

Hardware specification for Windows and Linux virtual machines:

  • 2-core CPU (x86_64)
  • 7 GB of RAM
  • 14 GB of SSD space

Hardware specification for macOS virtual machines:

  • 3-core CPU (x86_64)
  • 14 GB of RAM
  • 14 GB of SSD space

If you need more consistent performance out of your runners, then you should use self-hosted runners. Setting these up is outside the scope of this article, and is deserving of its own.

Conclusion

Continuous benchmarking can help detect performance issues before they cause issues in production, and with GitHub Actions, it is easier than ever to implement it. If you want to learn more about GitHub Qctions and even implementing your own, check out this JS Marathon video by Chris Trzesniewski.