

10 February 2020

Speeding up C++ Build Pipelines.

by Jani Mikkonen

I started in a new position and company some months ago and “inherited” an environment where we build around ten projects: mostly C++ code, CMake-based build scripts and compilation with MSVC. The first task to tackle was getting the feedback time into a more reasonable range. At the time of my joining, the full pipeline took somewhere from 45 minutes up to an hour, depending on the build slave's specs. I've worked with similar stuff in the past, so I took the challenge!

At the time, the project wasn't really optimized at all, so it was pretty easy to get started and have a real impact from almost day one. But before I start to dig in, let's describe what was actually happening: CMake generated Visual Studio projects, MSVC compiled everything with no parallelism or caching, each test runner was executed one by one from a for loop, and the resulting GTest JUnit reports were aggregated into a final report.

So, pretty standard stuff, right? Let's start hacking!

Compilation

There are a few ways to decrease compilation time: parallel building and compiler caches. Neither of those were in place, so let's start from there.

The normal procedure to build was, in a nutshell, something like this:

:: set up the MSVC compiler environment
call vcvarsall.bat
mkdir buildroot
pushd buildroot
:: generate Visual Studio projects and build the chosen configuration
cmake -G "Visual Studio 15 2017 Win64" ..\ProjectA
cmake --build . --config %BUILD_TYPE%

Parallel building.

So, in order to compile the project in parallel, CMake honours the CMAKE_BUILD_PARALLEL_LEVEL environment variable. Typically you would set it to the number of CPU cores/threads you want to dedicate to the compilation process. Since our build slaves were not identical, we started injecting a slave-specific value into each one so the weaker machines wouldn't get bogged down during compilation. So far so good.
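
As a minimal sketch, the pipeline script only needs the variable set before the build step; the value here is just an example, as in practice it came from each slave's CI configuration:

:: value is slave specific and comes from the CI configuration; 6 is just an example
set CMAKE_BUILD_PARALLEL_LEVEL=6
cmake --build . --config %BUILD_TYPE%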

I had been playing around with the Ninja build tool in the past, and at least back then it was still faster than building through Visual Studio projects. Ninja sells itself as:

a small build system with a focus on speed. It differs from other build systems in two major respects: it is designed to have its input files generated by a higher-level build system, and it is designed to run builds as fast as possible.

Indeed, it's rather small: installation is a single executable that you can drop anywhere in the PATH, and then you tell CMake to generate Ninja-specific build files:

cmake -G "Ninja" ..\ProjectA -DCMAKE_BUILD_TYPE=%BUILD_TYPE%
cmake --build . 

At this point, when generating the build scripts, CMake needs to know which build type (Debug, Release, RelWithDebInfo, …) is being used, since Ninja itself opts out of knowing anything about build types. That is why -DCMAKE_BUILD_TYPE= moves to the generation phase and the build phase no longer needs it. The difference is that Ninja files are essentially “single configuration”, whereas MSVC projects contain configurations for all target build types.

Taking Ninja into use had smaller issues here and there, so picking it up wasn't completely straightforward. Luckily, a few colleagues who were already using Ninja had done most of the legwork.

Compiler cache

A compiler cache works by computing a unique identifier for each compilation unit, calling the real compiler and storing the result in its own storage. If that same identifier is referenced later, there's no need to call the compiler anymore, which saves a bit of time. One could consider it a key-value database of sorts.

When setting things up, do check what sort of compiler flags you are generally using. Certain features will prevent different compiler cache implementations from effectively caching everything: usage of pre-compiled headers, how and where you store debug symbols, and so on.
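
For example, with MSVC a classic blocker is /Zi, which writes debug symbols into a shared PDB file; embedding them into each object file with /Z7 instead is something caches cope with much better. A sketch of forcing that in CMake, assuming the default MSVC flags are in use:

# swap /Zi (separate PDB file) for /Z7 (debug info embedded in each .obj),
# which compiler caches like clcache can actually cache
string(REPLACE "/Zi" "/Z7" CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG}")
string(REPLACE "/Zi" "/Z7" CMAKE_CXX_FLAGS_RELWITHDEBINFO "${CMAKE_CXX_FLAGS_RELWITHDEBINFO}")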

I ended up picking clcache, partly due to the ease of installation (pip!) and partly due to previous experiences with it. It's not perfect, but it works well enough. My issue, and the way clcache works, could be described along these lines: in order to keep the cache up to date and in good shape, all updates to it must be done properly (and hopefully in order), so updating that data requires some form of transaction support.

Read: locks. And on top of the Windows file system.

And since I had already introduced parallel building, updating and fetching from the cache produces quite a lot of concurrent access. And of course there is a long-running bug report from around 2017 about exactly this. There are, however, ways to avoid it (and in general clcache works just fine) by using clcache-server, longer lock timeouts and so on, so at least I can live with it. So far I think we see around one build failure a week due to CacheLockExceptions.
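
For reference, roughly the kind of setup this ends up as; the variable names are from the clcache documentation as I remember them, and the paths and values here are examples only:

:: installation is a single pip call
pip install clcache
:: keep the cache on a fast local disk (example path)
set CLCACHE_DIR=D:\clcache
:: run the hash server and tell clcache to use it
start "" clcache-server
set CLCACHE_SERVER=1
:: wait longer for the cache lock before giving up (milliseconds, example value)
set CLCACHE_OBJECT_CACHE_TIMEOUT_MS=120000
:: with the Ninja generator, one way to wire it in is to use clcache as the compiler
cmake -G "Ninja" ..\ProjectA -DCMAKE_BUILD_TYPE=%BUILD_TYPE% -DCMAKE_CXX_COMPILER=clcache
:: clcache -s prints hit/miss statistics, handy for verifying the cache actually works
clcache -s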

The author of clcache has mentioned that these issues could actually be fixed by moving to a proper database. Since clcache is Python, and there's native sqlite3 support in most Python interpreters, I wonder if someone with a bit of free time could check that out. Hint hint!

For reference, I did check out a few other options for compiler caching: Mozilla's sccache looked promising, but given the Windows file system and how sccache handles locking under parallel builds, it just wasn't an option.

Test Execution

In this environment we had multiple test runners (that is, test executables) and CTest was not used. The first approach had been to just execute the test binaries in a for loop, parse the results and generate a final report; there were a few reasons for this, mostly related to GTest and its JUnit format reporting. Introducing CTest would be a low-hanging fruit in that we would no longer need separate scripting to run our unit test assets, but on its own it wouldn't bring any speed gains. To get any speedup we would need to run the tests in parallel - but would the actual tests cope with that?

When I introduced CTest to run the tests, I opted to call gtest_discover_tests from a single macro. This meant that if I had to modify how tests are discovered and executed, I'd only need to make the modification in a single location:

# gtest_discover_tests() is provided by CMake's GoogleTest module
include(GoogleTest)
enable_testing()

macro(discover_tests_at_compile_time _runner _cwd)
    # Non-interesting code removed ...
    gtest_discover_tests(${_runner}
        WORKING_DIRECTORY ${_cwd}
    )
endmacro(discover_tests_at_compile_time)
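
A hypothetical call site, just to show how a test runner would pick the macro up (the target name and sources are made up):

# hypothetical test runner target
add_executable(projecta_tests test_main.cpp)
target_link_libraries(projecta_tests PRIVATE GTest::gtest GTest::gtest_main)
discover_tests_at_compile_time(projecta_tests ${CMAKE_CURRENT_BINARY_DIR})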

What gtest_discover_tests does is ask the test runner for its list of tests and generate a CTest configuration file that runs each test case in a separate process, so there is at least some separation between them. But as people write code, they might not think about constraints like isolating access to operating-system-level resources. For example, if tests create FIFOs, pipes or similar file-system resources, you can end up with random failures when two test cases use the same resource at the same time. And of course this happened right away once we tried to run our test assets in parallel via CTest.

Now, the tests where these conflicts happened were typically from the same test runner (and we had multiple test runners), so maybe it would be possible to run the tests within a single runner serially but parallelize across the runners themselves. This is achieved with the RESOURCE_LOCK property: tests that name the same resource lock are guaranteed never to run concurrently, so using the runner name as the lock serializes each runner's own tests while different runners still run in parallel.

    gtest_discover_tests(${_runner}
        WORKING_DIRECTORY ${_cwd}
        PROPERTIES RESOURCE_LOCK "${_runner}"
    )

For the most part, RESOURCE_LOCK handled things well; for the rest, I'll be filing bug reports. In short, going this way meant I didn't need the huge overhead of refactoring tests to get at least some of the speed gains from parallelizing our test execution.

Test Reporting.

This part isn’t really about speeding up the process but it’s still relevant… Because of GTest.

As mentioned earlier, we used the JUnit reports that GTest itself generates (by passing EXTRA_ARGS to gtest_discover_tests), and we already had tooling to aggregate these results into a single report. But oh boy, GTest is a funny beast when it comes to writing its own reports. For example: if a test crashes, no report is written at all by default. I'd assume this could be tackled by writing your own main() function, but that would mean a lot of code changes here and there. GTest also decides on its own what the file name should be, and if the file already exists, it just adds a counter to the end; when running tests in parallel, this ended up causing file locking and/or results overwritten by other tests. Sometimes I've also seen GTest-generated JUnit files where the elapsed time of a test case was a negative value, which caused the whole build to fail. This could have happened due to an ill-timed NTP update, but I haven't really looked into it.
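
For reference, the JUnit output was requested roughly like this inside the same macro; the report directory is illustrative:

    gtest_discover_tests(${_runner}
        WORKING_DIRECTORY ${_cwd}
        # ask GTest itself to write a JUnit style XML report (path is an example)
        EXTRA_ARGS --gtest_output=xml:${_cwd}/reports/
    )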

The reason for relying on JUnit was mainly our Jenkins jobs, which were configured to use the JUnit plugin to parse the results. However, the xUnit plugin has “native” support for CTest's report format, so why not use that?

So, a small change to the pipeline code:

:: from
set CTEST_EXTRAS=--output-on-failure
set CTEST_PARALLEL=-j 1
:: to: let CTest write its own uncompressed XML report (-T test)
set CTEST_EXTRAS=--no-compress-output --output-on-failure -T test
set CTEST_PARALLEL=-j 5
:: and the actual execution (note: the configuration flag is a capital -C)
ctest %CTEST_EXTRAS% %CTEST_PARALLEL% -C %BUILD_CONFIG%

Then a change in the pipeline settings to publish the results with the xUnit publisher using its CTest report type, and we were set!
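
For what it's worth, the report the xUnit publisher needs to pick up lands inside the build directory under Testing/, in a timestamped subdirectory recorded in Testing/TAG; a quick way to locate it from the buildroot:

:: list the CTest XML report(s) produced by -T test
dir /s /b Testing\Test.xml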

Final thoughts.

With all of the above changes, the parts of our build pipeline where they were taken into use have seen speedups of around 50% to 60%.

At the time of writing, we are also introducing Conan to the build infra to avoid compiling “vendor libraries” over and over again (and/or storing binary blobs on some “random” CIFS server or in a git repo). I'd assume this will also provide some speedups once the infra is in place.

tags: c++ - googletest - gtest - ci - automation