#cache #compilation #cloud-storage #build-system #sccache #remote #distributed

build cachepot

cachepot is a sccache-like tool. It is used as a compiler wrapper and avoids compilation when possible, storing a cache in a remote storage using cloud storage.

2 unstable releases

0.1.0-rc.1 Mar 10, 2022
0.0.0 Dec 10, 2020

#98 in Build Utils

Apache-2.0

1MB
24K SLoC

Build Status rust 1.56.1+ badge

cachepot maskot image

cachepot - Shared Compilation Cache

cachepot is a ccache-like compiler caching tool. It is used as a compiler wrapper and avoids compilation when possible, storing cached results either on local disk or in one of several cloud storage backends.

It's also a fork of sccache with improved security properties and improvements all-around the code base. We upstream as much as we can back upstream, but the goals might not be a 100% match.

cachepot includes support for caching the compilation of C/C++ code, Rust, as well as NVIDIA's CUDA using nvcc.

cachepot also provides icecream-style distributed compilation (automatic packaging of local toolchains) for all supported compilers (including Rust). The distributed compilation system includes several security features that icecream lacks such as authentication, transport layer encryption, and sandboxed compiler execution on build servers. See the distributed quickstart guide for more information.


Table of Contents (ToC)


Installation

There are prebuilt x86-64 binaries available for Windows, Linux (a portable binary compiled against musl), and macOS on the releases page.

If you have a Rust toolchain installed you can install cachepot using cargo. Note that this will compile cachepot from source which is fairly resource-intensive. For CI purposes you should use prebuilt binary packages.

cargo install --git https://github.com/paritytech/cachepot

Usage

Running cachepot is like running ccache: prefix your compilation commands with it, like so:

cachepot gcc -o foo.o -c foo.c

If you want to use cachepot for caching Rust builds you can define build.rustc-wrapper in the cargo configuration file. For example, you can set it globally in $HOME/.cargo/config by adding:

[build]
rustc-wrapper = "/path/to/cachepot"

Note that you need to use cargo 1.40 or newer for this to work.

Alternatively you can use the environment variable RUSTC_WRAPPER:

RUSTC_WRAPPER=/path/to/cachepot cargo build

cachepot supports gcc, clang, MSVC, rustc, NVCC, and Wind River's diab compiler.

If you don't specify otherwise, cachepot will use a local disk cache.

cachepot works using a client-server model, where the server (which we refer to as "coordinator") runs locally on the same machine as the client. The client-server model allows the server/coordinator to be more efficient by keeping some state in memory. The cachepot command will spawn a coordinator process if one is not already running, or you can run cachepot --start-coordinator to start the background server process without performing any compilation.

You can run cachepot --stop-coordinator to terminate the coordinator. It will also terminate after (by default) 10 minutes of inactivity.

Running cachepot --show-stats will print a summary of cache statistics.

Some notes about using cachepot with Jenkins exist.

To use cachepot with cmake, provide the following command line arguments to cmake >= 3.4:

-DCMAKE_C_COMPILER_LAUNCHER=cachepot
-DCMAKE_CXX_COMPILER_LAUNCHER=cachepot

Build Requirements

cachepot is a Rust program. Building it requires cargo (and thus rustc). cachepot currently requires Rust 1.56.1. We recommend you install Rust via Rustup.

Build

If you are building cachepot for non-development purposes make sure you use cargo build --release to get optimized binaries:

cargo build --release [--no-default-features --features=s3|redis|gcs|memcached|azure]

By default, cachepot builds with support for all storage backends, but individual backends may be disabled by resetting the list of features and enabling all the other backends. Refer the Cargo Documentation for details on how to select features with Cargo.

Linux

No native dependencies.

Build with cargo and use ldd to check that the resulting binary does not depend on OpenSSL anymore.

Linux and Podman

Also you can build the repo with Parity CI Docker image:

podman run --rm -it -w /shellhere/cachepot \
                    -v "$(pwd)":/shellhere/cachepot:Z \
                    -u $(id -u):$(id -g) \
                    --userns=keep-id \
                    docker.io/paritytech/cachepot-ci:staging cargo build --locked --release
#artifacts can be found in ./target/release

If you want to reproduce other steps of CI process you can use the following guide.

macOS

No native dependencies.

Build with cargo and use otool -L to check that the resulting binary does not depend on OpenSSL anymore.

Windows

On Windows, the binary might also depend on a few MSVC CRT DLLs that are not available on older Windows versions.

It is possible to statically link against the CRT using a .cargo/config file with the following contents.

[target.x86_64-pc-windows-msvc]
rustflags = ["-Ctarget-feature=+crt-static"]

Build with cargo and use dumpbin /dependents to check that the resulting binary does not depend on MSVC CRT DLLs anymore.


Storage Options

Local

cachepot defaults to using local disk storage. You can set the CACHEPOT_DIR environment variable to change the disk cache location. By default it will use a sensible location for the current platform: ~/.cache/cachepot on Linux, %LOCALAPPDATA%\Parity\cachepot on Windows, and ~/Library/Caches/Parity.cachepot on MacOS.

The default cache size is 10 gigabytes. To change this, set CACHEPOT_CACHE_SIZE, for example CACHEPOT_CACHE_SIZE="1G".

S3

If you want to use S3 storage for the cachepot cache, you need to set the CACHEPOT_BUCKET environment variable to the name of the S3 bucket to use.

You can use AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to set the S3 credentials. Alternately, you can set AWS_IAM_CREDENTIALS_URL to a URL that returns credentials in the format supported by the EC2 metadata service, and credentials will be fetched from that location as needed. In the absence of either of these options, credentials for the instance's IAM role will be fetched from the EC2 metadata service directly.

If you need to override the default endpoint you can set CACHEPOT_ENDPOINT. To connect to a minio storage for example you can set CACHEPOT_ENDPOINT=<ip>:<port>. If your endpoint requires TLS, set CACHEPOT_S3_USE_SSL=true.

You can also define a prefix that will be prepended to the keys of all cache objects created and read within the S3 bucket, effectively creating a scope. To do that use the CACHEPOT_S3_KEY_PREFIX environment variable. This can be useful when sharing a bucket with another application.

Redis

Set CACHEPOT_REDIS to a Redis url in format redis://[:<passwd>@]<hostname>[:port][/<db>] to store the cache in a Redis instance. Redis can be configured as a LRU (least recently used) cache with a fixed maximum cache size. Set maxmemory and maxmemory-policy according to the Redis documentation. The allkeys-lru policy which discards the least recently accessed or modified key fits well for the cachepot use case.

Redis over TLS is supported. Use the rediss:// url scheme (note rediss vs redis). Append #insecure the the url to disable hostname verification and accept self-signed certificates (dangerous!). Note that this also disables SNI.

Memcached

Set CACHEPOT_MEMCACHED to a Memcached url in format tcp://<hostname>:<port> ... to store the cache in a Memcached instance.

Google Cloud Storage

To use Google Cloud Storage, you need to set the CACHEPOT_GCS_BUCKET environment variable to the name of the GCS bucket. If you're using authentication, either set CACHEPOT_GCS_KEY_PATH to the location of your JSON service account credentials or CACHEPOT_GCS_CREDENTIALS_URL with a URL that returns the oauth token. By default, CACHEPOT on GCS will be read-only. To change this, set CACHEPOT_GCS_RW_MODE to either READ_ONLY or READ_WRITE.

Azure

To use Azure Blob Storage, you'll need your Azure connection string and an existing Blob Storage container name. Set the CACHEPOT_AZURE_CONNECTION_STRING environment variable to your connection string, and CACHEPOT_AZURE_BLOB_CONTAINER to the name of the container to use. Note that cachepot will not create the container for you - you'll need to do that yourself.

Important: The environment variables are only taken into account when the server starts, i.e. only on the first run.


Overwriting the cache

In situations where the cache contains broken build artifacts, it can be necessary to overwrite the contents in the cache. That can be achieved by setting the CACHEPOT_RECACHE environment variable.


Debugging

You can set the CACHEPOT_ERROR_LOG environment variable to a path and set CACHEPOT_LOG to get the server process to redirect its logging there (including the output of unhandled panics, since the server sets RUST_BACKTRACE=1 internally).

CACHEPOT_ERROR_LOG=/tmp/cachepot_log.txt CACHEPOT_LOG=debug cachepot

You can also set these environment variables for your build system, for example

CACHEPOT_ERROR_LOG=/tmp/cachepot_log.txt CACHEPOT_LOG=debug cmake --build /path/to/cmake/build/directory

Alternatively, if you are compiling locally, you can run the server manually in foreground mode by running CACHEPOT_START_SERVER=1 CACHEPOT_NO_DAEMON=1 cachepot, and send logging to stderr by setting the CACHEPOT_LOG environment variable for example. This method is not suitable for CI services because you need to compile in another shell at the same time.

CACHEPOT_LOG=debug CACHEPOT_START_SERVER=1 CACHEPOT_NO_DAEMON=1 cachepot

Interaction with GNU make jobserver

cachepot provides support for a GNU make jobserver. When the server is started from a process that provides a jobserver, cachepot will use that jobserver and provide it to any processes it spawns. (If you are running cachepot from a GNU make recipe, you will need to prefix the command with + to get this behavior.) If the cachepot server is started without a jobserver present it will create its own with the number of slots equal to the number of available CPU cores.

This is most useful when using cachepot for Rust compilation, as rustc supports using a jobserver for parallel codegen, so this ensures that rustc will not overwhelm the system with codegen tasks. Cargo implements its own jobserver (see the information on NUM_JOBS in the cargo documentation) for rustc to use, so using cachepot for Rust compilation in cargo via RUSTC_WRAPPER should do the right thing automatically.


Known Caveats

General

  • Absolute paths to files must match to get a cache hit. This means that even if you are using a shared cache, everyone will have to build at the same absolute path (i.e. not in $HOME) in order to benefit each other. In Rust this includes the source for third party crates which are stored in $HOME/.cargo/registry/cache by default.

Rust

  • Crates that invoke the system linker cannot be cached. This includes bin, dylib, cdylib, and proc-macro crates. You may be able to improve compilation time of large bin crates by converting them to a lib crate with a thin bin wrapper.
  • Incrementally compiled crates cannot be cached. By default, in the debug profile Cargo will use incremental compilation for workspace members and path dependencies. You can disable incremental compilation.

More details on Rust caveats

Dependencies

~26–46MB
~754K SLoC