1 unstable release: 0.0.0 (Sep 28, 2023)
A re-imagined OCI image builder.
- Take advantage of the native snapshot/diff/overlay functionality of filesystems. Cheap calculation of multiple changesets/layers in a build history enables more granular layers.
- Parallel builds based on a dataflow graph.
- Selectively add, remove, mix-and-match arbitrary base layers below the current build task. Forget about amalgamation images to support your mixed toolchains; instead, apply tools from multiple pre-built images one after another.
- Define custom image manifests. Unlocked via flexible build tool layers, manifest files are built by the configuration via 'just another' step in a task. Select layers with your own code logic, cross-build multi-platform images to your heart's content, and more.
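To illustrate what "parallel builds based on a dataflow graph" can mean in practice, here is a minimal sketch. It is not taken from this crate: the function name `parallel_waves` and the task names in the usage note are hypothetical. It groups tasks into waves so that every task in a wave depends only on tasks from earlier waves, which means all tasks of one wave could run concurrently.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group tasks into "waves": every task in a wave depends only on tasks from
/// earlier waves, so all tasks within one wave may run in parallel.
/// Returns `None` if the graph has a cycle or depends on an unknown task.
/// (Hypothetical helper for illustration, not part of this crate.)
pub fn parallel_waves<'a>(
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
) -> Option<Vec<Vec<String>>> {
    let mut done: BTreeSet<&'a str> = BTreeSet::new();
    let mut waves = Vec::new();
    while done.len() < deps.len() {
        // A task is ready once all of its inputs have been produced.
        let ready: Vec<&'a str> = deps
            .iter()
            .filter(|(task, inputs)| {
                !done.contains(*task) && inputs.iter().all(|i| done.contains(i))
            })
            .map(|(task, _)| *task)
            .collect();
        if ready.is_empty() {
            return None; // cycle or missing dependency: no progress possible
        }
        done.extend(ready.iter().copied());
        waves.push(ready.into_iter().map(String::from).collect());
    }
    Some(waves)
}
```

With a map in which `proc0` and `proc1` both depend on `base`, and `manifest` depends on both, the function places `proc0` and `proc1` in the same wave. A real scheduler would likely start each task as soon as its own inputs are ready rather than waiting for a whole wave.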
How to use
- You will need a fresh Btrfs subvolume mounted and owned by your current user. Additionally, unprivileged_userns_clone should be enabled and the kernel compiled with support for user namespaces (userns).
  # mount -t btrfs -o rw,space_cache,user_subvol_rm_allowed,noacl,noatime,subvol=/stromatekt /dev/sdx /home/stromatekt
  btrfs filesystem df /home/stromatekt
  cat /proc/sys/kernel/unprivileged_userns_clone | grep 1
  cat /proc/config.gz | gunzip -c | grep CONFIG_USER_NS=y
- Create ~/.config/stromatekt/config.json with the path to the subvolume mount adjusted accordingly. It should look similar to:
  { "btrfs_root": "/home/stromatekt" }
- Prepare the example binary:
  pushd examples/prime && cargo build --release && popd
- Execute the example build:
  cargo run -- ./examples/parallel-dependency.json --no-dry-run
Motivation
docker build is slow. The structure of a Dockerfile only permits a linear sequence of instructions. Moreover, docker compose is even slower. It repeatedly sends, unpacks, and repacks image layers and parts of the local file system, which can take a significant amount of time. The author has observed builds taking more than 4 minutes for a Dockerfile containing a single line that adds one link in the file system. This is unacceptable as development latency. Further, caching of layers is inherently poor due to the linear sequence logic. Let's address both.
Structure of an OCI file
The main data within an OCI container is an ordered collection of layers. Each layer is essentially a diff against the previous one, usually in the form of a tar archive. (For slightly surprising reasons, a deletion is encoded as a file with special naming rules.)
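As a small illustration of those naming rules: in the OCI image layer format, deleting a path is recorded by an empty "whiteout" file whose base name is the deleted name prefixed with `.wh.`. The sketch below is not code from this crate, and the function names `whiteout_path` and `deleted_by` are hypothetical. It maps between a deleted path and its whiteout entry.

```rust
use std::path::{Path, PathBuf};

/// Prefix that marks a whiteout entry in an OCI layer archive.
const WHITEOUT_PREFIX: &str = ".wh.";
/// Special "opaque" marker that hides all lower-layer entries of a directory;
/// it does not name a single deleted file, so it is skipped below.
const OPAQUE_WHITEOUT: &str = ".wh..wh..opq";

/// Path of the (empty) whiteout entry recording the deletion of `deleted`.
/// Returns `None` if the path has no final UTF-8 file name (e.g. `/`).
pub fn whiteout_path(deleted: &Path) -> Option<PathBuf> {
    let name = deleted.file_name()?.to_str()?;
    Some(deleted.with_file_name(format!("{}{}", WHITEOUT_PREFIX, name)))
}

/// If `entry` is a plain whiteout, return the path it deletes.
pub fn deleted_by(entry: &Path) -> Option<PathBuf> {
    let name = entry.file_name()?.to_str()?;
    if name == OPAQUE_WHITEOUT {
        return None; // opaque whiteouts need directory-level handling
    }
    let original = name.strip_prefix(WHITEOUT_PREFIX)?;
    Some(entry.with_file_name(original))
}
```

For example, deleting `etc/motd` in a layer corresponds to an entry `etc/.wh.motd` in that layer's archive.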
When running a build, the builder will check out the layers of the underlying container, run its commands, and finally compute the diff to encode into a new layer. The two highly expensive filesystem tasks (checkout and diff) can be implemented much more efficiently if we can utilize the checkpoint and incremental diff logic of the filesystem itself.
Furthermore, this task is probably IO-bound, so we should seek to perform as much of it in parallel as possible. Note that the layer sequence of an OCI image is not commutative. However, as long as the task definition itself opts in by providing a canonical recombination order, there shouldn't be any reproducibility problem from creating layers in a different order.
Example:
A --(proc0)-> B0, yielding diff C0
A --(proc1)-> B1, yielding diff C1
=> export layers as: [A, C0, C1]
Actually, we could even allow swapping A for a totally unrelated A* as long as the build manifest makes this explicit, for instance to provide a security patch of an underlying layer. Also, proc0 and proc1 can be executed with entirely different underlying technologies (e.g. one as an x86 process, one as a WASI executable).
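The idea of running independent steps in parallel while keeping a canonical layer order can be sketched as follows. This is a toy model under stated assumptions and not this crate's implementation: `Tree` is an in-memory map standing in for a filesystem snapshot, and `diff` and `build_layers` are hypothetical names. Each step runs on its own copy of the base, and the resulting diffs are returned in declaration order regardless of which step finishes first.

```rust
use std::collections::BTreeMap;
use std::thread;

/// Toy stand-in for a filesystem tree: path -> file contents (assumption).
type Tree = BTreeMap<String, String>;

/// Entries of `after` that are new or changed relative to `before`.
/// Deletions are ignored to keep the sketch short.
fn diff(before: &Tree, after: &Tree) -> Tree {
    after
        .iter()
        .filter(|(path, data)| before.get(*path) != Some(*data))
        .map(|(path, data)| (path.clone(), data.clone()))
        .collect()
}

/// Run every step on its own copy of `base` in parallel and return the diffs
/// in the order the steps were declared, not the order they finished.
pub fn build_layers(base: &Tree, steps: &[fn(&mut Tree)]) -> Vec<Tree> {
    thread::scope(|scope| {
        // Spawn all steps first so they actually run concurrently.
        let handles: Vec<_> = steps
            .iter()
            .map(|step| {
                scope.spawn(move || {
                    let mut snapshot = base.clone(); // stands in for a cheap snapshot
                    step(&mut snapshot);
                    diff(base, &snapshot)
                })
            })
            .collect();
        // Joining in declaration order gives the canonical layer order.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("build step panicked"))
            .collect()
    })
}
```

Calling `build_layers(&a, &[proc0, proc1])` yields `[C0, C1]` in the notation of the example, so the exported image is `[A, C0, C1]` even if proc1 completes before proc0. On a snapshot-capable filesystem, the `clone` and `diff` steps would be replaced by the filesystem's own snapshot and incremental diff operations.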
Planned extensions
- Library files for build dependencies and maintainability. Define additional tasks in a separate file, then import specific changesets they define into another specification and let the dataflow resolver figure out a solution.
- Reproducibility assertions via hashes, used for incremental builds.