Hermetic Build Tools Ecosystem
Compare and contrast major hermetic build systems: Bazel, Nix, Buck2, and their mental models.
This lesson covers reproducible build tools, containerization platforms, and package management strategies: essential concepts for creating reliable, deterministic software builds that work consistently across any environment.
Welcome to Hermetic Builds
Building software that works identically on every machine isn't just a convenience; it's a necessity for modern development teams. The hermetic build tools ecosystem provides the foundation for creating truly reproducible builds that eliminate the dreaded "works on my machine" problem.
A hermetic build is one that is completely self-contained and insulated from the host system. It depends only on explicitly declared inputs and produces bit-for-bit identical outputs regardless of when or where it runs. Think of it as a sealed laboratory experiment: every variable is controlled, every dependency is specified, and the results are perfectly predictable.
Real-world analogy: Imagine baking a cake. A non-hermetic approach would be like saying "use flour from whatever's in your pantry, and bake until it looks done." A hermetic approach provides exact measurements, specifies the flour brand and type, controls oven temperature precisely, and times everything to the second. The result? The same perfect cake, every time.
Core Concepts
What Makes a Build Tool Hermetic?
For a build tool to achieve hermeticity, it must satisfy several critical properties:
1. Determinism
The same inputs always produce identical outputs. No timestamps, no random values, no system-dependent paths in build artifacts.
2. Isolation
The build process cannot access unspecified resources from the host system. No reading from /usr/local/lib, no pulling in system Python packages, no implicit dependencies.
3. Declarative Dependencies
Every dependency must be explicitly declared with exact versions. No "latest," no version ranges, no ambiguity.
4. Content Addressability
Artifacts and dependencies are identified by cryptographic hashes of their content, not by names or versions that might change.
5. Caching and Incrementality
Previously built artifacts can be reused safely when inputs haven't changed, dramatically speeding up builds.
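The link between content addressing and safe caching can be sketched in a few lines of Python. This is a minimal illustration, not how any particular build tool implements it: the `action_key` function and the `ActionCache` class are hypothetical names introduced here, and the "action" is just a Python callable.

```python
import hashlib
import json


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest that identifies this content."""
    return hashlib.sha256(data).hexdigest()


def action_key(command: list[str], inputs: dict[str, bytes], env: dict[str, str]) -> str:
    """Derive a cache key from everything allowed to influence the output."""
    description = {
        "command": command,
        "inputs": {path: content_digest(data) for path, data in inputs.items()},
        "env": env,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    encoded = json.dumps(description, sort_keys=True).encode("utf-8")
    return content_digest(encoded)


class ActionCache:
    """Hypothetical in-memory cache mapping action keys to output bytes."""

    def __init__(self) -> None:
        self._outputs: dict[str, bytes] = {}
        self.executions = 0  # counts real executions (cache misses)

    def run(self, command, inputs, env, execute) -> bytes:
        key = action_key(command, inputs, env)
        if key not in self._outputs:
            self.executions += 1
            self._outputs[key] = execute(inputs)
        return self._outputs[key]
```

Reusing a cached output is only safe because the key covers every declared input; anything the action reads without declaring it is invisible to the key, which is exactly why isolation matters.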
The Hermetic Build Tools Landscape
| Tool Category | Purpose | Key Examples |
|---|---|---|
| Build Systems | Orchestrate compilation and artifact creation | Bazel, Buck2, Nix, Pants |
| Containerization | Provide isolated runtime environments | Docker, Podman, containerd |
| Package Managers | Resolve and lock dependencies | Nix, Guix, Spack |
| Remote Execution | Distribute builds across infrastructure | BuildBuddy, BuildBarn, Buildfarm |
| Content Stores | Store artifacts by content hash | CAS (Content Addressable Storage), OCI registries |
Bazel: The Gold Standard
Bazel originated at Google (as "Blaze") and represents the most mature hermetic build system available. It's designed to handle monorepos with millions of lines of code across multiple languages.
Key Bazel concepts:
- Workspace: The root directory containing all source code and a WORKSPACE file
- Package: A directory containing a BUILD file that defines targets
- Target: A buildable unit (library, binary, test) with explicit dependencies
- Rule: Template that defines how to build a specific type of target
- Action: Individual build step (compilation, linking, etc.)
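Bazel's hermetic guarantees: Where sandboxing is supported, each action runs in a sandbox whose working tree contains only its declared inputs, and it cannot write to arbitrary filesystem locations. The default sandbox is not airtight, though: host paths outside the workspace (such as system libraries) can still be readable, and network access is allowed unless it is switched off, for example with --sandbox_default_allow_network=false.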
BAZEL BUILD PROCESS

BUILD files → Analysis Phase → Action Graph
                                    │
                                    ▼
                             Execution Phase
                                    │
           ┌────────────────────────┼────────────────────────┐
           │                        │                        │
       Sandbox 1                Sandbox 2                Sandbox 3
      (Isolated)               (Isolated)               (Isolated)
           │                        │                        │
           └────────────────────────┼────────────────────────┘
                                    │
                             Build Artifacts
                                    │
                                    ▼
                              Content Store
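To make the "only declared inputs" idea concrete, here is a small Python sketch that stages declared inputs into a fresh directory and runs a command there. It models input staging only; it does not restrict the rest of the filesystem or the network the way a real sandbox does. The function name `run_in_staged_dir` is an assumption made for this illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_in_staged_dir(source_root: Path, declared_inputs: list[str],
                      argv: list[str]) -> subprocess.CompletedProcess:
    """Run argv in a fresh directory that contains only the declared inputs."""
    with tempfile.TemporaryDirectory() as staging:
        staging_path = Path(staging)
        for rel in declared_inputs:
            dest = staging_path / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            # Copy (rather than link) so the action cannot modify the sources.
            shutil.copy2(source_root / rel, dest)
        # Files that were not declared are simply absent from the working tree.
        return subprocess.run(argv, cwd=staging_path, capture_output=True, text=True)
```

An action that forgets to declare a header fails here with "file not found" instead of silently picking up whatever happens to be in the source tree, which is the behavior that surfaces missing dependencies early.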
Nix: Functional Package Management
Nix takes a radically different approach by treating package management as a purely functional problem. Every package is built from a "derivation": a precise description of how to build the package from its inputs, which behaves like a pure function of those inputs.
Core Nix principles:
- Immutability: Once built, packages never change. Updates create new packages with different hashes.
- Store paths: All packages live in /nix/store/hash-name-version/ where hash is computed from build inputs.
- Closures: Every package knows its complete dependency tree, making deployments completely self-contained.
- Rollbacks: Since old versions are never deleted (until garbage collection), you can instantly roll back to any previous system state.
NIX STORE STRUCTURE

/nix/store/
├─ a1b2c3...d4e5-glibc-2.35/
│  ├─ lib/
│  ├─ bin/
│  └─ include/
│
├─ f6g7h8...i9j0-openssl-3.0.7/
│  ├─ lib/libssl.so → (references glibc by hash)
│  └─ bin/openssl
│
└─ k1l2m3...n4o5-nginx-1.23.3/
   ├─ bin/nginx → (references openssl & glibc by hash)
   └─ etc/nginx/

Each path is IMMUTABLE and SELF-CONTAINED
Nix expressions are written in a lazy, functional language that describes build processes:
{ stdenv, fetchurl, openssl }:

stdenv.mkDerivation {
  pname = "myapp";
  version = "1.0.0";

  src = fetchurl {
    url = "https://example.com/myapp-1.0.0.tar.gz";
    sha256 = "0a1b2c3d...";
  };

  buildInputs = [ openssl ];

  buildPhase = ''
    gcc -o myapp main.c -lssl -lcrypto
  '';

  installPhase = ''
    mkdir -p $out/bin
    cp myapp $out/bin/
  '';
}
Why the hash matters: The store path hash includes everything that could affect the build: source code, compiler version, compiler flags, library versions, and even environment variables. Change any input, and you get a different hash, and therefore a different package.
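The effect of hashing every input can be shown with a toy model in Python. This is deliberately not Nix's real hashing scheme (which hashes the serialized derivation and uses its own encoding); the `store_path` function and the input names are assumptions chosen for illustration.

```python
import hashlib


def store_path(name: str, version: str, inputs: dict[str, str]) -> str:
    """Toy input-addressed path: the hash covers the name, version and all inputs."""
    hasher = hashlib.sha256()
    hasher.update(f"{name}-{version}\n".encode("utf-8"))
    for key in sorted(inputs):  # sorted so dict ordering cannot change the hash
        hasher.update(f"{key}={inputs[key]}\n".encode("utf-8"))
    # Truncated only to keep the illustrative path short.
    return f"/nix/store/{hasher.hexdigest()[:32]}-{name}-{version}"


base_inputs = {
    "src": "sha256:0a1b2c3d",          # placeholder source hash
    "compiler": "gcc-11.3.0",
    "cflags": "-O2",
    "openssl": "openssl-3.0.7",
}
original = store_path("myapp", "1.0.0", base_inputs)
rebuilt_with_o3 = store_path("myapp", "1.0.0", {**base_inputs, "cflags": "-O3"})
```

Here `original` and `rebuilt_with_o3` differ even though the name and version are the same, so both builds can sit side by side in the store without one overwriting the other.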
Buck2: Meta's Modern Build System
Buck2 is Meta's (Facebook's) next-generation build system, written in Rust and designed for extreme scalability. It learns from Bazel's success while adding modern features.
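Buck2 innovations:
- Starlark extensions: Like Bazel, Buck2 uses Python-like Starlark for build rules; in Buck2 the rules themselves are written in Starlark rather than built into the core
- Virtual filesystem support: Buck2 can work with virtual filesystems such as Meta's EdenFS, which helps on very large repositories; hermeticity itself comes mainly from remote execution, where actions see only their declared inputs
- BXL: Buck Extension Language allows querying and manipulating the build graph
- Incremental analysis: Designed to recompute only the parts of the build graph affected by a change; how it compares with Bazel depends on the workload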
Docker: Containerization for Builds
While Docker isn't strictly hermetic (it can access network, timestamps vary, layers can change), it provides a practical isolation layer that many teams use to approximate hermetic builds.
Docker's role in hermetic builds:
- Reproducible base images: Using pinned image digests (FROM ubuntu@sha256:abc123...) ensures consistent starting points
- Multi-stage builds: Separate build-time dependencies from runtime dependencies
- BuildKit: Docker's newer builder includes better caching and more deterministic builds
Docker's limitations:
- Timestamps in layers (can be worked around)
- Network access during build (unless disabled)
- Host system leakage (kernel version, DNS, etc.)
- Layer caching based on Dockerfile commands, not content
DOCKER BUILD STAGES (Multi-stage Pattern)

┌─────────────────────────────────────────┐
│  STAGE 1: Builder                       │
│  ┌────────────────────────────┐         │
│  │ FROM golang:1.20-alpine    │         │
│  │ COPY . /src                │         │
│  │ RUN go build -o app        │         │
│  └────────────────────────────┘         │
│         │                               │
│         ▼                               │
│  /app/app (binary only)                 │
└─────────┬───────────────────────────────┘
          │ Copy artifact only
          ▼
┌─────────────────────────────────────────┐
│  STAGE 2: Runtime                       │
│  ┌────────────────────────────┐         │
│  │ FROM alpine:3.17           │         │
│  │ COPY --from=builder /app   │         │
│  │ CMD ["/app/app"]           │         │
│  └────────────────────────────┘         │
│                                         │
│  Result: Minimal runtime image          │
│  (no Go compiler, no source code)       │
└─────────────────────────────────────────┘
Remote Execution: Scaling Hermetic Builds
Remote execution distributes build actions across a cluster of workers, dramatically reducing build times for large projects. The Remote Execution API (part of the Remote Build Execution protocol) is used by Bazel, Buck2, and others.
Remote execution architecture:

┌─────────────┐         ┌──────────────────┐
│   CLIENT    │────────▶│    SCHEDULER     │
│   (Bazel)   │  Sends  │  (Build Master)  │
└─────────────┘ Actions └──────────────────┘
                                 │
                 ┌───────────────┼───────────────┐
                 ▼               ▼               ▼
            ┌──────────┐    ┌──────────┐    ┌──────────┐
            │ WORKER 1 │    │ WORKER 2 │    │ WORKER 3 │
            │ Executes │    │ Executes │    │ Executes │
            │ Actions  │    │ Actions  │    │ Actions  │
            └────┬─────┘    └────┬─────┘    └────┬─────┘
                 │               │               │
                 └───────────────┼───────────────┘
                                 ▼
                        ┌──────────────────┐
                        │  CONTENT STORE   │
                        │      (CAS)       │
                        │    Artifacts     │
                        └──────────────────┘
Why remote execution requires hermeticity: For a build action to run correctly on a remote worker, it must not depend on anything about that worker's host system. Every input must be explicitly provided, and the action must run in a sandbox. This is only possible with hermetic build systems.
Content Addressable Storage (CAS) is central to remote execution. Every file, directory, and artifact is stored under a cryptographic hash of its content (typically SHA-256). Workers download exactly the inputs they need (identified by hash), execute the action, and upload outputs (also by hash).
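The download, execute, upload cycle described above can be sketched as follows. This is a simplified in-memory model, not the Remote Execution API itself: `ContentStore` and `run_on_worker` are hypothetical names, and the "action" is a plain Python callable rather than a sandboxed process.

```python
import hashlib
from typing import Callable


class ContentStore:
    """Hypothetical in-memory CAS: blobs are keyed by the SHA-256 of their bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data  # identical content always lands on the same key
        return digest

    def get(self, digest: str) -> bytes:
        data = self._blobs[digest]
        # Re-hash on read so a corrupted blob is never handed to an action.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError(f"blob {digest} failed verification")
        return data


def run_on_worker(store: ContentStore, input_digests: dict[str, str],
                  action: Callable[[dict[str, bytes]], bytes]) -> str:
    """Fetch inputs by digest, run the action, upload the output, return its digest."""
    inputs = {path: store.get(digest) for path, digest in input_digests.items()}
    return store.put(action(inputs))
```

Because the worker receives digests rather than file names on some shared disk, it does not matter which worker runs the action: the same digests always resolve to the same bytes.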
Examples with Detailed Explanations
Example 1: Simple Bazel Hermetic Build
Let's build a C++ library with explicit dependencies using Bazel:
Project structure:
myproject/
├── WORKSPACE
├── BUILD
├── math.h
├── math.cc
├── math_test.cc
└── main.cc
WORKSPACE (defines the hermetic boundary):
workspace(name = "myproject")

# Declare external dependencies with exact versions and checksums
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "googletest",
    urls = ["https://github.com/google/googletest/archive/release-1.12.1.tar.gz"],
    sha256 = "81964fe578e9bd7c94dfdb09c8e4d6e6759e19967e397dbea48d1c10e45d0df2",
    strip_prefix = "googletest-release-1.12.1",
)
BUILD (defines hermetic build targets):
cc_library(
    name = "math",
    srcs = ["math.cc"],
    hdrs = ["math.h"],
    visibility = ["//visibility:public"],
)

cc_binary(
    name = "calculator",
    srcs = ["main.cc"],
    deps = [":math"],  # Explicit dependency on math library
)

cc_test(
    name = "math_test",
    srcs = ["math_test.cc"],
    deps = [
        ":math",
        "@googletest//:gtest_main",  # External dependency
    ],
)
What makes this hermetic?
- Explicit dependencies: The calculator binary declares it needs math. Bazel won't let it access anything else.
- Checksummed externals: GoogleTest is fetched with a specific SHA-256 hash. If the downloaded file doesn't match, the build fails.
- Sandboxed execution: When Bazel compiles main.cc, it runs the compiler in a sandbox that only contains main.cc, math.h, and the declared toolchain.
- Reproducible output: Running bazel build //:calculator should produce bit-for-bit identical binaries on any machine with the same Bazel version, provided the C++ toolchain is pinned as well (by default Bazel auto-detects the host compiler).
Building and verifying:
# First build
bazel build //:calculator
sha256sum bazel-bin/calculator
# Output: abc123def456...

# Clean and rebuild
bazel clean
bazel build //:calculator
sha256sum bazel-bin/calculator
# Output: abc123def456... (identical!)
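For projects with many output files, comparing one checksum by hand does not scale. The following Python sketch compares two output directories file by file; it assumes you have copied the results of two separate builds into two directories, and the function names are illustrative only.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts do not need to fit in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its content digest."""
    return {
        path.relative_to(root).as_posix(): file_sha256(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def differing_outputs(first: Path, second: Path) -> list[str]:
    """Return relative paths whose content differs or that exist on one side only."""
    a, b = tree_digests(first), tree_digests(second)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

An empty result means the two builds are byte-identical; any path in the list is a starting point for hunting down the source of non-determinism.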
Example 2: Nix Hermetic Environment
Let's create a development environment with exact package versions:
shell.nix:
{ pkgs ? import <nixpkgs> {
    # <nixpkgs> is only used to bootstrap fetchFromGitHub below;
    # every package in the shell comes from the pinned commit.
    overlays = [];
  }
}:

let
  # Pin nixpkgs to a specific commit, verified by hash
  pinnedPkgs = import (pkgs.fetchFromGitHub {
    owner = "NixOS";
    repo = "nixpkgs";
    rev = "a7ecde854aee5c4c7cd6177f54a99d2c1ff28a31";
    sha256 = "0qk1x7x0qn0l7b8x0w4l0c8w7l9z0m1w2r3y4z5v6x7w8q9a0b1c";
  }) {};
in
pinnedPkgs.mkShell {
  name = "hermetic-dev-env";

  buildInputs = with pinnedPkgs; [
    # Exact versions from pinned commit
    gcc11
    cmake
    ninja
    pkg-config

    # Libraries with exact versions
    openssl_3_0
    zlib
    curl
  ];

  shellHook = ''
    echo "Hermetic development environment activated"
    echo "GCC version: $(gcc --version | head -n1)"
    echo "OpenSSL version: $(openssl version)"

    # Set environment variables for hermetic builds
    export SSL_CERT_FILE="${pinnedPkgs.cacert}/etc/ssl/certs/ca-bundle.crt"
    export NIX_SSL_CERT_FILE="$SSL_CERT_FILE"
  '';

  # Prevent environment pollution
  LOCALE_ARCHIVE = "${pinnedPkgs.glibcLocales}/lib/locale/locale-archive";
}
How Nix achieves hermeticity:
- Pinned nixpkgs: The rev specifies an exact Git commit. Everyone using this shell.nix gets identical package definitions.
- Content hashes: Each package in the Nix store has a path like /nix/store/xyz123-gcc-11.3.0/. The xyz123 hash is computed from all build inputs.
- Isolated build environment: Nix builds run with $PATH, $LD_LIBRARY_PATH, and other variables controlled to prevent host system leakage. An interactive nix-shell still inherits the host environment unless it is started with --pure.
- Reproducible activation: Running nix-shell on any machine with Nix will download or build the exact same packages for that platform.
Using the environment:
# Enter hermetic shell (--pure drops the host environment)
nix-shell --pure

# Build your project with exact tool versions
cmake -B build -G Ninja
ninja -C build

# Exit returns to normal system
exit
Result: Every developer on your team who works on the same platform (for example x86_64 Linux), regardless of their distribution or installed packages, gets the exact same GCC version, OpenSSL version, and library versions. No more "works on my machine" debugging.
Example 3: Hermetic Docker Build with Reproducible Layers
Dockerfile with hermetic best practices:
# Pin base image by digest, not tag
FROM node:18.16.0-alpine@sha256:a1b2c3d4e5f6789...

# Ensure consistent timezone and locale
ENV TZ=UTC \
    LANG=C.UTF-8 \
    LC_ALL=C.UTF-8

# Install dependencies with exact versions
RUN apk add --no-cache \
    python3=3.11.3-r0 \
    make=4.4.1-r0 \
    g++=12.2.1_git20220924-r4

# Set working directory
WORKDIR /app

# Copy package files first (layer caching optimization)
COPY package.json package-lock.json ./

# Install exact versions from lockfile
# (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev \
    && npm cache clean --force

# Copy source code
COPY . .

# Build with reproducible flags
RUN SOURCE_DATE_EPOCH=1672531200 \
    npm run build

# Normalize file modification times for reproducibility
RUN find /app -exec touch -t 202301010000 {} +

# Run as non-root user
USER node
CMD ["node", "dist/server.js"]
Build script for maximum reproducibility (build.sh):
#!/bin/bash
set -euo pipefail

# Use BuildKit for better reproducibility
export DOCKER_BUILDKIT=1

# Fixed timestamp (2023-01-01 00:00:00 UTC); -u keeps it independent of the host timezone
SOURCE_DATE_EPOCH=$(date -u -d '2023-01-01' +%s)

# Build with reproducible settings
docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --build-arg SOURCE_DATE_EPOCH="$SOURCE_DATE_EPOCH" \
  --progress=plain \
  --no-cache \
  -t myapp:hermetic \
  .

# Verify reproducibility (locally built images have no registry digest, so compare image IDs)
echo "First build image ID:"
docker image inspect --format '{{.Id}}' myapp:hermetic

# Rebuild and compare
docker build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --build-arg SOURCE_DATE_EPOCH="$SOURCE_DATE_EPOCH" \
  --progress=plain \
  --no-cache \
  -t myapp:hermetic-verify \
  .

echo "Second build image ID:"
docker image inspect --format '{{.Id}}' myapp:hermetic-verify

echo "Comparing image contents..."
container-diff diff daemon://myapp:hermetic daemon://myapp:hermetic-verify --type=file --type=metadata
Hermetic Docker strategies:
- Digest-based base images: Tags like alpine:3.17 can change, but alpine@sha256:abc... is immutable.
- Pinned package versions: apk add python3=3.11.3-r0 instead of python3 ensures exact versions.
- npm ci not npm install: npm ci uses package-lock.json exactly; npm install might update versions.
- SOURCE_DATE_EPOCH: Sets a fixed timestamp that build tools honoring the convention use instead of the current time.
- Touch command: Normalizes file modification times that would otherwise differ between builds.
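SOURCE_DATE_EPOCH only helps if the tools in the build actually read it. As a minimal sketch of what honoring the convention looks like inside a build script, the Python below prefers the variable over the wall clock; the function names and the banner format are assumptions for illustration.

```python
import os
import time
from datetime import datetime, timezone
from typing import Mapping


def build_timestamp(environ: Mapping[str, str] = os.environ) -> int:
    """Use SOURCE_DATE_EPOCH when set; fall back to the current time otherwise."""
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is not None:
        return int(raw)
    return int(time.time())  # non-reproducible fallback


def version_banner(name: str, environ: Mapping[str, str] = os.environ) -> str:
    """Format the build time in UTC so the host timezone cannot leak in."""
    stamp = datetime.fromtimestamp(build_timestamp(environ), tz=timezone.utc)
    return f"{name} built {stamp.strftime('%Y-%m-%dT%H:%M:%SZ')}"
```

With SOURCE_DATE_EPOCH=1672531200, the value used in the Dockerfile above, the banner reads "myapp built 2023-01-01T00:00:00Z" on every machine and on every day.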
Limitations: Even with these measures, Docker builds may vary due to:
- Network-fetched content (with BuildKit, use RUN --network=none for steps that run after dependencies are installed)
- Kernel version in image metadata
- Build host architecture leakage
For true hermeticity, combine Docker with a hermetic build system like Bazel inside the container.
Example 4: Buck2 Rules for Polyglot Project
BUCK file for a project mixing Python and Rust:
# Load Buck2 rules (illustrative load paths; with the standard prelude
# these rules are usually available without explicit loads)
load("@prelude//python:python.bzl", "python_library", "python_binary")
load("@prelude//rust:rust.bzl", "rust_library", "rust_binary")

# Rust library with explicit dependencies
rust_library(
    name = "dataprocessor",
    srcs = glob(["src/**/*.rs"]),  # also matches src/lib.rs
    edition = "2021",
    deps = [
        "//third-party/rust:serde",
        "//third-party/rust:serde_json",
    ],
    # Compiler flags fixed in the build file so every build uses the same settings
    rustc_flags = [
        "-C", "opt-level=3",
        "-C", "debuginfo=0",
        "-C", "overflow-checks=on",
    ],
)

# Python library that wraps Rust via FFI
python_library(
    name = "processor_py",
    srcs = ["processor.py"],
    deps = [
        ":dataprocessor",  # Rust library
        "//third-party/python:cffi",
    ],
    base_module = "myapp",
)

# Final Python application
python_binary(
    name = "app",
    main = "main.py",
    deps = [
        ":processor_py",
        "//third-party/python:flask",
        "//third-party/python:requests",
    ],
    # Package everything needed for hermetic execution
    package_style = "standalone",
)
Toolchain configuration (.buckconfig):
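# Illustrative configuration. Absolute host paths like the ones below make the
# interpreter and compiler undeclared inputs; a fully hermetic setup would
# provide them through pinned toolchain targets instead.
[python]
interpreter = /usr/bin/python3.11
package_style = standalone

[rust]
compiler = /opt/rust/1.70.0/bin/rustc
edition = 2021

[build]
# Enable hermetic sandboxing
execution_environments = true

[cache]
# Local directory cache; a shared remote cache would point at a network endpoint
mode = dir
dir = /tmp/buck-cache
dir_mode = readwrite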
Buck2's hermetic advantages:
- Cross-language dependencies: Python code can depend on Rust libraries through an ordinary deps edge, so the relationship is tracked in the build graph.
- Isolated execution: With remote execution, each build action runs with only its declared inputs; local-only builds are isolated less strictly.
- Remote caching: Content-addressed artifacts can be shared across the team.
- Incremental builds: Buck2 tracks dependencies at fine granularity, rebuilding only what changed.
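"Rebuilding only what changed" comes down to walking the dependency graph backwards from the changed targets. The Python sketch below shows that idea on the targets from the BUCK file above; it is a simplified model of invalidation, not Buck2's actual algorithm, and `targets_to_rebuild` is a name invented for this example.

```python
from collections import deque


def targets_to_rebuild(deps: dict[str, list[str]], changed: set[str]) -> set[str]:
    """Return the changed targets plus everything that transitively depends on them."""
    # Invert the graph: for each target, who depends on it?
    dependents: dict[str, set[str]] = {}
    for target, target_deps in deps.items():
        for dep in target_deps:
            dependents.setdefault(dep, set()).add(target)

    dirty = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in dirty:
                dirty.add(dependent)
                queue.append(dependent)
    return dirty


# Dependency edges taken from the BUCK file above.
graph = {
    ":app": [":processor_py", "//third-party/python:flask", "//third-party/python:requests"],
    ":processor_py": [":dataprocessor", "//third-party/python:cffi"],
    ":dataprocessor": ["//third-party/rust:serde", "//third-party/rust:serde_json"],
}
```

Changing `//third-party/python:flask` marks only flask and `:app` as dirty, so the Rust library and its Python wrapper are reused from cache; changing `:dataprocessor` invalidates the whole chain up to `:app`.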
Building:
# Build (remote cache endpoints are normally configured in .buckconfig rather than on the command line)
buck2 build //app:app

# Run the built application
buck2 run //app:app

# Query dependency graph
buck2 query "deps(//app:app)" --output-attribute='^'
Common Mistakes
Mistake 1: Using "latest" or version ranges
Wrong:
FROM python:3-slim
RUN pip install flask requests
Right:
FROM python:3.11.3-slim@sha256:abc123...
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# requirements.txt contains two pinned lines: flask==2.3.2 and requests==2.31.0
Why it matters: Floating tags move; tomorrow's "python:3-slim" might be Python 3.12 instead of 3.11, breaking your build.
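Checks like these are easy to automate in CI. The following Python sketch flags base images without a digest and requirement lines without an exact `==` pin. It is a rough lint under simplifying assumptions (no build-arg expansion in FROM lines, no extras or environment markers in requirements), and both function names are made up for this example.

```python
import re

FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)(?:\s+AS\s+(\S+))?", re.IGNORECASE
)


def unpinned_base_images(dockerfile: str) -> list[str]:
    """Return FROM images that are not pinned by digest."""
    stages: set[str] = set()
    flagged: list[str] = []
    for line in dockerfile.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        image, stage = match.group(1), match.group(2)
        # References to earlier build stages and 'scratch' are not registry images.
        if image not in stages and image.lower() != "scratch" and "@sha256:" not in image:
            flagged.append(image)
        if stage:
            stages.add(stage)
    return flagged


def unpinned_requirements(requirements: str) -> list[str]:
    """Return requirement specifiers that do not use an exact '==' pin."""
    flagged: list[str] = []
    for raw in requirements.splitlines():
        line = raw.split("#", 1)[0].strip().rstrip("\\").strip()
        if not line or line.startswith("-"):  # skip blanks, comments, option lines
            continue
        spec = line.split()[0]
        if "==" not in spec:
            flagged.append(spec)
    return flagged
```

Run against the "Wrong" Dockerfile above, `unpinned_base_images` reports `python:3-slim`; failing the CI job on a non-empty result keeps floating references from creeping back in.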
Mistake 2: Implicit system dependencies
Wrong (Makefile):
build:
	gcc -o myapp main.c -lssl
Right (Bazel BUILD):
cc_binary(
    name = "myapp",
    srcs = ["main.c"],
    deps = ["@openssl//:ssl"],  # Explicit external dependency
)
Why it matters: The Makefile assumes OpenSSL is installed on the system. Different machines might have different OpenSSL versions (or none at all). Bazel forces you to declare the dependency explicitly.
Mistake 3: Timestamps in build artifacts
Wrong:
tar -czf release.tar.gz dist/
Right:
tar --sort=name --mtime='2023-01-01 00:00:00Z' --owner=0 --group=0 --numeric-owner -cf - dist/ | gzip -n > release.tar.gz
Why it matters: Without normalized timestamps and metadata, the same source code produces different tarballs on different days or machines, breaking content-addressable caching. The trailing Z pins the timestamp to UTC, and gzip -n keeps gzip from storing its own timestamp in the compressed file's header.
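The same normalization can be done from a build script when GNU tar is not available. Below is a minimal Python sketch using the standard tarfile and gzip modules; it assumes regular files and directories only, flattens permissions to 0644/0755 (so executable bits are lost), and the function name `deterministic_targz` is an invention for this example.

```python
import gzip
import io
import tarfile
from pathlib import Path


def deterministic_targz(root: Path, mtime: int = 0) -> bytes:
    """Pack root into a .tar.gz whose bytes depend only on file paths and contents."""
    tar_buffer = io.BytesIO()
    with tarfile.open(fileobj=tar_buffer, mode="w", format=tarfile.GNU_FORMAT) as archive:
        for path in sorted(root.rglob("*")):  # fixed ordering, like --sort=name
            info = archive.gettarinfo(str(path), arcname=path.relative_to(root).as_posix())
            if info is None:  # unsupported file type (e.g. a socket)
                continue
            info.mtime = mtime                     # like --mtime
            info.uid = info.gid = 0                # like --owner=0 --group=0
            info.uname = info.gname = ""
            info.mode = 0o755 if info.isdir() else 0o644
            if info.isreg():
                with path.open("rb") as handle:
                    archive.addfile(info, handle)
            else:
                archive.addfile(info)
    out = io.BytesIO()
    # mtime is fixed and no filename is stored, mirroring `gzip -n`.
    with gzip.GzipFile(fileobj=out, mode="wb", mtime=mtime) as compressed:
        compressed.write(tar_buffer.getvalue())
    return out.getvalue()
```

Two directories with the same files produce the same bytes regardless of when the files were written or in what order they were created, which is what a content-addressed cache needs.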
Mistake 4: Network access during builds
Wrong (resolving versions from the internet at build time):
# setup.py
install_requires=[
    'numpy>=1.20',  # Fetches latest matching version at build time
]
Right (lockfile with hashes):
# requirements.txt with hashes
numpy==1.24.3 \
    --hash=sha256:ab344f1bf21f140adab8e47fdbc7c35a477dc01408791f8ba00d018dd0bc5155
Why it matters: Fetching from the network introduces non-determinism. The package available today might be different tomorrow (or unavailable). With hashes in the lockfile, pip checks every downloaded file against its recorded hash (pip install --require-hashes enforces this for all requirements) and fails instead of installing something different.
Mistake 5: Ignoring build tool versions
Wrong:
# README: "Install Bazel and run 'bazel build'..."
bazel build //...
Right:
# .bazelversion file
6.2.1
# Use Bazelisk (wrapper that enforces version)
bazelisk build //...
Why it matters: Different Bazel versions can produce different outputs. Bazelisk automatically downloads and uses the version specified in .bazelversion.
Mistake 6: Mutable base layers in containers
Wrong:
FROM ubuntu:22.04
Right:
FROM ubuntu:22.04@sha256:a0d9e826ab87bd665cfc640598a871b748b4b70a01a4f3d174d4fb02adad07a9
Why it matters: Tags are mutable pointers. ubuntu:22.04 today might include different packages than ubuntu:22.04 next month after security updates.
Key Takeaways
Quick Reference: Hermetic Build Principles
| Principle | Meaning |
|---|---|
| Determinism | Same inputs → identical outputs, always |
| Isolation | No access to undeclared host resources |
| Explicit Dependencies | Every dependency pinned with exact version/hash |
| Content Addressing | Identify artifacts by cryptographic hash |
| Reproducible Metadata | Normalize timestamps, owners, permissions |
| No Network Access | Fetch dependencies before build, not during |
Tool Selection Guide:
| Use Case | Recommended Tool | Why |
|---|---|---|
| Large monorepo, multiple languages | Bazel or Buck2 | Battle-tested at Google/Meta scale |
| System-level reproducibility | Nix | Manages entire dependency graph including system packages |
| Containerized applications | Docker + BuildKit + hermetic base | Practical for teams already using containers |
| Scientific computing, HPC | Spack or Guix | Designed for reproducible research environments |
| Python projects | Pants or Bazel + rules_python | Hermetic Python packaging is challenging; these tools solve it |
Memory device for hermetic properties (DICED):
- Determinism: Same in, same out
- Isolation: No host leakage
- Content addressing: Hash-based identity
- Explicit deps: No implicit assumptions
- Declarative: Describe what, not how
Getting started checklist:
- Pin all dependency versions (no "latest" or ranges)
- Use lockfiles (package-lock.json, Cargo.lock, etc.)
- Specify exact tool versions (.bazelversion, rust-toolchain, etc.)
- Use content hashes for external resources
- Enable sandboxing in your build tool
- Normalize timestamps and metadata in artifacts
- Test reproducibility: build twice, compare outputs
- Set up remote caching for team efficiency
Further Study
Bazel Documentation - https://bazel.build/docs - Comprehensive guide to Bazel's hermetic build features, rules, and remote execution API
Nix Pills Tutorial Series - https://nixos.org/guides/nix-pills/ - Deep dive into Nix's functional approach to package management and system configuration
Reproducible Builds Project - https://reproducible-builds.org/ - Community documentation on achieving bit-for-bit reproducible builds across different tools and languages
Did you know? Google builds over 2 billion lines of code in a single monorepo using Blaze (Bazel's internal predecessor). Every build is hermetic, enabling them to cache and reuse build artifacts across thousands of developers. This level of build hermeticity is why Google engineers rarely encounter "works on my machine" problems: if it builds successfully once, it builds identically everywhere.