What is Hermeticity?

Define hermetic builds and understand the isolation principles that separate them from traditional builds.

This lesson covers the core principles of hermeticity, practical applications in build systems, and common pitfalls that break reproducibility: essential concepts for modern software engineering and DevOps.

Welcome 🏗️

Imagine running the same build command twice and getting different outputs each time. Sounds like a nightmare, right? That's exactly what hermetic builds prevent. In software engineering, hermeticity is the property of a build being completely self-contained and reproducible, like a hermetically sealed container that keeps external contaminants out.

The term "hermetic" comes from Hermes Trismegistus, a legendary Hellenistic figure associated with alchemy and the art of creating airtight seals. In modern computing, a hermetic build is one that's sealed off from the unpredictable external environment, ensuring that the same inputs always produce identical outputs.

This lesson will take you deep into the world of hermetic builds, showing you why they matter, how to achieve them, and what mistakes to avoid. Whether you're building microservices, mobile apps, or infrastructure-as-code, understanding hermeticity is crucial for creating reliable, debuggable, and maintainable systems.

Core Concepts 💡

The Fundamental Definition

Hermeticity in build systems means that a build process:

  1. Depends only on declared inputs - No hidden dependencies on system state, environment variables (unless explicitly declared), or network resources
  2. Produces identical outputs - Given the same inputs, the build generates byte-for-byte identical artifacts every time
  3. Is isolated from the host environment - The build doesn't rely on tools, libraries, or configurations installed on the build machine
  4. Is reproducible across machines and time - You can run the same build today, tomorrow, or on a different continent and get the same result

🎯 Key Principle

A hermetic build is a pure function: Build(inputs) → outputs with no side effects and no reliance on external state.

Why Hermeticity Matters 🎯

Hermetic builds solve critical problems in software development:

1. Reproducibility 🔄

  • Debug issues by reproducing the exact build that failed
  • Roll back to previous versions with confidence
  • Verify that a security patch changes only what it is meant to change

2. Reliability ✅

  • Eliminate "works on my machine" syndrome
  • Reduce flaky builds caused by environmental differences
  • Catch dependency issues early

3. Scalability 📈

  • Enable aggressive caching (same inputs = cache hit)
  • Distribute builds across multiple machines safely
  • Run independent build steps in parallel safely, since they cannot interfere through hidden shared state

4. Security 🔒

  • Audit exactly what goes into your artifacts
  • Reduce exposure to supply chain attacks through unexpected or silently changed dependencies
  • Verify build integrity through checksums

The Hermetic Build Spectrum

┌────────────────────────────────────────────────────────┐
│         DEGREES OF HERMETICITY                         │
└────────────────────────────────────────────────────────┘

🔴 Non-Hermetic                          ✅ Fully Hermetic
│──────────────────┼──────────────────┼─────────────────│
│                  │                  │                 │
▼                  ▼                  ▼                 ▼

"make"         "docker build"    "bazel build"   "nix build"
Uses system    Fixed OS          All deps        Content-
libraries      but can fetch     declared        addressed
               from network      explicitly      everything

❌ Breaks       ⚠️ Mostly        ✅ Hermetic     ✅ Maximally
often           works            by design       hermetic

Most build systems fall somewhere on this spectrum. Achieving perfect hermeticity is challenging, but even moving toward the right side dramatically improves build quality.

The Three Pillars of Hermeticity 🏛️

| Pillar | Description | Common violations |
|---|---|---|
| 🔒 Isolation | Build runs in a controlled sandbox with no access to host system resources | Reading /etc/hosts, using system Python, accessing $HOME |
| 📋 Declaration | All dependencies, tools, and inputs are explicitly listed and versioned | Implicit dependencies, "latest" tags, unversioned tools |
| 🎯 Determinism | Same inputs always produce bit-identical outputs | Timestamps in artifacts, random UUIDs, non-deterministic compression |

Inputs and Outputs: The Contract 📜

A hermetic build system maintains a strict contract between inputs and outputs:

Declared Inputs:

  • 📄 Source code files (with exact versions/commits)
  • 📦 Dependencies (pinned versions, checksummed)
  • 🔧 Build tools (specific versions in containers/sandboxes)
  • ⚙️ Configuration files (checked into version control)
  • 🌐 Environment variables (explicitly declared)

Forbidden Inputs:

  • ❌ System-installed libraries or tools
  • ❌ Network resources fetched during the build (unless declared and verified by checksum)
  • ❌ Current date/time (unless explicitly needed and declared)
  • ❌ Ambient environment variables
  • ❌ User-specific paths or credentials
  • ❌ Random number generators (unless seeded deterministically)
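
In practice, the random-number item often shows up as randomly generated IDs. One deterministic alternative is to derive the ID from the content it labels, sketched here in Python. `uuid.uuid5` always returns the same UUID for the same namespace and name. The namespace string `build.example.com` and the helper name `content_id` are placeholder assumptions. `uuid5` uses SHA-1 internally, which is fine for naming but not for integrity checks.

```python
import hashlib
import uuid

# Hypothetical fixed namespace for this project; it must never change once chosen.
BUILD_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "build.example.com")


def content_id(data: bytes) -> uuid.UUID:
    """Derive a stable ID from content instead of calling uuid.uuid4()."""
    # uuid5 is deterministic: same namespace + same name -> same UUID on every machine.
    return uuid.uuid5(BUILD_NAMESPACE, hashlib.sha256(data).hexdigest())


if __name__ == "__main__":
    print(content_id(b"artifact bytes"))
```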

Expected Outputs:

  • 🎁 Build artifacts (binaries, archives, images)
  • 📊 Build metadata (logs, timing info); this is expected to vary between runs, so keep it out of the artifacts themselves
  • 🧪 Test results
  • 📝 Documentation

Determinism Deep Dive 🔍

Determinism is often the trickiest aspect of hermeticity. Many build steps introduce non-determinism accidentally:

| Source of non-determinism | Why it happens | Solution |
|---|---|---|
| ⏰ Timestamps | Build embeds the current time in artifacts | Use a fixed time, e.g. from the SOURCE_DATE_EPOCH environment variable (honored by many tools) |
| 📁 File ordering | Directory iteration order is filesystem-dependent | Sort files (e.g. by name) before processing |
| 🎲 Hash randomization | Some runtimes, such as Python, seed string hashing randomly per process, which changes set iteration order | Set PYTHONHASHSEED=0 or the runtime's equivalent |
| 🔀 Parallel builds | Race conditions in concurrent operations | Ensure proper dependency ordering and atomic writes |
| 🆔 UUIDs/random IDs | Generating unique identifiers at build time | Derive IDs from content hashes instead |
| 🗜️ Compression and archiving | Archives record timestamps, owners, and member order; some compressors' output also varies with version, settings, or hardware | Use deterministic flags (e.g. gzip -n; GNU tar --sort=name --mtime=... --owner=0 --group=0) |
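
The hash-randomization row is easy to observe. This sketch runs the same one-line program in two fresh Python interpreters with `PYTHONHASHSEED=0` and checks that a set of strings iterates in the same order both times. The program text and the helper name `run_with_seed` are illustrative. Without the variable, or with `PYTHONHASHSEED=random`, the printed order can change from run to run, because Python randomizes `str` hashing per process by default.

```python
import os
import subprocess
import sys

# A tiny "build step" whose output depends on set iteration order.
PROGRAM = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def run_with_seed(seed: str) -> str:
    """Run PROGRAM in a fresh interpreter with PYTHONHASHSEED fixed to seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", PROGRAM],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # With the seed pinned, both runs produce identical output.
    print(run_with_seed("0") == run_with_seed("0"))
```

Dictionaries preserve insertion order regardless of the seed. Sets, and anything built from them, are where this usually shows up.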

💡 Pro Tip: Use tools like diffoscope to compare two supposedly identical builds and find sources of non-determinism.
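
A quick first check before running diffoscope is to hash every file in two output directories and list the ones that differ. diffoscope can then explain *why* they differ. This is a minimal sketch, assuming each build writes its artifacts into its own directory. The function names are hypothetical.

```python
import hashlib
import os
from typing import Dict, List


def digest_tree(root: str) -> Dict[str, str]:
    """Map each file's path (relative to root, '/'-separated) to its SHA-256."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as f:
                digests[rel] = hashlib.sha256(f.read()).hexdigest()
    return digests


def differing_files(build_a: str, build_b: str) -> List[str]:
    """List files that are missing from one build or differ in content."""
    a, b = digest_tree(build_a), digest_tree(build_b)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result from `differing_files("out-run1", "out-run2")` corresponds to the "bit-for-bit identical" goal. Any path it returns is a candidate for `diffoscope out-run1/<path> out-run2/<path>`.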

Caching and Hermeticity 🚀

One of the biggest benefits of hermetic builds is aggressive caching:

┌─────────────────────────────────────────────────┐
│         HERMETIC BUILD CACHING                  │
└─────────────────────────────────────────────────┘

  Input Hash               Cache Lookup
┌──────────┐             ┌──────────┐
│ Source   │             │          │
│ Code     │──┐          │  Cache   │
└──────────┘  │          │  Server  │
              ├─→ Hash──→│          │
┌──────────┐  │          │          │
│ Deps     │──┘          └────┬─────┘
└──────────┘                  │
                              │
                    ┌─────────┴─────────┐
                    ↓                   ↓
              Cache Hit           Cache Miss
              (return            (run build,
               cached             store result)
               output)

With hermetic builds:

  • Content-addressable storage: Cache key = hash of all inputs
  • Distributed caching: Share build artifacts across team/CI
  • Incremental builds: Only rebuild what changed
  • Remote execution: Send build to powerful remote servers
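
The caching flow above can be modeled in a few lines of Python. This is a toy, not how Bazel or any particular cache server works. The key hashes the command, the declared environment variables, and every declared input file. Reusing a cached output is therefore only safe if the action truly cannot read anything else, which is exactly the guarantee hermeticity provides. All names are hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict, List


def action_key(command: List[str], inputs: Dict[str, bytes], env: Dict[str, str]) -> str:
    """Hash everything the action is allowed to see into a single cache key."""
    h = hashlib.sha256()
    # Canonical JSON (sorted keys) gives a stable encoding of command and env.
    h.update(json.dumps({"command": command, "env": env}, sort_keys=True).encode())
    for path in sorted(inputs):  # fixed order, independent of dict insertion order
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


class ActionCache:
    """In-memory stand-in for a shared, content-addressed cache server."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}
        self.misses = 0

    def run(self, command: List[str], inputs: Dict[str, bytes],
            env: Dict[str, str], execute: Callable[[], bytes]) -> bytes:
        key = action_key(command, inputs, env)
        if key not in self._store:
            # Cache miss: run the build and store the result under its key.
            self.misses += 1
            self._store[key] = execute()
        # Cache hit (or freshly stored result): return the output for this key.
        return self._store[key]
```

Changing one byte of a declared input produces a new key and therefore a miss. A change to an undeclared input would not, which is how non-hermetic builds end up serving stale cache hits.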

Real-World Examples 🌍

Let's examine concrete scenarios that illustrate hermeticity in action.

Example 1: The Non-Hermetic Python Build ❌

Scenario: A team has a Python application with this build script:

#!/bin/bash
## build.sh - NON-HERMETIC VERSION

pip install -r requirements.txt
python setup.py build
python -m pytest
tar -czf app.tar.gz build/

Why it's not hermetic:

  1. Unspecified Python version: Uses whatever python is on the PATH (could be 3.8, 3.9, 3.11...)
  2. Unpinned dependencies: requirements.txt contains:
    flask>=2.0
    requests
    
    These will fetch different versions on different days!
  3. System pip: Uses system-installed pip (version varies)
  4. Archive metadata in the tarball: tar records each file's modification time, owner, and directory order, and the gzip layer can add its own timestamp
  5. Ambient pytest: Uses whatever pytest is installed

Consequences:

  • Developer A builds on Monday with Flask 2.0.1
  • Developer B builds on Friday with Flask 2.3.0 (just released)
  • Different behavior, different bugs, different security profiles
  • CI/CD might produce different artifacts than local builds

Example 2: The Hermetic Python Build ✅

Improved version:

#!/bin/bash
## build.sh - HERMETIC VERSION

## Use exact Python version from container
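docker run --rm -v "$(pwd)":/workspace \
  python:3.11.2-slim \
  /bin/bash -c '
    set -euo pipefail
    cd /workspace
    
    # Install exact pinned versions (pytest included)
    pip install --no-cache-dir -r requirements-lock.txt
    
    # Run build (setup.py build writes its output to build/)
    python setup.py build
    
    # Test with deterministic settings
    PYTHONHASHSEED=0 python -m pytest
    
    # Create deterministic tarball: fixed member order, mtime, and ownership;
    # gzip -n keeps the file name and timestamp out of the gzip header
    tar --sort=name --mtime="2023-01-01 00:00:00 UTC" \
        --owner=0 --group=0 --numeric-owner \
        -cf - build/ | gzip -n > app.tar.gz
  '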

With requirements-lock.txt:

flask==2.0.1
requests==2.28.1
werkzeug==2.0.1
click==8.0.1
## ... all transitive dependencies pinned

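Why it's hermetic (or nearly so):

  1. ✅ Fixed Python: python:3.11.2-slim names a specific Python release. Tags can be re-pushed, so only pinning the image by its @sha256 digest makes it truly immutable
  2. ✅ Pinned dependencies: Exact versions, including transitive deps (pip still downloads them during the build, so the package index must be reachable)
  3. ✅ Isolated environment: Docker container provides a clean sandbox
  4. ✅ Deterministic tarball: Timestamps, ordering, ownership all fixed
  5. ✅ Deterministic tests: PYTHONHASHSEED=0 prevents hash randomization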
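
Pinning only helps if nothing slips through unpinned. A small pre-build check can fail CI when someone adds a range such as `flask>=2.0`. The sketch below assumes the lock file uses plain `name==version` lines; `--hash` lines and other `-`/`--` options are skipped. The helper name is hypothetical. Adding hashes to the lock file and installing with `pip install --require-hashes` goes a step further by also verifying what gets downloaded.

```python
import re
from typing import List

# "name==version", optionally with extras, e.g. "requests[socks]==2.28.1".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?==[^=\s;]+")


def unpinned_requirements(lock_text: str) -> List[str]:
    """Return requirement lines that are not pinned to an exact version."""
    problems = []
    for line in lock_text.splitlines():
        req = line.split("#", 1)[0].strip()  # drop comments and surrounding space
        if not req or req.startswith("-"):
            continue  # blank lines and options such as --hash or -r are skipped
        if not PINNED.match(req):
            problems.append(req)
    return problems
```

A CI step could read `requirements-lock.txt` and fail whenever `unpinned_requirements` returns a non-empty list.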

Example 3: Bazel - Hermetic by Design 🏗️

Google's Bazel build system enforces hermeticity through its architecture:

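## BUILD.bazel
load("@rules_python//python:defs.bzl", "py_binary", "py_library")
load("@pypi//:requirements.bzl", "requirement")

py_library(
    name = "mylib",
    srcs = ["mylib.py"],
    deps = [
        requirement("flask"),  # External dependency, resolved from the lock file
        requirement("requests"),
    ],
)

py_binary(
    name = "myapp",
    srcs = ["main.py"],
    main = "main.py",  # needed because the file name differs from the target name
    deps = [":mylib"],
)

## WORKSPACE - declares external dependencies
## (fetching rules_python itself with http_archive is omitted here)
load("@rules_python//python:pip.bzl", "pip_parse")

pip_parse(
    name = "pypi",
    requirements_lock = "//:requirements-lock.txt",
)

load("@pypi//:requirements.bzl", "install_deps")

install_deps()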

Bazel's hermetic guarantees:

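  1. Sandbox execution: Each build action runs in a filesystem sandbox that exposes only its declared inputs (on platforms that support it, such as Linux and macOS)
  2. Explicit dependencies: If not declared in deps, it's not available
  3. Content-addressed cache: Outputs cached by hash of inputs
  4. Toolchain management: Compilers and interpreters can themselves be declared, versioned inputs. This requires configuring a hermetic toolchain; Bazel's default auto-detected C++ toolchain uses the compiler installed on the host
  5. Remote execution: Can run builds on remote servers transparently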

┌─────────────────────────────────────────────────┐
│         BAZEL HERMETIC ARCHITECTURE             │
└─────────────────────────────────────────────────┘

    Build Request                  Sandbox
         │                    ┌──────────────┐
         ▼                    │  Declared    │
    ┌─────────┐               │  inputs only │
    │ Action  │──────────────→│              │
    │ (build) │   Copy inputs │  ┌────────┐  │
    └─────────┘               │  │ Build  │  │
         │                    │  │ Action │  │
         │                    │  └────────┘  │
         ▼                    │              │
    Compute Hash              │  ┌────────┐  │
         │                    │  │ Output │  │
         ▼                    │  └────────┘  │
    ┌─────────┐               └──────┬───────┘
    │  Cache  │←─────────────────────┘
    │  Lookup │    Copy outputs
    └─────────┘
         │
    ┌────┴─────┐
    ↓          ↓
  Hit        Miss
  (reuse)    (execute)
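
You can imitate the sandbox idea in the diagram in plain Python to build intuition. Copy only the declared inputs into a fresh directory and run the action there with an empty environment; anything undeclared is simply not found. This toy rests on two assumptions: inputs are given as relative paths, and the action only opens files by relative path. It is not how Bazel implements sandboxing. Real sandboxes often symlink inputs rather than copy them, and they use OS mechanisms to restrict absolute paths and network access. The function name is hypothetical.

```python
import os
import shutil
import subprocess
import tempfile
from typing import Dict, List, Optional


def run_sandboxed(declared_inputs: List[str], argv: List[str],
                  env: Optional[Dict[str, str]] = None) -> subprocess.CompletedProcess:
    """Run argv in a fresh directory containing copies of the declared inputs only.

    Toy model: it hides undeclared files reached by relative path, plus ambient
    environment variables. It cannot block absolute paths or network access.
    """
    with tempfile.TemporaryDirectory() as sandbox:
        for rel in declared_inputs:  # paths relative to the current directory
            dest = os.path.join(sandbox, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(rel, dest)
        # Empty environment unless variables are passed in explicitly.
        return subprocess.run(argv, cwd=sandbox, env=env if env is not None else {},
                              capture_output=True, text=True)
```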

Example 4: Docker - Partial Hermeticity ⚠️

Docker is often mistaken for being hermetic, but it requires discipline:

❌ Non-hermetic Dockerfile:

FROM ubuntu:latest
RUN apt-get update && apt-get install -y python3 python3-pip
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY . .
CMD ["python3", "app.py"]

Problems:

  • ubuntu:latest changes over time
  • apt-get install fetches latest packages
  • pip3 install without pinned versions
  • Network access during build

✅ More hermetic Dockerfile:

## Pin base image by digest
FROM ubuntu@sha256:abcd1234...

## Install specific versions (illustrative version strings; they must exist in the pinned image's package archive)
RUN apt-get update && \
    apt-get install -y \
    python3=3.11.2-1 \
    python3-pip=22.0.2+dfsg-1 && \
    rm -rf /var/lib/apt/lists/*

## Copy and install pinned deps
COPY requirements-lock.txt .
RUN pip3 install --no-cache-dir -r requirements-lock.txt

## Copy source
COPY . .

CMD ["python3", "app.py"]

Improvements:

  • Image pinned by SHA256 (immutable)
  • Explicit package versions
  • Locked dependencies
  • Package index and pip caches removed, so time-dependent files stay out of the image

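🤔 Did you know? Even with these improvements, Docker builds aren't fully hermetic. RUN steps can still access the network, builds depend on external registries and package mirrors, and a pinned package version can disappear from a mirror over time. Tools like Nix and Guix go further: they build in a sandbox without network access (apart from fixed, checksummed downloads) and identify every dependency by a cryptographic hash.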

Common Mistakes 🚨

Let's explore the most frequent ways developers accidentally break hermeticity:

Mistake 1: Using "Latest" Tags ⚠️

The problem:

FROM node:latest
FROM python:3

These tags change over time:

  • node:latest might be 18.x today, 20.x tomorrow
  • python:3 could be 3.9, 3.10, 3.11...

The fix:

FROM node:18.16.0-alpine3.17
FROM python:3.11.2-slim-bullseye
## Even better: pin by SHA256
FROM node@sha256:abcd1234...
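
Tags like these are easy to reintroduce by accident, so a CI step can flag them. The sketch below is a simplified, hypothetical lint: a regex over `FROM` lines that ignores earlier build-stage names and `scratch`. It does not handle every Dockerfile feature, such as line continuations or build arguments inside image names.

```python
import re
from typing import List

# Captures the image reference in "FROM [--platform=...] image [AS name]".
FROM_LINE = re.compile(r"^\s*FROM\s+(?:--platform=\S+\s+)?(\S+)", re.IGNORECASE)


def unpinned_base_images(dockerfile_text: str) -> List[str]:
    """Return base images in FROM lines that are not pinned by @sha256 digest."""
    stage_names = set()
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = FROM_LINE.match(line)
        if not match:
            continue
        image = match.group(1)
        # Earlier build stages and "scratch" are not registry images.
        if image not in stage_names and image != "scratch" and "@sha256:" not in image:
            unpinned.append(image)
        parts = line.split()
        if len(parts) >= 4 and parts[-2].upper() == "AS":
            stage_names.add(parts[-1])  # "FROM image AS name" defines a stage
    return unpinned
```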

Mistake 2: Fetching Dependencies During Build ⚠️

The problem:

## In build script
npm install
go get ./...
wget https://example.com/asset.zip

Network access introduces:

  • Version drift
  • Availability issues (registry down = build fails)
  • Security risks (man-in-the-middle attacks)

The fix:

  • Vendor dependencies: Check them into your repository
  • Use lock files and install strictly from them: package-lock.json (with npm ci), go.mod plus go.sum, Pipfile.lock
  • Checksummed downloads: Bazel's http_archive with its sha256 attribute
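
The idea behind checksummed fetching, which `http_archive`'s `sha256` attribute provides in Bazel, fits in a few lines: download, hash, and refuse to continue on a mismatch. This is a minimal Python sketch; the function and exception names are hypothetical. The expected digest should come from a reviewed, checked-in source such as a lock file, not from the same server as the download. This makes a network fetch verifiable, but an unavailable server still fails the build.

```python
import hashlib
import urllib.request


class ChecksumMismatch(Exception):
    """Raised when downloaded bytes do not match the pinned digest."""


def fetch_pinned(url: str, expected_sha256: str) -> bytes:
    """Download url and return its bytes only if their SHA-256 matches."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256.lower():
        # Refuse to use anything that differs from what was reviewed and pinned.
        raise ChecksumMismatch(f"{url}: expected {expected_sha256}, got {actual}")
    return data
```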

Mistake 3: Reading System Environment Variables ⚠️

The problem:

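## In a build-time script (e.g. setup.py or a code generator)
import os
config = os.environ.get('DATABASE_URL')
api_key = os.getenv('API_KEY')

When this code runs during the build, the artifact depends on whatever the builder's environment happens to contain. (Reading the same variables when the application starts at runtime does not affect the build.)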

The fix:

  • Declare required env vars explicitly in build config
  • Use configuration files checked into version control
  • Inject at runtime, not build time (for secrets)
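
Declaring variables explicitly can be as simple as a build script that copies only an allow-list out of the environment and fails if any entry is missing. Ambient values such as `HOME` or a stray `API_KEY` then never reach the build. The variable names and the helper below are hypothetical.

```python
from typing import Dict, Iterable, Mapping

# Hypothetical allow-list, e.g. loaded from a checked-in build config file.
DECLARED_BUILD_ENV = ("APP_VERSION", "TARGET_ARCH")


def declared_env(declared: Iterable[str], environ: Mapping[str, str]) -> Dict[str, str]:
    """Return only the declared variables; fail loudly if any is missing."""
    names = sorted(set(declared))
    missing = [name for name in names if name not in environ]
    if missing:
        raise KeyError("declared build variables not set: " + ", ".join(missing))
    return {name: environ[name] for name in names}
```

A build script would then pass `env=declared_env(DECLARED_BUILD_ENV, os.environ)` to `subprocess.run`, so child processes see nothing else.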

Mistake 4: Timestamps in Artifacts ⚠️

The problem:

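zip -r app.zip dist/
tar -czf release.tar.gz bin/
## Both record each file's modification time and the order files were found;
## the gzip layer can also store a timestamp in its header

The fix:

## (GNU coreutils, GNU tar, and Info-ZIP zip syntax)
## zip: normalize mtimes, add files in sorted order, omit extra attributes (-X)
find dist -exec touch -h -d "@1672531200" {} +
find dist -print | LC_ALL=C sort | TZ=UTC zip -X app.zip -@

## tar: fixed order, mtime, and ownership; gzip -n omits the name and timestamp
tar --sort=name \
    --mtime="2023-01-01 00:00:00 UTC" \
    --owner=0 --group=0 --numeric-owner \
    -cf - bin/ | gzip -n > release.tar.gz

## Or take the fixed time from SOURCE_DATE_EPOCH and pass it to tar explicitly
export SOURCE_DATE_EPOCH=1672531200
tar --sort=name --mtime="@${SOURCE_DATE_EPOCH}" \
    --owner=0 --group=0 --numeric-owner \
    -cf - bin/ | gzip -n > release.tar.gz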
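
For Python-based build scripts, the standard library can do the same normalization instead of GNU tar flags. This sketch makes two assumptions. First, the archive only needs regular files, since directories are implied by member paths. Second, byte-identical output is only expected with the same Python and zlib versions. `FIXED_MTIME` stands in for SOURCE_DATE_EPOCH, and the helper names are hypothetical.

```python
import gzip
import io
import os
import tarfile

FIXED_MTIME = 1672531200  # 2023-01-01T00:00:00Z, standing in for SOURCE_DATE_EPOCH


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Strip per-machine metadata from one archive member."""
    info.mtime = FIXED_MTIME  # an int, so no float PAX header is written
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    # Keep only the executable bit; drop umask-dependent permission noise.
    info.mode = 0o755 if (info.isdir() or info.mode & 0o100) else 0o644
    return info


def deterministic_tar_gz(src_dir: str) -> bytes:
    """Return a .tar.gz of src_dir's files whose bytes depend only on contents."""
    raw = io.BytesIO()
    # filename="" and mtime=0 keep the gzip header free of names and timestamps.
    with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for root, dirs, files in os.walk(src_dir):
                dirs.sort()  # os.walk descends into subdirectories in this order
                rel_root = os.path.relpath(root, src_dir)
                for name in sorted(files):
                    arcname = os.path.normpath(os.path.join(rel_root, name))
                    tar.add(os.path.join(root, name), arcname=arcname,
                            recursive=False, filter=_normalize)
    return raw.getvalue()
```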

Mistake 5: Implicit Tool Dependencies ⚠️

The problem:

build:
    gcc -o myapp main.c
    strip myapp

This assumes:

  • gcc is installed
  • Specific version/configuration
  • strip utility is available

The fix:

## Declare exact toolchain
FROM gcc:12.2.0-bullseye AS builder
WORKDIR /build
COPY . .
RUN gcc -o myapp main.c && strip myapp
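
When a container is not an option, a build script can at least refuse to run with an undeclared tool version. This is a minimal sketch, assuming the tool prints an `x.y.z` version string for its version flag, as `gcc --version` does. The helper names, and the idea of reading the expected version from a checked-in file, are assumptions. The check detects drift; it does not make the toolchain itself hermetic.

```python
import re
import subprocess
from typing import List


def tool_version(argv: List[str]) -> str:
    """Run a tool's version command and return the first x.y.z it prints."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    match = re.search(r"\d+\.\d+\.\d+", result.stdout + result.stderr)
    if match is None:
        raise RuntimeError(f"no version found in output of {argv}")
    return match.group(0)


def require_tool(argv: List[str], expected: str) -> None:
    """Refuse to build with a tool version other than the declared one."""
    found = tool_version(argv)
    if found != expected:
        raise RuntimeError(f"{argv[0]}: declared {expected}, found {found}")
```

For example, `require_tool(["gcc", "--version"], "12.2.0")` would run before the compile step.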

Mistake 6: File System Ordering ⚠️

The problem:

import os
files = os.listdir('src/')
for f in files:  # Order is non-deterministic!
    process(f)

Filesystem iteration order varies by OS, filesystem type, and even kernel version.

The fix:

import os
files = sorted(os.listdir('src/'))  # Explicit sort
for f in files:
    process(f)
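
The same fix extends to whole directory trees. This sketch sorts both the subdirectories `os.walk` descends into and the files within each directory. Python's `sorted` compares code points, so uppercase names sort before lowercase ones regardless of locale. Shell `sort` depends on the locale unless `LC_ALL=C` is set. The function name is hypothetical.

```python
import os
from typing import Iterator


def stable_walk(root: str) -> Iterator[str]:
    """Yield file paths under root, relative and '/'-separated, in a fixed order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort controls the order os.walk descends
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            yield rel.replace(os.sep, "/")
```

The order is depth-first, with each directory's files before its subdirectories. Sorting the complete list afterwards gives a purely lexicographic order instead.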

Key Takeaways 🎯

Let's consolidate everything you've learned about hermeticity:

The Core Principles 📌

  1. Same inputs → Same outputs (always, everywhere)
  2. Declare everything explicitly (no hidden dependencies)
  3. Isolate from the host (containers, sandboxes, VMs)
  4. Make it deterministic (eliminate randomness and timestamps)
  5. Enable caching (content-addressed storage)

Benefits You'll Gain ✨

| Benefit | Why it matters |
|---|---|
| 🐛 Easier debugging | Reproduce any build exactly as it was |
| 🚀 Faster builds | Aggressive caching, incremental builds |
| 👥 Team consistency | Everyone gets identical artifacts |
| 🔒 Security | Audit supply chain, verify integrity |
| 📈 Scalability | Distribute builds, remote execution |

Hermetic Build Checklist ✅

Use this before declaring a build hermetic:

  • All dependencies pinned to exact versions
  • Build runs in isolated container/sandbox
  • No network access during build (or only to declared, checksummed resources)
  • No reading of system environment variables (except explicitly declared)
  • Timestamps handled deterministically (SOURCE_DATE_EPOCH)
  • File ordering explicit (sort before processing)
  • Compression/archiving uses deterministic flags
  • Build tools versioned and declared
  • Tested on different machines/environments
  • Output artifacts bit-for-bit identical across runs

Tools for Hermetic Builds 🔧

| Tool | Approach | Best for |
|---|---|---|
| Bazel | Hermetic by design, sandbox execution | Large monorepos, polyglot projects |
| Nix | Functional package manager, content-addressed | System-level reproducibility |
| Docker | Containerization (requires discipline) | Quick wins, microservices |
| Buck2 | Hermetic, remote execution; Meta's open-source alternative to Bazel | Large monorepos |
| Pants | Hermetic Python/Go/etc. builds | Monorepos with a strong Python focus |

Mental Models 🧠

Mnemonic: H.E.R.M.E.T.I.C.

  • Hash all inputs
  • Explicitly declare dependencies
  • Reproduce anywhere
  • Make it deterministic
  • Eliminate network access
  • Toolchains under control
  • Isolate from host
  • Cache aggressively

Analogy: Recipe vs. Meal Kit 🍳

  • Non-hermetic build = Recipe saying "add some flour, use fresh eggs"
  • Hermetic build = Meal kit with exact measured ingredients, tools included, step-by-step instructions

Next Steps 🎓

To deepen your understanding:

  1. Try it yourself: Take an existing project and make its build hermetic
  2. Compare builds: Use diffoscope to find non-determinism
  3. Read tool docs: Explore Bazel or Nix documentation
  4. Join the community: reproducible-builds.org has resources and discussions

📋 Quick Reference Card

Hermeticity: Same inputs always produce identical outputs
Three Pillars: Isolation, Declaration, Determinism
Key Enemies: Timestamps, network, file ordering, ambient env vars
Best Tools: Bazel (general), Nix (system-level), Docker (with discipline)
Golden Rule: If you can't reproduce it, you don't control it
Quick Win: Pin all dependency versions, run builds in Docker

📚 Further Study

Deepen your knowledge with these resources:

  1. Bazel Documentation on Hermetic Builds - https://bazel.build/concepts/hermeticity - Official guide from Google's Bazel team explaining hermetic principles and implementation

  2. Reproducible Builds Project - https://reproducible-builds.org/ - Community effort with tools, guides, and best practices for achieving bit-for-bit reproducible builds

  3. Nix Package Manager Manual - https://nixos.org/manual/nix/stable/ - Deep dive into purely functional package management and system-level hermeticity

Congratulations! You now understand what hermeticity means, why it matters, and how to achieve it in your build systems. The journey to fully hermetic builds takes time, but every step toward hermeticity makes your software more reliable, debuggable, and maintainable. 🎉