
Why Docker Alone Isn't Hermetic

Understand Docker's non-hermetic aspects: layer caching, build-time network access, and timestamps.

This lesson covers Docker's reproducibility limitations, non-hermetic dependencies, and runtime variations: essential concepts for building truly deterministic production systems.

Welcome 🎯

Many developers believe that wrapping their build process in a Docker container automatically makes it hermetic. Unfortunately, this widespread misconception leads to subtle bugs, inconsistent deployments, and difficult-to-debug production issues. While Docker provides isolation and portability, it doesn't guarantee reproducibility or determinism: the core promises of hermetic builds.

In this lesson, we'll dissect exactly why Docker containers, despite their many benefits, fall short of achieving true hermetic build principles. You'll learn to identify the hidden sources of non-determinism that can creep into Dockerized builds and understand what additional measures are needed to achieve genuine build hermeticity.

Core Concepts πŸ’‘

What Docker Provides vs. What Hermetic Builds Require

To understand why Docker alone isn't hermetic, we must first clarify what each term means:

Docker provides:

  • Process isolation: Containers run in isolated namespaces
  • Filesystem isolation: Each container has its own filesystem view
  • Portability: "Works on my machine" becomes "works in this container"
  • Dependency bundling: Libraries and tools packaged together

Hermetic builds require:

  • Complete input specification: Every input must be explicitly declared
  • Deterministic outputs: Same inputs always produce identical outputs
  • No network access during build: Cannot fetch external resources
  • No timestamp dependencies: Build results independent of when built
  • No host system leakage: Host environment doesn't affect build
Property          | Docker Provides?        | Hermetic Build Requires?
------------------|-------------------------|-------------------------
Isolation         | ✅ Yes                  | ✅ Yes
Reproducibility   | ⚠️ Partial              | ✅ Complete
Determinism       | ❌ No guarantee         | ✅ Required
Network isolation | ❌ Network available    | ✅ No network access
Input declaration | ⚠️ Partial (Dockerfile) | ✅ Complete manifest

The Docker Build Process: Where Non-Determinism Creeps In

Let's examine a typical Dockerfile and identify the hermetic violations:

FROM node:18
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
RUN npm run build
CMD ["npm", "start"]

This innocent-looking Dockerfile contains at least 6 sources of non-determinism:

  1. Base image ambiguity: node:18 is a floating tag that can point to different images over time
  2. Network dependency: npm install fetches packages from the internet
  3. Version resolution: Without a lockfile, npm may resolve different dependency versions
  4. Timestamp inclusion: Build time gets embedded in artifacts
  5. Order-dependent operations: File copy order can affect build caching
  6. Build-time environment: NODE_ENV and other variables affect output
DOCKER BUILD PROCESS (Non-Hermetic)

    πŸ“‹ Dockerfile
         β”‚
         ↓
    🌐 Pull base image ──→ ⚠️ Tag may resolve differently
         β”‚
         ↓
    πŸ“¦ RUN npm install ──→ ⚠️ Network fetch (versions may change)
         β”‚
         ↓
    πŸ”¨ RUN build ──────→ ⚠️ Timestamp embedded
         β”‚
         ↓
    🎯 Output image ────→ ⚠️ Different each time!
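These failure points can be spotted mechanically. Below is a minimal, illustrative Python sketch of such a check — the pattern list is ours and deliberately incomplete (a real linter such as hadolint covers far more cases):

```python
import re

# Illustrative patterns only -- not an exhaustive hermeticity checker.
NON_HERMETIC_PATTERNS = [
    (re.compile(r"^FROM\s+\S+$"), "base image without @sha256 digest"),
    (re.compile(r"npm install(?!\s+--offline)"), "network fetch at build time"),
    (re.compile(r"apt-get update"), "fetches latest package lists"),
    (re.compile(r"curl|wget"), "downloads from a remote source"),
]

def audit_dockerfile(text: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) pairs for suspicious Dockerfile lines."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue
        for pattern, reason in NON_HERMETIC_PATTERNS:
            if pattern.search(stripped) and "@sha256:" not in stripped:
                findings.append((lineno, reason))
                break
    return findings

dockerfile = """\
FROM node:18
COPY package*.json ./
RUN npm install
"""
for lineno, reason in audit_dockerfile(dockerfile):
    print(f"line {lineno}: {reason}")
# line 1: base image without @sha256 digest
# line 3: network fetch at build time
```

Running a check like this in CI catches the most obvious violations before they reach production images.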

Problem 1: Floating Tags and Base Image Drift 🏷️

When you specify FROM node:18, you're not pinning to a specific image. Tags are mutable pointers that can be updated:

## ❌ NON-HERMETIC: Tag can point to different images
FROM node:18

## ❌ STILL NON-HERMETIC: Even 'latest' is a moving target
FROM ubuntu:latest

## βœ… BETTER: Use digest pinning
FROM node:18@sha256:a5e1b7e7c6f9c8d3e2a1b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5

## βœ… HERMETIC: Specify exact digest
FROM ubuntu@sha256:82becede498899ec668628e7cb0ad87b6e1c371cb8a1e597d83a47fac21d6af3

Why this matters:

  • Node 18.0.0 and 18.15.0 both satisfy node:18
  • Security patches update base images without changing tags
  • Different registry mirrors may serve different content
  • Builds from last month won't match today's builds

πŸ’‘ Memory device: "Floating tags are like saying 'use the latest red car'β€”you might get a sedan today and a truck tomorrow!"

Problem 2: Network Access During Build 🌐

Docker builds have unrestricted network access by default, allowing them to fetch resources that may change:

## ❌ NON-HERMETIC: Fetches latest packages from internet
RUN apt-get update && apt-get install -y curl

## ❌ NON-HERMETIC: Downloads from changing remote source
RUN curl https://example.com/install.sh | sh

## ❌ NON-HERMETIC: npm/pip/maven fetch from registries
RUN npm install
RUN pip install -r requirements.txt
RUN mvn package

Each of these operations can produce different results depending on:

  • When the build runs (packages get updated)
  • Where the build runs (geographic mirrors, registry availability)
  • What network state exists (DNS resolution, transient failures)

Hermetic builds require:

## βœ… HERMETIC: All dependencies pre-fetched and checksummed
COPY vendor/ /app/vendor/
COPY package-lock.json /app/
RUN npm ci --offline
NETWORK DEPENDENCY PROBLEM

    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚  Build #1   β”‚         β”‚  Build #2   β”‚
    β”‚  (Monday)   β”‚         β”‚  (Friday)   β”‚
    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
           β”‚                       β”‚
           ↓                       ↓
    🌐 npm registry         🌐 npm registry
           β”‚                       β”‚
    β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”
    β”‚ lodash 4.17.20β”‚       β”‚ lodash 4.17.21β”‚  ⚠️ Version bumped!
    β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
           β”‚                       β”‚
           ↓                       ↓
    πŸ“¦ Artifact A           πŸ“¦ Artifact B  ❌ Different outputs!

Problem 3: Implicit Dependencies and Host Leakage πŸ”“

Docker builds can inadvertently depend on the host system in subtle ways:

## ❌ Depends on host's DNS configuration
RUN npm config set registry https://internal-registry.company.local

## ❌ Depends on host's mounted volumes
RUN --mount=type=cache,target=/root/.npm npm install

## ❌ Depends on build-time secrets
RUN --mount=type=secret,id=github_token \
    git clone https://$(cat /run/secrets/github_token)@github.com/private/repo.git

## ❌ Depends on build arguments
ARG BUILD_VERSION
RUN echo "${BUILD_VERSION}" > version.txt

Host leakage examples:

Leakage Type   | Example                         | Impact
---------------|---------------------------------|--------------------------
Timestamp      | RUN date > built.txt            | Different on every build
User ID        | RUN chown $(id -u) /app         | Varies by build host
Hostname       | RUN echo $(hostname) > host.txt | Differs across machines
Random numbers | RUN openssl rand -hex 16 > key  | Non-deterministic
File ordering  | RUN tar czf backup.tar.gz *     | Filesystem-dependent

Problem 4: Layer Caching Assumptions πŸ“¦

Docker's layer caching improves build speed but can hide non-determinism:

FROM python:3.11

## This layer gets cached
COPY requirements.txt .

## ❌ Cache hit means old packages used
RUN pip install -r requirements.txt

## New code, but old dependencies!
COPY app.py .

The caching trap:

Developer A builds on Monday:

  1. pip install fetches requests==2.28.0 (latest at the time)
  2. Docker caches this layer with hash abc123

Developer B builds on Friday:

  1. requirements.txt unchanged β†’ cache hit on abc123
  2. Uses Monday's dependencies, not Friday's registry state
  3. Appears to work, but isn't hermetic
LAYER CACHING MASKING NON-DETERMINISM

    Monday Build              Friday Build
    ────────────              ────────────
    
    requirements.txt          requirements.txt
         β”‚ (hash: aaa)             β”‚ (hash: aaa)
         ↓                         ↓
    πŸ” Cache miss            πŸ” Cache HIT βœ“
         β”‚                         β”‚
         ↓                         ↓
    🌐 pip install           πŸ“¦ Use cached layer
    (gets newest)            (uses Monday's packages)
         β”‚                         β”‚
         ↓                         ↓
    Layer: abc123            Layer: abc123
    
    ⚠️ Both use same layer, but only Monday's was "fresh"
    ⚠️ False sense of consistency!

πŸ’‘ Tip: Use docker build --no-cache to detect hidden non-determinism, but this doesn't fix the underlying issues.

Problem 5: Build-Time Variability 🎲

Many build tools embed build-time metadata into artifacts:

## ❌ Timestamp embedded in binary
RUN go build -o app main.go

## ❌ Build date in version string
RUN npm run build  # Creates bundle-2024-01-15.js

## ❌ Timestamps embedded in packaged archives
RUN javac -d bin src/*.java && jar cf app.jar -C bin .  # JAR entries record file mtimes

Common sources of build-time variability:

  • Compiler timestamps: Many compilers embed build time in binaries
  • Archive ordering: tar, zip may order files by creation time
  • UUID/GUID generation: Build systems that generate unique identifiers
  • Parallelism: Multi-threaded builds with race conditions
  • Environment variables: HOME, USER, TMPDIR affect some tools

Example: Go binary timestamps

## ❌ NON-HERMETIC: Different timestamps
RUN go build -o server main.go

## βœ… HERMETIC: Reproducible builds with flags
RUN go build -trimpath -ldflags="-buildid=" -o server main.go

Comparing the binaries:

## Build 1 (Monday 10:00 AM)
$ go build -o server1 main.go
$ sha256sum server1
d4f3c2b1a...  # Hash includes timestamp

## Build 2 (Monday 10:01 AM) - same source!
$ go build -o server2 main.go
$ sha256sum server2
e5f4d3c2b...  # Different hash! ❌

## Hermetic build
$ go build -trimpath -ldflags="-buildid=" -o server3 main.go
$ sha256sum server3
a1b2c3d4e...  # Consistent hash βœ…
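The same effect is easy to reproduce outside Go: embed the wall clock in an artifact and its hash changes on every build; pin the stamp (as SOURCE_DATE_EPOCH-style conventions do) and the hash stabilizes. A toy demonstration:

```python
import hashlib
import time

def build_artifact(source: bytes, embed_wall_clock: bool) -> bytes:
    """Simulate a compiler that optionally stamps the artifact with build time."""
    stamp = str(time.time_ns()).encode() if embed_wall_clock else b"0"
    return source + b"\nbuilt-at: " + stamp

src = b"package main"

# Non-hermetic: two builds of identical source hash differently.
a = hashlib.sha256(build_artifact(src, True)).hexdigest()
b = hashlib.sha256(build_artifact(src, True)).hexdigest()
print(a == b)  # almost certainly False: wall clock leaked into the output

# Hermetic: timestamp pinned, outputs match byte for byte.
c = hashlib.sha256(build_artifact(src, False)).hexdigest()
d = hashlib.sha256(build_artifact(src, False)).hexdigest()
print(c == d)  # True
```

The fix is never "hash differently"; it's "stop feeding the hash anything that varies between runs."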

Problem 6: Dependency Version Resolution πŸ“š

Without lockfiles or exact version specifications, package managers make non-deterministic choices:

## ❌ NON-HERMETIC: Semantic versioning allows updates
COPY package.json .
RUN npm install  # Resolves ^1.2.3 differently over time

## ⚠️ BETTER: Use lockfile, but still fetches from network
COPY package-lock.json .
RUN npm ci  # More deterministic, but network-dependent

## βœ… HERMETIC: Vendored dependencies with checksums
COPY package-lock.json .
COPY node_modules/ ./node_modules/
RUN npm rebuild  # Only rebuilds native modules

Semantic versioning pitfall:

Specification                  | Monday's Resolution  | Friday's Resolution  | Hermetic?
-------------------------------|----------------------|----------------------|----------------------
"express": "^4.17.0"           | 4.17.1               | 4.18.0 (new minor)   | ❌ No
"express": "~4.17.1"           | 4.17.1               | 4.17.3 (patch)       | ❌ No
"express": "4.17.1"            | 4.17.1               | 4.17.1               | ⚠️ If available
"express": "4.17.1" + lockfile | 4.17.1 + deps pinned | 4.17.1 + deps pinned | ⚠️ If registry stable
Vendored with checksums        | Exact local copy     | Exact local copy     | ✅ Yes

Transitive dependency problem:

Even with a lockfile for direct dependencies, transitive dependencies can change:

DEPENDENCY TREE INSTABILITY

    Your App
        β”‚
        β”œβ”€ express@4.17.1 (pinned)
        β”‚       β”‚
        β”‚       β”œβ”€ body-parser@^1.19.0  ⚠️ Caret allows updates!
        β”‚       β”‚       β”‚
        β”‚       β”‚       └─ bytes@3.1.0
        β”‚       β”‚
        β”‚       └─ cookie@0.4.0
        β”‚
        └─ lodash@4.17.21 (pinned)

    Monday: body-parser resolves to 1.19.0
    Friday: body-parser resolves to 1.20.1  ❌ Different build!
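A simplified model of caret resolution shows why the tree drifts even when your direct dependency is pinned. (The version logic below is heavily simplified; real npm semver handles prereleases, zero-majors, and more.)

```python
def parse(v: str) -> tuple[int, int, int]:
    major, minor, patch = (int(x) for x in v.split("."))
    return major, minor, patch

def resolve_caret(spec: str, available: list[str]) -> str:
    """Pick the highest available version matching ^spec (same major, >= spec)."""
    base = parse(spec)
    candidates = [v for v in available if parse(v)[0] == base[0] and parse(v) >= base]
    return max(candidates, key=parse)

# The registry grows between builds; the spec in package.json does not change.
monday_registry = ["1.19.0", "1.19.2"]
friday_registry = ["1.19.0", "1.19.2", "1.20.1"]

print(resolve_caret("1.19.0", monday_registry))  # 1.19.2
print(resolve_caret("1.19.0", friday_registry))  # 1.20.1 <- same spec, new result
```

Because express declares body-parser with a caret, this resolution happens one level below anything you pinned yourself; only a lockfile (or vendoring) freezes it.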

Examples with Detailed Explanations πŸ”

Example 1: The npm Install Trap

Let's trace through a real-world scenario showing how Docker builds become non-hermetic:

Scenario: You're building a Node.js microservice

## Dockerfile (seemingly reasonable)
FROM node:18
WORKDIR /app
COPY package.json .
RUN npm install
COPY src/ ./src/
RUN npm run build
CMD ["node", "dist/server.js"]

package.json:

{
  "dependencies": {
    "express": "^4.18.0",
    "axios": "^1.3.0"
  }
}

Timeline of builds:

Date   | What Happened             | Result
-------|---------------------------|----------------------------------------------
Jan 1  | Initial build             | express@4.18.0, axios@1.3.0
Jan 15 | Rebuild after code change | express@4.18.1 (security patch), axios@1.3.2
Feb 1  | Production deployment     | express@4.18.2, axios@1.3.4

Result: ❌ Three different artifacts from identical source code!

The problem compounds:

## Developer's laptop (with Docker layer cache)
$ docker build -t app:v1 .
## Uses cached layer from last week β†’ old dependencies

## CI/CD pipeline (clean build)
$ docker build -t app:v1 .
## Fresh build β†’ fetches newest dependencies

## Result: Different images with same tag! 😱

How to fix it:

## 1. Use specific base image digest
FROM node@sha256:a5e1b7e7c6f9c8d3e2a1b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1b2c3d4e5

WORKDIR /app

## 2. Copy lockfile for exact versions
COPY package.json package-lock.json ./

## 3. Use 'npm ci', which requires the lockfile (dev deps stay in
##    because 'npm run build' below needs them)
RUN npm ci

## 4. Copy source
COPY src/ ./src/

## 5. Build with reproducible settings
RUN SOURCE_DATE_EPOCH=0 npm run build

CMD ["node", "dist/server.js"]

Example 2: The Base Image Surprise

A team discovers their "identical" builds produce different security scan results:

Initial Dockerfile:

FROM python:3.11-slim
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
CMD ["python", "app.py"]

What happened:

## Build in January
$ docker build -t myapp:latest .
$ docker images --digests | grep myapp
myapp  latest  sha256:abc123...  # Python 3.11.1, OpenSSL 3.0.7

## Rebuild in February (no code changes!)
$ docker build -t myapp:latest .
$ docker images --digests | grep myapp
myapp  latest  sha256:def456...  # Python 3.11.2, OpenSSL 3.0.8

## Security scanner results:
## January: 3 vulnerabilities
## February: 0 vulnerabilities  ⚠️ Same code, different scan!

Why this happened:

The python:3.11-slim tag is updated whenever:

  • Python releases a patch version (3.11.1 β†’ 3.11.2)
  • Debian (base OS) updates packages
  • Security patches are applied
  • Maintainers rebuild with updated dependencies
BASE IMAGE TAG EVOLUTION

January:  python:3.11-slim β†’ sha256:abc123 (3.11.1)
             β”‚
             β”‚ (Debian security update)
             ↓
February: python:3.11-slim β†’ sha256:def456 (3.11.2)
             β”‚
             β”‚ (Python patch release)
             ↓
March:    python:3.11-slim β†’ sha256:ghi789 (3.11.3)

⚠️ Same tag, completely different images!

The hermetic fix:

## Lock to specific digest
FROM python:3.11-slim@sha256:abc123def456...

## Better: Specify exact Python version in your own minimal base
FROM debian:bullseye-20230109@sha256:xyz789...
RUN apt-get update && apt-get install -y \
    python3.11=3.11.1-1 \
    && rm -rf /var/lib/apt/lists/*

COPY requirements.txt .
RUN pip install --no-deps -r requirements.txt
COPY app.py .
CMD ["python3.11", "app.py"]

Example 3: The Timestamp Embedding Issue

A Go application's builds are never identical, even from the same source:

Dockerfile:

FROM golang:1.21
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o server .
CMD ["./server"]

Investigation:

## Build 1
$ docker build -t goapp:v1 .
$ docker run --rm goapp:v1 ./server --version
Version: 1.0.0, Built: 2024-01-15 14:32:11
$ sha256sum server
4f8d3c2b1a9e7f6d5c4b3a2e1d0c9b8a...  server

## Build 2 (one minute later, same source)
$ docker build -t goapp:v2 .
$ docker run --rm goapp:v2 ./server --version
Version: 1.0.0, Built: 2024-01-15 14:33:22
$ sha256sum server
5f9e4d3c2b1a0f8e7d6c5b4a3e2d1c0b...  server

## ❌ Different hashes!

Root causes:

  1. Build timestamp injected at link time:
// main.go
var buildTime string // set via: go build -ldflags "-X main.buildTime=$(date -u +%Y-%m-%dT%H:%M:%SZ)"

func main() {
    fmt.Printf("Built at: %s\n", buildTime)
    // ❌ Different on every build!
}
  2. Go build ID: Go embeds a build ID derived from content hashes of all build inputs, so it changes whenever any input changes:
$ go tool buildid server
HdYZP5xI8c9_oU9Kq3sW/...  # Changes every build

The hermetic solution:

FROM golang:1.21@sha256:...
WORKDIR /app

COPY go.mod go.sum ./
RUN go mod download

COPY . .

## Hermetic build flags:
RUN CGO_ENABLED=0 go build \
    -trimpath \
    -ldflags="-buildid= -w -s" \
    -o server .

CMD ["./server"]

Explanation of flags:

  • -trimpath: Remove filesystem paths from the binary
  • -buildid=: Set an empty build ID
  • -w: Omit DWARF debugging information
  • -s: Omit the symbol table
  • CGO_ENABLED=0: Disable cgo (ensures a static binary)

Verification:

## Build 1
$ docker build -t goapp:hermetic .
$ sha256sum server
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6...  server

## Build 2 (hours later)
$ docker build -t goapp:hermetic .
$ sha256sum server
a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6...  server

## βœ… Identical hashes!

Example 4: Multi-Stage Build Gotchas

Multi-stage builds can hide non-hermetic behavior across stages:

## ❌ NON-HERMETIC multi-stage build
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
COPY pom.xml .
RUN mvn dependency:go-offline  # ⚠️ Fetches from Maven Central
COPY src/ ./src/
RUN mvn package  # ⚠️ May fetch additional plugins

FROM eclipse-temurin:17-jre  # ⚠️ Floating tag
COPY --from=builder /app/target/*.jar app.jar
CMD ["java", "-jar", "app.jar"]

Hidden problems:

  1. Stage 1 issues:

    • maven:3.9-eclipse-temurin-17 is a moving target
    • mvn dependency:go-offline doesn't download everything the build needs, so mvn package can still hit the network (ironically)
    • Maven plugins can be updated without changing POM
  2. Stage 2 issues:

    • eclipse-temurin:17-jre updates frequently
    • JRE version affects runtime behavior
  3. Cross-stage issues:

    • File timestamps carry over
    • JAR manifest includes build timestamp

Hermetic multi-stage build:

## Stage 1: Build (hermetic)
FROM maven:3.9-eclipse-temurin-17@sha256:abc123... AS builder
WORKDIR /app

## Copy dependency manifests
COPY pom.xml settings.xml ./

## Explicitly copy vendored dependencies
COPY .m2/repository /root/.m2/repository/

COPY src/ ./src/

## Build in offline mode with a fixed output timestamp
RUN mvn -o package -DskipTests \
    -Dproject.build.outputTimestamp=1980-01-01T00:00:00Z

## Stage 2: Runtime (hermetic)
FROM eclipse-temurin:17-jre@sha256:def456...

COPY --from=builder /app/target/app.jar /app.jar

ENTRYPOINT ["java", "-jar", "/app.jar"]

Key improvements:

  • Digest-pinned base images in both stages
  • Maven runs in offline mode (-o)
  • Dependencies pre-vendored in .m2/repository
  • Fixed timestamp via project.build.outputTimestamp

Common Mistakes ⚠️

Mistake 1: Assuming "It Works in Docker" Means Hermetic

❌ Wrong thinking:

"I containerized my build, so it's reproducible!"

βœ… Reality: Docker provides isolation, not hermeticity. A build can be isolated yet still fetch different dependencies, use different tool versions, or embed timestamps.

Example of isolated-but-not-hermetic:

FROM ubuntu:22.04
RUN apt-get update && apt-get install -y build-essential
COPY code.c .
RUN gcc -o app code.c

This runs in isolation but:

  • ubuntu:22.04 tag can point to different images
  • apt-get update fetches latest package lists
  • gcc version varies
  • Build timestamp embedded in binary

Mistake 2: Relying on Docker Layer Caching for Consistency

❌ Wrong approach:

"My builds are consistent because Docker caches the dependency layer!"

βœ… Problem: Cache hits mask non-determinism. When cache misses occur (new machine, cleared cache, changed dependency file), the non-hermetic nature surfaces.

Dangerous pattern:

COPY requirements.txt .
RUN pip install -r requirements.txt  # Cached = old deps, uncached = new deps

Mistake 3: Using latest or Floating Tags

❌ Anti-pattern:

FROM node:latest
FROM python:3
FROM ubuntu

βœ… Best practice:

FROM node:18.15.0@sha256:abc123...
FROM python:3.11.2-slim@sha256:def456...
FROM ubuntu:22.04@sha256:ghi789...

Mistake 4: Ignoring Transitive Dependencies

❌ Incomplete solution:

"I pinned my direct dependencies, so I'm good!"

βœ… Reality: Transitive (indirect) dependencies can still vary:

// package.json
{
  "dependencies": {
    "express": "4.18.1"  // βœ… Pinned
  }
}

But express depends on:

  • body-parser: ^1.19.0 ← ⚠️ Still floating!
  • cookie: 0.5.0 ← βœ… Exact
  • debug: ~2.6.9 ← ⚠️ Allows patches

Solution: Use lockfiles and verify with npm ci or equivalent.

Mistake 5: Forgetting About Build Tools

❌ Overlooked issue: Even if dependencies are pinned, build tools themselves can vary:

FROM node:18
RUN npm install -g webpack  # ⚠️ Gets latest webpack!
COPY . .
RUN webpack build

βœ… Solution:

FROM node:18@sha256:...
COPY package.json package-lock.json ./
RUN npm ci  # Installs webpack from lockfile
COPY . .
RUN npx webpack build  # Uses local webpack

Mistake 6: Not Addressing Timestamps

❌ Hidden non-determinism: Many tools embed timestamps without developers realizing:

  • Go: Build timestamp in binary
  • Java: JAR manifest timestamp
  • Python: .pyc file timestamps
  • Node: Source maps with build time
  • Tar archives: File modification times

βœ… Solution: Use tool-specific flags:

Tool    | Problem                     | Solution
--------|-----------------------------|-------------------------------------------
Go      | Build ID and timestamps     | -trimpath -ldflags="-buildid="
Maven   | JAR timestamps              | project.build.outputTimestamp
Webpack | Non-deterministic chunk ids | optimization.moduleIds: 'deterministic'
Tar     | File mtimes and ordering    | tar --sort=name --mtime='1970-01-01' -czf
GCC/ld  | Embedded build ID           | -Wl,--build-id=none
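The tar row generalizes: an archive becomes reproducible once you fix the entry order and the metadata (mtimes, owners). Here is a sketch of the approach in Python — it illustrates the technique, not any specific tool's flags:

```python
import hashlib
import io
import tarfile

def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Create a tar archive whose bytes depend only on names + contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in sorted(files):             # fixed ordering, not filesystem order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0                     # fixed timestamp (the epoch)
            info.uid = info.gid = 0            # no host user/group leakage
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()

files = {"b.txt": b"world", "a.txt": b"hello"}
h1 = hashlib.sha256(deterministic_tar(files)).hexdigest()
h2 = hashlib.sha256(deterministic_tar(files)).hexdigest()
print(h1 == h2)  # True: byte-for-byte identical archives
```

Note the archive is written uncompressed here; gzip adds its own header timestamp, which is a separate thing to pin.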

Mistake 7: Mixing Build and Runtime Concerns

❌ Confused Dockerfile:

FROM node:18
COPY . .
RUN npm install  # Runtime dependencies
RUN npm run build  # Build-time action
CMD ["npm", "start"]  # Runtime command

This conflates:

  • Build-time dependencies (webpack, typescript)
  • Runtime dependencies (express, database drivers)
  • Development dependencies (testing frameworks)

βœ… Clearer separation:

## Build stage: hermetic build environment
FROM node:18@sha256:... AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci --include=dev
COPY . .
RUN npm run build

## Runtime stage: minimal runtime
FROM node:18-alpine@sha256:...
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]

Key Takeaways 🎯

πŸ“‹ Quick Reference: Docker vs. Hermetic Builds

Aspect       | Docker Provides           | Hermetic Requires
-------------|---------------------------|---------------------------
Base Images  | Floating tags (mutable)   | Digest pinning (immutable)
Dependencies | Network fetch allowed     | Pre-vendored, checksummed
Build Tools  | Latest versions           | Exact versions specified
Timestamps   | Current time embedded     | Fixed or removed
Randomness   | Allowed (UUIDs, etc.)     | Deterministic only
Cache        | Opportunistic layer reuse | Content-addressed storage

The Core Principle

Docker answers: "Will this run the same way in different environments?" Hermetic builds answer: "Will this produce byte-for-byte identical outputs?"

These are different guarantees. Docker solves the runtime portability problem; hermetic builds solve the build reproducibility problem.

Path to Hermeticity

To make Docker builds hermetic, you must:

  1. Pin everything with digests

    • Base images: FROM image@sha256:...
    • Downloaded files: Verify checksums
    • Lock file integrity: Commit lockfiles to source control
  2. Eliminate network dependencies

    • Vendor all dependencies locally
    • Use --network=none during build (when possible)
    • Pre-fetch and cache everything
  3. Remove timestamps and randomness

    • Use SOURCE_DATE_EPOCH environment variable
    • Tool-specific flags to disable timestamp embedding
    • Deterministic ordering for file operations
  4. Declare all inputs explicitly

    • No implicit dependencies on host system
    • No build arguments that vary
    • All tools and versions specified
  5. Verify reproducibility

    • Build twice, compare outputs byte-for-byte
    • Use tools like diffoscope to identify differences
    • Implement CI checks for reproducibility
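Step 5 is the easiest to automate: build twice into separate output directories and diff the digests. A minimal comparison helper (the directory layout is a placeholder; point it at your real build outputs):

```python
import hashlib
from pathlib import Path

def digest_tree(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its sha256 digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def compare_builds(build_a: Path, build_b: Path) -> list[str]:
    """Return relative paths whose contents differ between two build trees."""
    a, b = digest_tree(build_a), digest_tree(build_b)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

An empty result means the builds were byte-for-byte identical; anything else is a lead to feed into diffoscope.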

When Is Docker "Hermetic Enough"?

In practice, perfect hermeticity is expensive. Consider your needs:

Full hermeticity needed when:

  • Building security-critical software
  • Requiring supply chain verification
  • Need cryptographic proof of build integrity
  • Multiple parties must verify identical builds

Docker isolation sufficient when:

  • Rapid iteration is priority
  • Reproducibility within days/weeks is acceptable
  • Team has shared build infrastructure
  • Lockfiles provide "good enough" consistency

πŸ’‘ Pragmatic approach: Start with Docker + lockfiles + digest pinning. Add full hermeticity (vendoring, timestamp fixes) only where needed.

Tools That Help

To achieve true hermetic builds, consider:

  • Bazel: Purpose-built for hermetic builds
  • Nix: Functional package manager with reproducibility
  • BuildKit: Docker's build engine with improved caching
  • Kaniko: Builds container images without Docker daemon
  • Buildpacks: Opinionated, reproducible builds
  • reprotest: Tests for reproducibility issues
  • diffoscope: Deep comparison tool for build artifacts

πŸ“š Further Study


Remember: Docker is an excellent tool for isolation and portability, but achieving reproducibility requires additional discipline around versioning, dependency management, and build determinism. Understanding these distinctions makes you a more effective DevOps engineer! πŸš€