Inputs, Outputs, and Dependencies
Master the concept of build graphs, explicit dependency declaration, and content-addressable storage.
Inputs, Outputs, and Dependencies in Hermetic Builds
This lesson covers input declaration, output artifacts, dependency management, and reproducibility: the essential concepts for building reliable, deterministic software systems. Understanding these fundamentals will help you create builds that produce identical results every time, regardless of where or when they run.
Welcome
Welcome to the world of hermetic builds! If you've ever experienced the frustration of "it works on my machine" only to watch your code fail in production, you're about to discover the solution. Hermetic builds are the gold standard for reproducible, deterministic software construction.
In this lesson, we'll explore the three pillars that make hermetic builds possible:
- Inputs - Everything your build needs to run
- Outputs - The artifacts your build produces
- Dependencies - The relationships between build components
By the end of this lesson, you'll understand how to design build systems that behave predictably, scale efficiently, and eliminate the mystery from your build process. Let's dive in!
Core Concepts
What Makes a Build Hermetic?
A hermetic build is completely isolated from its environment. Think of it like a sealed laboratory experiment: no external factors can contaminate the results. The build:
- Declares ALL inputs explicitly
- Produces consistent outputs
- Doesn't access the network during execution
- Doesn't read ambient environment state
- Runs identically on any machine
Memory Device: Remember HEROIC builds:
- Hermetically sealed
- Explicit inputs
- Reproducible outputs
- Offline execution
- Isolated environment
- Consistent results
Understanding Inputs
Inputs are everything your build consumes to produce outputs. In a hermetic build, inputs must be:
1. Explicit and Declared
Every input must be formally specified. No hidden dependencies!
| Input Type | Examples | How to Declare |
|---|---|---|
| Source Files | *.java, *.cpp, *.ts files | List in build file or use glob patterns |
| Dependencies | Libraries, frameworks, tools | Version-pinned package declarations |
| Configuration | Build flags, compiler options | Build rule parameters |
| Data Files | Test fixtures, assets, schemas | Resource declarations in build system |
| Toolchains | Compilers, linkers, interpreters | Tool platform specifications |
2. Content-Addressable
Inputs should be identified by their content hash, not by location or timestamp. This ensures that identical content always produces identical results.
┌────────────────────────────────────────┐
│ TRADITIONAL BUILD (non-hermetic)       │
├────────────────────────────────────────┤
│ Input: /usr/lib/libfoo.so              │
│ Problem: File could change!            │
│ Result: Non-deterministic              │
└────────────────────────────────────────┘
┌────────────────────────────────────────┐
│ HERMETIC BUILD                         │
├────────────────────────────────────────┤
│ Input: sha256:a3f5b9...                │
│ Guarantees: Exact version locked       │
│ Result: Deterministic                  │
└────────────────────────────────────────┘
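A minimal sketch of content addressing, assuming a toy in-memory store. The `ContentStore` class and its method names are hypothetical, chosen only for illustration; real build systems use persistent or remote stores. The point it shows is that the key is derived from the bytes themselves, so a path or timestamp never enters the identity of an input.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressable store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The key comes from the content, not from a file path or mtime.
        digest = "sha256:" + hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


store = ContentStore()
key_a = store.put(b"int add(int a, int b) { return a + b; }")
key_b = store.put(b"int add(int a, int b) { return a + b; }")
key_c = store.put(b"int add(int a, int b) { return a - b; }")

print(key_a == key_b)  # identical content, identical key
print(key_a == key_c)  # different content, different key
```

Storing the same bytes twice is harmless here: both writes land on the same key, which is what makes deduplication and cache sharing straightforward.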
3. Immutable
Once declared, inputs cannot change. If you need a different version, you declare a new input with a different identifier.
Key Principle: "Same inputs → Same outputs, always"
Understanding Outputs
Outputs are the artifacts your build produces. In hermetic builds, outputs must be:
1. Deterministic
Given identical inputs, the build must produce byte-for-byte identical outputs. This means:
- No timestamps embedded in artifacts
- No random IDs or UUIDs
- No dependency on build order (when parallelized)
- No host-specific paths in binaries
| Output Type | Examples | Determinism Requirements |
|---|---|---|
| Binaries | executables, .so, .dll files | Strip timestamps, use deterministic linking |
| Archives | .jar, .tar, .zip files | Sort entries, normalize timestamps |
| Documents | Generated docs, reports | Remove creation dates, use fixed ordering |
| Containers | Docker images, OCI artifacts | Reproducible layers, fixed base images |
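As one way to picture the "sort entries, normalize timestamps" requirement for archives, here is a small sketch using Python's standard `tarfile` module. The helper name `deterministic_tar` and the in-memory file dictionary are assumptions made for the example, not part of any build tool.

```python
import hashlib
import io
import tarfile


def deterministic_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar whose bytes depend only on names and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed entry order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0          # normalized timestamp
            info.uid = info.gid = 0  # no host-specific ownership
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


first = deterministic_tar({"b.txt": b"beta", "a.txt": b"alpha"})
second = deterministic_tar({"a.txt": b"alpha", "b.txt": b"beta"})
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())  # True
```

The sketch deliberately avoids gzip compression, since a gzip header can carry its own timestamp that would need to be normalized separately.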
2. Isolated Output Directories
Each build action writes to its own isolated output directory. No shared state!
BUILD OUTPUT ISOLATION
bazel-out/
├─ target1/
│  ├─ binary1
│  └─ lib1.so
├─ target2/
│  ├─ binary2
│  └─ lib2.so
└─ target3/
   └─ archive.tar
Each target has its own sandbox! No cross-contamination possible.
3. Cacheable
Because outputs are deterministic, they can be cached and reused. If inputs haven't changed, skip the build and use the cached output!
Benefits of Deterministic Outputs:
- Fast incremental builds - Only rebuild what changed
- Distributed caching - Share artifacts across team
- Easy debugging - Reproduce exact builds from past
- Reliable CI/CD - No flaky builds
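The caching idea can be sketched as a lookup keyed by the hash of the inputs. Everything here is hypothetical scaffolding: `ActionCache`, `get_or_build`, and the stand-in `compile_source` function are illustrative names, and a real cache would also be persistent and shared across machines.

```python
import hashlib
from typing import Callable


class ActionCache:
    """Toy output cache keyed by input hash (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._outputs: dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get_or_build(self, input_key: str, build: Callable[[], bytes]) -> bytes:
        if input_key in self._outputs:
            self.hits += 1  # inputs unchanged: skip the build
            return self._outputs[input_key]
        self.misses += 1
        output = build()
        self._outputs[input_key] = output
        return output


def compile_source(source: bytes) -> bytes:
    # Stand-in for a real compiler: a pure function of its input.
    return source.upper()


cache = ActionCache()
source = b"print('hello')"
key = hashlib.sha256(source).hexdigest()

out1 = cache.get_or_build(key, lambda: compile_source(source))
out2 = cache.get_or_build(key, lambda: compile_source(source))
print(cache.misses, cache.hits)  # 1 1
```

Reusing the cached bytes is only safe because the build step is deterministic; if `compile_source` embedded a timestamp, the second call would silently return stale-looking output.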
Understanding Dependencies
Dependencies define the relationships between build targets. In hermetic builds, dependencies must be:
1. Explicitly Declared
Every dependency relationship must be stated in the build definition.
DEPENDENCY GRAPH
        ┌─────────┐
        │   app   │   ← Final binary
        └────┬────┘
      ┌──────┴──────┐
      │             │
  ┌───▼───┐    ┌────▼────┐
  │ lib_a │    │  lib_b  │   ← Libraries
  └───┬───┘    └────┬────┘
      │             │
  ┌───▼───┐    ┌────▼────┐
  │ util  │    │  proto  │   ← Utilities
  └───────┘    └─────────┘
All arrows must be declared!
No implicit dependencies allowed.
2. Version-Pinned
Dependencies must specify exact versions, not ranges:
Non-hermetic: requests>=2.0
Hermetic: requests==2.28.1 with hash verification
| Dependency Declaration Style | Hermetic? | Why? |
|---|---|---|
| lodash: ^4.0.0 | No | Allows any 4.x version - non-deterministic |
| lodash: latest | No | Changes over time |
| lodash: 4.17.21 | Partial | Better, but version could be deleted/retagged |
| lodash@sha256:a3f5b9... | Yes | Content hash ensures exact artifact |
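Hash verification of a pinned dependency might look like the following sketch. The function name `verify_artifact` is an assumption for illustration, and the pinned digest is computed inline only to keep the example self-contained; in practice it would come from a lock file.

```python
import hashlib
import hmac


def verify_artifact(data: bytes, pinned_sha256: str) -> None:
    """Raise ValueError unless the artifact matches its pinned SHA-256 digest."""
    actual = hashlib.sha256(data).hexdigest()
    if not hmac.compare_digest(actual, pinned_sha256):
        raise ValueError(f"hash mismatch: expected {pinned_sha256}, got {actual}")


artifact = b"example package contents"
pinned = hashlib.sha256(artifact).hexdigest()  # would normally come from a lock file

verify_artifact(artifact, pinned)  # passes silently
try:
    verify_artifact(artifact + b" tampered", pinned)
except ValueError as err:
    print("rejected:", err)
```

This is why a bare version number only counts as partially hermetic in the table above: the version string says what was requested, while the digest says what was actually received.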
3. Acyclic
Dependency graphs must be directed acyclic graphs (DAGs). No circular dependencies!
VALID DAG            INVALID CYCLE
A → B → C            A → B
↓       ↓            ↑   ↓
D → E → F            D ← C
                     Circular!
Can be built         Cannot be built
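A short sketch of why the graph must be acyclic, using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or newer). The `build_order` helper is a hypothetical name; the target names mirror the dependency graph shown earlier in this lesson. Each dictionary maps a target to the set of targets it depends on.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every target comes after its dependencies."""
    # static_order() raises CycleError when no such order exists.
    return list(TopologicalSorter(deps).static_order())


valid = {"app": {"lib_a", "lib_b"}, "lib_a": {"util"}, "lib_b": {"proto"}}
order = build_order(valid)
print(order.index("util") < order.index("lib_a") < order.index("app"))  # True

cyclic = {"A": {"B"}, "B": {"C"}, "C": {"D"}, "D": {"A"}}
try:
    build_order(cyclic)
except CycleError:
    print("cycle detected: cannot be built")
```

With a cycle there is no target that can be built first, so no valid build order exists at all; that is the practical meaning of "cannot be built".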
4. Transitively Closed
If A depends on B, and B depends on C, then A has a transitive dependency on C. The build system must handle this automatically:
TRANSITIVE CLOSURE
You declare:          Build system resolves:
app → lib_a           app → lib_a → util
                       ↓
                      lib_b → proto
All transitive dependencies
automatically included!
Pro Tip: Modern build systems like Bazel, Buck, and Pants handle transitive dependency resolution automatically. You only declare direct dependencies.
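Transitive resolution can be sketched as a graph traversal over the declared direct dependencies. The function name `transitive_deps` and the dictionary layout are assumptions for this example; real build systems do the equivalent work while loading and analyzing the build graph.

```python
def transitive_deps(target: str, direct: dict[str, list[str]]) -> set[str]:
    """Collect every target reachable from `target` through direct dependencies."""
    seen: set[str] = set()
    stack = list(direct.get(target, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(direct.get(dep, []))
    return seen


direct_deps = {"app": ["lib_a", "lib_b"], "lib_a": ["util"], "lib_b": ["proto"]}
print(sorted(transitive_deps("app", direct_deps)))
# ['lib_a', 'lib_b', 'proto', 'util']
```

Only the direct edges are written by hand; `util` and `proto` appear in the result for `app` without `app` ever naming them.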
The Hermetic Build Contract
Think of hermetic builds as a contract between the build system and the developer:
The Hermetic Contract
| Developer Promises | Build System Guarantees |
|---|---|
| Declare all inputs explicitly | Reproducible outputs |
| Pin all dependency versions | Fast incremental builds |
| Specify complete dependency graph | Correct parallel execution |
| No network or undeclared filesystem access in rules | Distributed caching |
| Deterministic build actions | Build verification |
Examples
Example 1: Non-Hermetic vs Hermetic Python Build
Let's see the difference between a traditional build and a hermetic build:
Non-Hermetic Approach:
## requirements.txt
requests>=2.0
numpy
pandas>1.0
## Build process
$ pip install -r requirements.txt
$ python build.py
Problems:
- requests>=2.0 could resolve to 2.28.1 today, 2.29.0 tomorrow
- numpy with no version gets latest (changes over time)
- pandas>1.0 allows any newer version
- Network access required during build
- Different results on different machines or times
Hermetic Approach:
## requirements.lock (generated from requirements.txt)
requests==2.28.1 \
--hash=sha256:7c5599b102feddaa661c826c56ab4fee28bfd17f5abca1ebbe3e7f19d7c97983
numpy==1.24.2 \
--hash=sha256:003a9f530e880cb2cd177cba1af7220b9aa42def9c4afc2a2fc3ee6be7eb2b22
pandas==1.5.3 \
--hash=sha256:74a3fd7e5a7ec052f183273dc7b0acd3a863edf7520f5d3a1765c04ffdb3b0b1
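## BUILD file (Bazel example)
## Assumes rules_python is set up and the pip repository is named "pip"
load("@pip//:requirements.bzl", "requirement")
py_binary(
    name = "my_app",
    srcs = ["main.py", "utils.py"],
    deps = [
        requirement("requests"),
        requirement("numpy"),
        requirement("pandas"),
    ],
)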
Benefits:
- Exact versions with content hashes
- Reproducible on any machine
- Can be cached and reused
- Dependencies fetched once, reused forever
Example 2: Hermetic Java Build with Bazel
Here's a complete example showing inputs, outputs, and dependencies:
## BUILD file
java_library(
    name = "calculator",
    srcs = ["Calculator.java"],  # INPUT: Source files
    deps = [  # INPUT: Dependencies
        "@maven//:com_google_guava_guava",
    ],
)
java_test(
    name = "calculator_test",
    srcs = ["CalculatorTest.java"],  # INPUT: Test sources
    test_class = "com.example.CalculatorTest",
    deps = [
        ":calculator",  # DEPENDENCY: On library above
        "@maven//:junit_junit",  # INPUT: Test framework
    ],
)
java_binary(
    name = "calculator_app",
    main_class = "com.example.Main",
    srcs = ["Main.java"],
    deps = [":calculator"],  # DEPENDENCY: On library
)
What happens:
1. Input Resolution: Bazel identifies all inputs:
   - Source files: Calculator.java, CalculatorTest.java, Main.java
   - External deps: Guava (specific version), JUnit (specific version)
   - Toolchain: JDK version (declared elsewhere)
2. Dependency Graph Construction:
calculator_app  → calculator → guava
                      ↑
calculator_test ──────┘
       ↓
     junit
3. Output Generation: Each target produces outputs:
   - calculator → libcalculator.jar (library)
   - calculator_test → test results (pass/fail)
   - calculator_app → executable launcher plus calculator_app.jar
4. Caching: Outputs are cached by input hash. If Calculator.java doesn't change, libcalculator.jar is reused from cache!
Example 3: Build Input Hash Calculation
How does the build system determine if inputs changed? Through hash calculation:
| Step | Action | Result |
|---|---|---|
| 1 | Hash all source files | src_hash = sha256(Calculator.java) |
| 2 | Hash all dependencies | dep_hash = sha256(guava.jar) |
| 3 | Hash build configuration | cfg_hash = sha256("java_library...") |
| 4 | Combine hashes | input_hash = sha256(src_hash + dep_hash + cfg_hash) |
| 5 | Look up in cache | If found → reuse output! |
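The steps in the table can be sketched as follows. This is an illustration of the idea only, not Bazel's actual key derivation; the function names and the way the per-file digests are concatenated are assumptions made for the example.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def input_hash(sources: dict[str, bytes], deps: dict[str, bytes], config: str) -> str:
    """Combine per-file digests and the configuration digest into one key."""
    h = hashlib.sha256()
    for label, group in (("src", sources), ("dep", deps)):
        for name in sorted(group):  # sorted so dict order cannot change the key
            h.update(f"{label}:{name}:{sha256_hex(group[name])}\n".encode())
    h.update(f"cfg:{sha256_hex(config.encode())}\n".encode())
    return h.hexdigest()


deps = {"guava.jar": b"<jar bytes>"}
config = 'java_library(name = "calculator")'

before = input_hash({"Calculator.java": b"class Calculator {}"}, deps, config)
same = input_hash({"Calculator.java": b"class Calculator {}"}, deps, config)
after = input_hash({"Calculator.java": b"class Calculator {} // note"}, deps, config)

print(before == same)   # unchanged inputs give the same key
print(before == after)  # any edit, even a comment, gives a new key
```

Including the file name and a label next to each digest keeps two different input sets from colliding just because their raw digests happen to concatenate to the same string.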
Example scenario (simplified, illustrative output rather than verbatim Bazel logs):
## First build
$ bazel build //src:calculator
Building... (2.5s)
Target //src:calculator.jar built
Input hash: a3f5b912...
[Cached output]
## Change Calculator.java
$ vim Calculator.java # Add a comment
## Rebuild
$ bazel build //src:calculator
Building... (2.4s) # Rebuilds because input hash changed
New input hash: b8d3e421...
## Build again without changes
$ bazel build //src:calculator
0.1s # Instant! Uses cached output
Did you know? Bazel is the open-source version of Blaze, Google's internal build system. Hermetic builds with distributed caching are what make building a very large monorepo at that scale practical.
Example 4: Handling System Dependencies
What about dependencies on system tools like compilers? These must also be hermetic!
Non-Hermetic Approach:
## Uses whatever GCC is installed
$ gcc -o myapp main.c
Problem: The GCC version varies by machine (Ubuntu 20.04 ships GCC 9 by default, Ubuntu 22.04 ships GCC 11), so the output can differ from host to host.
Hermetic Approach:
## BUILD file
cc_binary(
    name = "myapp",
    srcs = ["main.c"],
    # No toolchain attribute is needed here: Bazel picks the registered
    # C/C++ toolchain through toolchain resolution.
)
## WORKSPACE file - Pin exact toolchain
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "gcc_linux_x86_64",
    urls = [
        "https://mirror.example.com/gcc-11.2.0-linux-x86_64.tar.gz",
    ],
    sha256 = "d4f6d3a5c1e8b12f...",  # Content hash
)
register_toolchains(
    "//toolchains:gcc_11_2_0_toolchain",
)
Benefits:
- Exact compiler version specified
- Same compiler used on all machines
- Compiler cached and reused
- Results are deterministic
HERMETIC TOOLCHAIN RESOLUTION
┌────────────────────────────────────┐
│ Build Request: compile main.c      │
└─────────────────┬──────────────────┘
                  ▼
┌────────────────────────────────────┐
│ Resolve Toolchain                  │
│ Need: C++ compiler                 │
└─────────────────┬──────────────────┘
                  ▼
┌────────────────────────────────────┐
│ Find Registered Toolchain          │
│ gcc-11.2.0 @ sha256:d4f6d3...      │
└─────────────────┬──────────────────┘
                  ▼
┌────────────────────────────────────┐
│ Download (if not cached)           │
│ Extract to sandbox                 │
└─────────────────┬──────────────────┘
                  ▼
┌────────────────────────────────────┐
│ Execute Build Action               │
│ ./gcc-11.2.0/bin/gcc main.c        │
└─────────────────┬──────────────────┘
                  ▼
┌────────────────────────────────────┐
│ Output: myapp (deterministic!)     │
└────────────────────────────────────┘
Common Mistakes
Let's examine frequent pitfalls when implementing hermetic builds:
Mistake 1: Embedding Timestamps
Wrong:
## Adds build timestamp to output
from datetime import datetime
def build():
    timestamp = datetime.now()
    with open("version.txt", "w") as f:
        f.write(f"Built at {timestamp}")
Every build produces different output, breaking caching!
Correct:
## Use commit hash or version tag instead
import os
def build():
    # Assumes the build system passes the revision in explicitly under this name
    commit_hash = os.environ.get("BUILD_SCM_REVISION", "unknown")
    with open("version.txt", "w") as f:
        f.write(f"Version: {commit_hash}")
Mistake 2: Reading Environment Variables
Wrong:
## Build script reads $HOME
OUTPUT_DIR="$HOME/build/output"
mkdir -p "$OUTPUT_DIR"
cp artifact.jar "$OUTPUT_DIR/"
Different users have different $HOME values. Non-hermetic!
Correct:
## All paths relative to build root
cc_binary(
    name = "app",
    srcs = ["main.c"],
    # Output automatically goes to bazel-out/
)
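For build steps you script yourself, one way to avoid ambient environment state is to launch each step with an explicit, fixed environment instead of inheriting the caller's. The sketch below assumes a POSIX system; the helper name `run_isolated` and the chosen variables are illustrative, and real sandboxes isolate far more than environment variables.

```python
import subprocess
import sys


def run_isolated(argv: list[str]) -> str:
    """Run a command with a fixed environment rather than the caller's."""
    # Nothing from os.environ leaks in: only these declared variables exist.
    fixed_env = {"PATH": "/usr/bin:/bin", "LANG": "C"}
    result = subprocess.run(
        argv, env=fixed_env, capture_output=True, text=True, check=True
    )
    return result.stdout


# The child process cannot see $HOME because it was never passed in.
probe = "import os; print(os.environ.get('HOME'))"
print(run_isolated([sys.executable, "-c", probe]).strip())  # None
```

Any variable the step genuinely needs then has to be added to `fixed_env` on purpose, which turns a hidden input into a declared one.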
Mistake 3: Network Access During Build
Wrong:
## Downloads file during build
import requests
def generate_config():
    response = requests.get("https://api.example.com/config")
    return response.json()
Network content changes, so the build becomes non-deterministic.
Correct:
## WORKSPACE file - fetch as a declared, hash-verified input
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")
http_file(
    name = "config_json",
    urls = ["https://api.example.com/config/v1.2.3.json"],
    sha256 = "a3f5b912...",  # Verified!
)
## BUILD file - the processing script is itself a declared tool
genrule(
    name = "process_config",
    srcs = ["@config_json//file"],
    outs = ["config.processed"],
    tools = ["process.py"],
    cmd = "$(location process.py) $< > $@",
)
Mistake 4: Undeclared Dependencies
Wrong:
java_binary(
    name = "app",
    srcs = ["Main.java"],
    deps = [
        ":library_a",
    ],
)
## Main.java actually uses classes from library_b!
## But it's transitively available through library_a
This works until library_a removes its dependency on library_b. Then your build breaks!
Correct:
java_binary(
    name = "app",
    srcs = ["Main.java"],
    deps = [
        ":library_a",
        ":library_b",  # Declare direct dependency!
    ],
)
Mistake 5: Non-Deterministic Ordering
Wrong:
## Iterating over set (unordered)
files = set(["a.txt", "b.txt", "c.txt"])
for f in files:
    process(f)  # Order can vary between runs!
Correct:
## Sort for deterministic ordering
files = ["a.txt", "b.txt", "c.txt"]
for f in sorted(files):
    process(f)  # Always same order
Key Takeaways
Let's consolidate what you've learned about hermetic builds:
Core Principles
Inputs Must Be:
- Explicitly declared
- Content-addressable (hashed)
- Immutable (version-pinned)
- Complete (no hidden dependencies)
Outputs Must Be:
- Deterministic (same inputs → same outputs)
- Isolated (separate directories)
- Cacheable (reusable across machines)
- Verifiable (content hashes)
Dependencies Must Be:
- Explicitly declared
- Version-pinned (no ranges)
- Acyclic (DAG structure)
- Transitively resolved (automatic)
Benefits of Hermetic Builds
| Benefit | Why It Matters |
|---|---|
| Reproducibility | Same code always builds the same way |
| Speed | Aggressive caching of unchanged components |
| Scalability | Distributed builds and remote caching |
| Debuggability | Reproduce exact builds from the past |
| Security | Content hashes verify dependency integrity |
| Collaboration | Works the same on everyone's machine |
When to Use Hermetic Builds
Ideal For:
- Large monorepos with many developers
- Projects requiring high reproducibility
- CI/CD pipelines
- Regulated industries (finance, healthcare)
- Open source projects (reproducible releases)
Consider Carefully For:
- Small personal projects (overhead may not be worth it)
- Prototypes and experiments
- Projects with unavoidable system dependencies
Try This: Check Your Build's Hermeticity
Run this test to check whether your build is deterministic. Identical outputs are necessary for a hermetic build but do not prove it, because two builds on the same machine share the same ambient environment:
## Build twice and compare outputs
$ build_tool clean && build_tool build
$ sha256sum output/artifact.jar > hash1.txt
$ build_tool clean && build_tool build
$ sha256sum output/artifact.jar > hash2.txt
$ diff hash1.txt hash2.txt
## No differences? Build is deterministic!
## Differences found? Build is non-deterministic, so it cannot be hermetic
Quick Reference Card
| Concept | Key Points |
|---|---|
| Inputs | Explicit, content-hashed, immutable, complete |
| Outputs | Deterministic, isolated, cacheable, verifiable |
| Dependencies | Declared, version-pinned, acyclic (DAG), transitive |
| Hermetic = | Same inputs → Same outputs, always |
| No Access To | Network, ambient environment, timestamps, randomness |
| Benefits | Reproducible, fast, scalable, debuggable, secure |
| Tools | Bazel, Buck, Pants, Nix, Gradle (with restrictions) |
| Verification | Content hashes (SHA-256) on all inputs & outputs |
Further Study
Ready to dive deeper into hermetic builds? Check out these resources:
Bazel Documentation - Build Encyclopedia: https://bazel.build/reference/be/overview - Reference for Bazel's built-in rules and the attributes used to declare inputs, outputs, and dependencies
Reproducible Builds Project: https://reproducible-builds.org/ - Community effort to make software build processes deterministic, with tools and best practices
Software Engineering at Google, Chapter 18 (Build Systems and Build Philosophy): https://abseil.io/resources/swe-book/html/ch18.html - Deep dive into how Google uses hermetic builds for massive scale
Congratulations! You now understand the fundamental concepts of hermetic builds. Practice declaring explicit inputs, creating deterministic outputs, and managing dependencies properly. Your builds will become faster, more reliable, and easier to debug. Happy building!