Bazel Mental Model
Understand Bazel's approach: explicit BUILD files, action-based execution, and remote caching.
Master Bazel's unique approach to hermetic builds with free flashcards and spaced repetition practice. This lesson covers dependency graphs, build actions, caching mechanisms, and execution environments: essential concepts for understanding how Bazel achieves reproducible builds at scale.
Welcome to the Bazel Mental Model
Bazel isn't just another build tool; it's a complete paradigm shift in how we think about building software. Unlike traditional build systems that execute scripts sequentially, Bazel operates on a declarative model where you describe what you want to build, not how to build it. Understanding this fundamental mental model is the key to unlocking Bazel's power for hermetic, reproducible, and scalable builds.
Think of traditional build tools like Make or Maven as recipe books where you write step-by-step instructions. Bazel, on the other hand, is more like a smart logistics system that figures out the optimal way to get from raw ingredients to finished product, caching intermediate results and parallelizing work automatically.
Core Concepts: The Building Blocks of Bazel's Mind
The Dependency Graph: Bazel's Map of Reality
At the heart of Bazel's mental model is the dependency graph: a complete, explicit representation of every file, rule, and relationship in your build. This isn't just metadata; it's the foundation of everything Bazel does.

    DEPENDENCY GRAPH STRUCTURE

    //app:binary (final)
          |
     +----+-----------+
     v                v
    //app:lib     //third_party:json
     |                |
     v                v
    //utils:helpers   (external)
     |
     v
    //base:core

Key properties of Bazel's dependency graph:
- Explicit: Every dependency must be declared. No hidden includes or classpath magic.
- Acyclic: No circular dependencies allowed. Bazel enforces a strict directed acyclic graph (DAG).
- Fine-grained: Dependencies are at the target level, not just file or module level.
- Hermetic: The graph captures everything needed, including toolchains and execution requirements.
Mental Model Tip: Think of Bazel's dependency graph like a restaurant's ingredient supply chain. Every dish (target) lists exactly which ingredients (dependencies) it needs, where they come from (repositories), and how they're prepared (rules). If an ingredient changes, only the dishes using it need to be remade.
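To make the "explicit and acyclic" properties concrete, here is a toy model of the example graph in Python. It is a sketch, not Bazel's implementation: the `DEPS` dictionary mirrors the labels in the diagram, `build_order` is a hypothetical helper name, and the code relies on the standard-library `graphlib` module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# Each target maps to the targets it directly depends on.
DEPS = {
    "//app:binary": ["//app:lib", "//third_party:json"],
    "//app:lib": ["//utils:helpers"],
    "//utils:helpers": ["//base:core"],
    "//third_party:json": [],
    "//base:core": [],
}


def build_order(deps):
    """Return targets ordered so every dependency precedes its dependents.

    Raises graphlib.CycleError if the graph contains a cycle, which mirrors
    Bazel's refusal to accept circular dependencies.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(build_order(DEPS))
    try:
        build_order({"//a:a": ["//b:b"], "//b:b": ["//a:a"]})
    except CycleError as err:
        print("cycle rejected:", err.args[1])
```

Because every edge is declared up front, a valid build order can be computed before anything runs, and a cycle is reported as an error instead of producing an unpredictable build.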
Targets: The Atoms of Your Build
A target is the fundamental unit in Bazel's world. Targets come in three flavors:
| Target Type | Purpose | Example Label |
|---|---|---|
| Files | Source inputs or generated outputs | //src:main.java |
| Rules | Instructions for creating outputs from inputs | //app:server |
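| Package Groups | Named sets of packages used for visibility and access control | Referenced from `visibility` attributes (`//visibility:public` is a built-in visibility specification rather than a package group) |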
Every target has a unique label in the format //package/path:target_name. This label is how Bazel references and tracks everything in your build.
Example labels:
- //src/main/java/com/example:app - A Java application target
- //third_party/protobuf:protobuf_java - An external dependency
- //config:dev.json - A configuration file
- :local_target - Relative reference within the same package
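The label forms listed above can be illustrated with a small parser. This is a simplified sketch under stated assumptions: `Label` and `parse_label` are hypothetical names, and repository prefixes such as `@repo//...` are deliberately not handled.

```python
from typing import NamedTuple


class Label(NamedTuple):
    package: str  # e.g. "src/main/java/com/example"
    name: str     # e.g. "app"

    def __str__(self):
        return f"//{self.package}:{self.name}"


def parse_label(text, current_package=""):
    """Parse an absolute ("//pkg:name", "//pkg") or relative (":name") label."""
    if text.startswith("//"):
        package, sep, name = text[2:].partition(":")
        if not sep:
            # "//third_party/guava" is shorthand for "//third_party/guava:guava".
            name = package.rsplit("/", 1)[-1]
        return Label(package, name)
    if text.startswith(":"):
        # A relative label resolves against the package that contains it.
        return Label(current_package, text[1:])
    raise ValueError(f"unsupported label form: {text!r}")


if __name__ == "__main__":
    print(parse_label("//src/main/java/com/example:app"))
    print(parse_label(":local_target", current_package="config"))
```

The point of the sketch is that a label always resolves to a (package, name) pair, regardless of which shorthand was used to write it.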
Rules: The Transformation Functions
A rule is like a pure function in functional programming: it takes inputs (source files, dependencies) and produces outputs (compiled binaries, generated files) in a deterministic way.

    RULE EXECUTION

    INPUTS
      - Source files (srcs)
      - Dependencies (deps)
      - Data files (data)
      - Tool dependencies (tools)
          |
          v
    RULE (e.g., java_binary)
          |
          v
    OUTPUTS
      - Executable binary
      - Deploy JAR
      - Metadata files

Critical rule properties:
- Deterministic: Same inputs always produce same outputs
- Hermetic: No access to system state, network, or undeclared files
- Parallelizable: Rules can execute concurrently if dependencies allow
- Cacheable: Outputs can be stored and reused across builds and machines
Actions: The Actual Work Units
When Bazel analyzes a rule target, it generates one or more actions: the actual commands that run during execution. This separation between rule (what to build) and action (how to build) is crucial to Bazel's mental model.
| Action Type | Description | Example |
|---|---|---|
| Spawn | Execute a command | javac -d out Main.java |
| FileWrite | Generate a text file | Write manifest or config |
| TemplateExpand | Substitute variables in template | Version stamps |
| SymlinkTree | Create directory structure | Runfiles trees |
Each action has:
- Input files (must be outputs of other actions or source files)
- Command and environment (toolchain, flags, environment variables)
- Output files (created by the action)
- Mnemonic (human-readable label like "Javac" or "CppCompile")
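The four parts of an action map naturally onto a small record type. The sketch below is illustrative only: `Action` and `undeclared_inputs` are hypothetical names, the "JavaJar" mnemonic and the file names are made up for the example, and real Bazel actions carry considerably more information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    mnemonic: str        # human-readable label, e.g. "Javac"
    command: tuple       # argv of the command to run
    inputs: frozenset    # paths the action may read
    outputs: frozenset   # paths the action must create
    env: tuple = ()      # (name, value) pairs visible to the command


def undeclared_inputs(actions, source_files):
    """Return inputs that are neither source files nor outputs of any action."""
    known = set(source_files)
    for action in actions:
        known |= action.outputs
    missing = set()
    for action in actions:
        missing |= action.inputs - known
    return missing


compile_action = Action(
    mnemonic="Javac",
    command=("javac", "-d", "out", "Main.java"),
    inputs=frozenset({"Main.java"}),
    outputs=frozenset({"out/Main.class"}),
)
jar_action = Action(
    mnemonic="JavaJar",  # hypothetical mnemonic for illustration
    command=("jar", "cf", "app.jar", "-C", "out", "."),
    inputs=frozenset({"out/Main.class", "MANIFEST.MF"}),
    outputs=frozenset({"app.jar"}),
)

if __name__ == "__main__":
    # MANIFEST.MF is neither a declared source nor produced by another action.
    print(undeclared_inputs([compile_action, jar_action], {"Main.java"}))
```

Checking that every input is either a source file or some action's output is what lets the full set of actions form a well-defined graph.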
Did You Know? A single Bazel build can contain hundreds of thousands of actions, and very large codebases such as Android at Google reportedly have builds with over 1 million actions.
The Sandbox: Enforcing Hermeticity
Bazel's sandbox is where the magic of hermetic builds happens. When an action executes, Bazel creates an isolated environment that only contains:
- Declared inputs: Only files explicitly listed as inputs or dependencies
- Toolchain binaries: The compiler, linker, or other tools needed
- Output directory: Where the action can write its results
What the sandbox aims to keep out (exact enforcement varies by platform and configuration):
- Your home directory
- System libraries (except explicitly declared)
- Network access
- Other build outputs not declared as dependencies
- Environment variables (except hermetic ones)

    SANDBOX ENVIRONMENT

    Allowed:
      execroot/
        declared_input1.java
        declared_input2.java
        toolchain/
          javac

    Forbidden:
      /usr/lib/random_library.so
      /home/user/.config
      Network requests
      /tmp/cached_state

Pro Tip: If your build fails in Bazel with "file not found" errors but the same commands work when run by hand outside the sandbox, you've probably found an undeclared dependency. The sandbox is catching your hermeticity violation!
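The effect described above can be imitated with a temporary directory. This is a toy sketch, not how Bazel implements sandboxing (real sandboxes rely on operating-system features and typically use symlinks or mounts rather than copies); `run_in_sandbox` is a hypothetical helper name.

```python
import shutil
import tempfile
from pathlib import Path


def run_in_sandbox(workspace, declared_inputs, action):
    """Copy only the declared inputs into a fresh directory and run `action` there.

    `action` is a callable that receives the sandbox root as a pathlib.Path.
    Files that were not declared are simply absent from that directory.
    """
    workspace = Path(workspace)
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel in declared_inputs:
            dest = root / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(workspace / rel, dest)
        return action(root)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as ws:
        ws = Path(ws)
        (ws / "declared.txt").write_text("visible")
        (ws / "undeclared.txt").write_text("hidden")
        print(run_in_sandbox(ws, ["declared.txt"],
                             lambda root: sorted(p.name for p in root.iterdir())))
```

An action that tries to open `undeclared.txt` inside the sandbox gets a `FileNotFoundError`, which is the same class of "file not found" failure the tip describes.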
Caching: The Three-Level System
Bazel's caching system is what makes it blazingly fast. It operates at three levels:
| Cache Level | Scope | Key Benefit |
|---|---|---|
| Local Action Cache | Your machine | Instant rebuilds of unchanged code |
| Content-Addressable Store | Shared repository | Team-wide artifact sharing |
| Remote Execution Cache | Build farm | Distributed computation + caching |
How Bazel decides if an action can be cached:

    CACHE KEY COMPUTATION

    Hash of:
      - Command line and flags
      - Input file contents (SHA256)
      - Environment variables
      - Toolchain version
      - Action mnemonic
          |
          v
    Cache Key: a7f3bc29e...
          |
          v
    Lookup in cache
          |
      +---+-----+
      v         v
    Found     Not Found
      |         |
    Reuse     Execute
    Output    & Cache

The cache key is content-based, not timestamp-based. This means:
- Switching git branches back and forth reuses cached results
- Different developers with identical code get cache hits
- Build outputs are reproducible across machines and time
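Common Pitfall: If your actions read system time, random numbers, or network data, their outputs differ from run to run even though the cache key stays the same. Cached results then stop matching what a fresh build would produce, and downstream actions miss the cache whenever the output changes. Keep actions deterministic.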
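A content-based cache key can be sketched in a few lines of Python. This is a simplified illustration of the idea, not Bazel's actual key format: `action_cache_key` and its parameters are hypothetical, and the toolchain is assumed to be represented by the command and input files.

```python
import hashlib


def action_cache_key(mnemonic, command, input_contents, env):
    """Compute a SHA-256 digest over everything that can affect an action's output.

    input_contents maps each input path to its bytes; env maps names to values.
    Timestamps never enter the computation.
    """
    h = hashlib.sha256()

    def add(field: bytes):
        # Length-prefix each field so adjacent fields cannot run together.
        h.update(len(field).to_bytes(8, "big"))
        h.update(field)

    add(mnemonic.encode())
    add(str(len(command)).encode())
    for arg in command:
        add(arg.encode())
    for path in sorted(input_contents):  # sorted so dict order is irrelevant
        add(path.encode())
        add(hashlib.sha256(input_contents[path]).digest())
    for name in sorted(env):
        add(name.encode())
        add(env[name].encode())
    return h.hexdigest()


if __name__ == "__main__":
    key = action_cache_key(
        "Javac",
        ["javac", "-d", "out", "Utils.java"],
        {"Utils.java": b"class Utils {}"},
        {"LANG": "C"},
    )
    print(key)
```

Two developers with byte-identical inputs, command, and environment compute the same key, which is why they can share cached results; changing a single byte of any input produces a different key.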
The Loading and Analysis Phases
Bazel builds happen in distinct phases, and understanding this is crucial to the mental model:

    BAZEL BUILD PHASES

    1. LOADING
       Read BUILD files, parse syntax
       Build package graph
       Time: O(packages)
          |
          v
    2. ANALYSIS
       Construct action graph
       Resolve dependencies
       Configure toolchains
       Time: O(targets in dependency graph)
          |
          v
    3. EXECUTION
       Run actions (potentially parallel)
       Check caches
       Write outputs
       Time: O(changed actions)
          |
          v
    BUILD COMPLETE

Why this matters:
- Loading is fast and incremental: Bazel only reloads changed BUILD files
- Analysis constructs the complete action graph before any execution
- Execution can be massively parallel because the graph is known upfront
Memory Aid: Think "L.A.E." - Load, Analyze, Execute. Like planning a trip: Load the map, Analyze the route, Execute the drive.
Real-World Examples: Seeing the Mental Model in Action
Example 1: A Simple Java Library Build
Let's trace through how Bazel thinks about building a Java library:
# //src/main/java/com/example/BUILD
java_library(
    name = "utils",
    srcs = ["Utils.java", "Helper.java"],
    deps = ["//third_party/guava"],
    visibility = ["//visibility:public"],
)
Mental model walkthrough:
| Phase | What Bazel Does | Key Insight |
|---|---|---|
| Loading | Parse BUILD file, register utils target with srcs and deps | No compilation yet, just bookkeeping |
| Analysis | Resolve //third_party/guava dependency, create Javac action with inputs [Utils.java, Helper.java, guava.jar] | Action graph is now complete |
| Execution | Check cache for hash of [command, Utils.java content, Helper.java content, guava.jar content]. If miss, run javac in sandbox | Output: libutils.jar |
What makes this hermetic:
- Only declared sources (Utils.java, Helper.java) are accessible
- Only declared dependency (guava) is on classpath
- Javac version comes from declared toolchain
- No access to system CLASSPATH or installed JDK beyond toolchain
Example 2: Incremental Build After Change
Now let's modify Helper.java:

    INCREMENTAL BUILD SCENARIO

    Initial state:
      Utils.java   [hash: abc123]
      Helper.java  [hash: def456]  <- CHANGED
      guava.jar    [hash: ghi789]

    Change: Edit Helper.java
      -> New hash: xyz999

    Bazel's reasoning:
      1. Helper.java changed -> invalidate utils target
      2. Recompile utils (cache miss)
      3. Targets depending on utils must rebuild
      4. Unrelated targets SKIP execution (cache hit!)

The power of the dependency graph: Bazel knows exactly which targets are affected by the change. If you have 1000 targets and only 3 transitively depend on utils, only those 3 rebuild.
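The "which targets are affected" question is a walk over reverse dependency edges. The following sketch uses a made-up graph: only the `utils` and `guava` labels come from the example above, the other targets and the `affected_targets` helper are hypothetical.

```python
from collections import defaultdict, deque

# target -> direct dependencies (illustrative graph)
DEPS = {
    "//src/main/java/com/example:utils": ["//third_party/guava"],
    "//app:server": ["//src/main/java/com/example:utils"],
    "//app:cli": ["//src/main/java/com/example:utils"],
    "//docs:site": [],
    "//third_party/guava": [],
}


def affected_targets(deps, changed):
    """Return `changed` plus every target that transitively depends on it."""
    # Invert the edges: dependency -> set of direct dependents.
    reverse = defaultdict(set)
    for target, direct_deps in deps.items():
        for dep in direct_deps:
            reverse[dep].add(target)

    affected = {changed}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for dependent in reverse[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    print(sorted(affected_targets(DEPS, "//src/main/java/com/example:utils")))
```

Here `//docs:site` and `//third_party/guava` are never visited, which corresponds to the "unrelated targets skip execution" step in the scenario.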
Example 3: Cross-Language Dependencies
Bazel's mental model shines with multi-language builds:
# //api/BUILD
proto_library(
    name = "api_proto",
    srcs = ["service.proto"],
)
java_proto_library(
    name = "api_java_proto",
    deps = [":api_proto"],
)
py_proto_library(
    name = "api_py_proto",
    deps = [":api_proto"],
)
# //server/BUILD
java_binary(
    name = "server",
    srcs = ["Server.java"],
    deps = ["//api:api_java_proto"],
)
# //client/BUILD
py_binary(
    name = "client",
    srcs = ["client.py"],
    deps = ["//api:api_py_proto"],
)
Dependency graph visualization:

               service.proto
               (:api_proto)
                     |
           +---------+---------+
           v                   v
    java_proto_library    py_proto_library
    (:api_java_proto)     (:api_py_proto)
           |                   |
           v                   v
       Server.java         client.py
       (:server)           (:client)

Mental model benefits:
- If service.proto changes, both Java and Python code regenerate
- Each language uses its own toolchain (javac, protoc, python interpreter)
- Cache keys are independent: a Java rebuild doesn't invalidate the Python cache
- Everything is still hermetic: each action only sees its declared inputs
Example 4: Remote Execution Mental Model
When using remote execution, Bazel's mental model extends to a distributed system:

    REMOTE EXECUTION FLOW

    LOCAL MACHINE
      - Load & Analyze (always local)
      - Compute action cache keys
      - Check remote cache
          |
          +-- Cache hit: download output -> Done
          |
          +-- Cache miss
                |
                v
    REMOTE WORKER POOL
      - Upload inputs (CAS)
      - Schedule action on worker
      - Worker executes in sandbox
      - Upload outputs (CAS)
      - Cache result
          |
          v
    LOCAL MACHINE
      - Download outputs -> Done

Key insight: From Bazel's perspective, remote execution is just another way to run actions. The mental model stays the same: inputs, command, outputs. The hermeticity guarantees make this possible.
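The hit-or-miss branch of that flow fits in a few lines. In this toy sketch a plain dictionary stands in for the shared remote cache and a callable stands in for the remote worker; `ActionRunner` is a hypothetical name and nothing here reflects the real remote execution protocol.

```python
class ActionRunner:
    """Toy model of 'check the cache, otherwise execute and store the result'."""

    def __init__(self, cache=None):
        # A dict shared between runners plays the role of a remote cache.
        self.cache = {} if cache is None else cache
        self.executions = 0

    def run(self, cache_key, execute):
        if cache_key in self.cache:
            return self.cache[cache_key]  # cache hit: reuse the stored output
        self.executions += 1              # cache miss: do the work
        output = execute()
        self.cache[cache_key] = output    # store for everyone sharing the cache
        return output


if __name__ == "__main__":
    shared_cache = {}
    alice = ActionRunner(shared_cache)
    bob = ActionRunner(shared_cache)

    alice.run("key-for-utils", lambda: b"libutils.jar bytes")
    bob.run("key-for-utils", lambda: b"libutils.jar bytes")
    print(alice.executions, bob.executions)
```

The second runner never executes the action because the first one already populated the shared cache, which is the behavior the diagram's "Cache hit" branch describes.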
Common Mistakes: Mental Model Misalignments
Mistake 1: Thinking Imperatively, Not Declaratively
Wrong mental model: "Bazel runs my build script"
# WRONG: Trying to write imperative code
genrule(
    name = "bad_example",
    outs = ["output.txt"],
    cmd = """
        if [ -f /tmp/cache.txt ]; then
            cp /tmp/cache.txt $@
        else
            echo 'data' > $@
        fi
    """,
)
Problems:
- Reads from /tmp (not hermetic!)
- Non-deterministic (depends on external state)
- Won't work in sandbox
- Can't cache reliably
Correct mental model: "I declare what outputs come from what inputs"
# RIGHT: Declarative dependencies
genrule(
    name = "good_example",
    srcs = ["input.txt"],  # Declare all inputs!
    outs = ["output.txt"],
    # "process" is a placeholder; a real tool would be declared via tools = [...]
    cmd = "process $(location :input.txt) > $@",
)
Mistake 2: Assuming Timestamps Matter
Wrong: "If I touch a file, Bazel will rebuild"
Bazel doesn't care about modification times (mtime). Only content matters.
# This WON'T trigger a rebuild if the content is unchanged
touch src/Main.java
bazel build //src:app  # Cache hit!
Correct: "If file content changes, Bazel rebuilds"
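The difference between a timestamp change and a content change is easy to demonstrate outside Bazel. This sketch only illustrates the principle; `content_digest` and `demo` are hypothetical helpers, and `os.utime` plays the role of `touch`.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def content_digest(path):
    """Return the SHA-256 hex digest of a file's bytes (mtime is not involved)."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def demo():
    """Return digests (before, after a touch, after an edit) for a scratch file."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "Main.java"
        src.write_text("class Main {}\n")
        before = content_digest(src)

        # Equivalent of `touch`: move the modification time forward by an hour.
        stat = src.stat()
        os.utime(src, (stat.st_atime, stat.st_mtime + 3600))
        after_touch = content_digest(src)

        # A real edit changes the bytes and therefore the digest.
        src.write_text("class Main { int x; }\n")
        after_edit = content_digest(src)
        return before, after_touch, after_edit


if __name__ == "__main__":
    before, after_touch, after_edit = demo()
    print("touch changed digest:", before != after_touch)
    print("edit changed digest: ", before != after_edit)
```

A timestamp-based tool would treat the touched file as changed, while a digest comparison sees no difference until the bytes themselves change.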
Mistake 3: Hidden Dependencies
Wrong mental model: "Bazel will find my includes"
# WRONG: Undeclared dependency
cc_library(
    name = "sneaky",
    srcs = ["code.cc"],
    # code.cc does: #include "other/header.h"
    # But other/header.h not in deps!
)
Symptoms:
- Works locally (header found in system paths)
- Fails in CI or for other developers
- Sandbox violations
- Non-reproducible builds
Correct mental model: "Every dependency must be explicit"
# RIGHT: Declare all dependencies
cc_library(
    name = "correct",
    srcs = ["code.cc"],
    hdrs = ["public_header.h"],
    deps = ["//other:header_lib"],  # Explicit!
)
Mistake 4: Confusing Packages and Targets
Wrong: Using filesystem paths instead of labels
# WRONG: Relative filesystem path
deps = ["../other/lib.jar"]  # This is not how Bazel works!
Correct: Using proper Bazel labels
# RIGHT: Bazel label
deps = ["//other:lib"]  # Target label
Mental model fix: Think in terms of the dependency graph, not the file tree. Labels are nodes in the graph.
Mistake 5: Non-Deterministic Actions
Wrong: Actions that produce different outputs from same inputs
genrule(
    name = "random_bad",
    outs = ["id.txt"],
    cmd = "echo $$RANDOM > $@",  # Different every time!
)
Why this breaks everything:
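- Caching becomes unreliable (a cached output no longer matches a fresh run, and downstream actions miss whenever the output differs)
- Different machines get different results
- No reproducibility
- Remote caching and remote execution return inconsistent results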
Correct: Deterministic actions
genrule(
    name = "version_good",
    srcs = ["version.txt"],  # Input determines output
    outs = ["stamped_version.txt"],
    cmd = "cat $(location :version.txt) > $@",
)
Mistake 6: Modifying Source Tree During Build
Wrong mental model: "Builds can write to source directories"
Bazel assumes your source tree is read-only during builds. Actions that try to modify sources violate the mental model.
Correct: All build outputs go to bazel-bin/ and bazel-out/, never back to the source tree.
Key Takeaways
Core Mental Model Principles
| Principle | Meaning |
|---|---|
| 1. Declarative, not imperative | Describe what, not how |
| 2. Everything is a graph | Targets, actions, dependencies: all nodes and edges |
| 3. Content-addressed, not time-based | SHA256 hashes, not timestamps |
| 4. Hermetic by default | No hidden inputs, no system state |
| 5. Parallelizable | Independent actions run concurrently |
| 6. Cacheable everywhere | Local, shared, remote: same mental model |
The Bazel Mindset Shift
From traditional builds:
- "Run this script in sequence"
- "Check if files are newer"
- "Hope the environment is right"
- "Rebuild everything to be safe"
To Bazel's model:
- "Here's the complete dependency graph"
- "Compare content hashes"
- "Enforce identical environments"
- "Only rebuild what changed"
When to Use Each Concept
Use dependency graph thinking when:
- Designing your build structure
- Debugging "why did this rebuild?"
- Optimizing build performance
- Setting up remote execution
Use action-level thinking when:
- Writing custom rules
- Debugging sandbox violations
- Optimizing cache hit rates
- Understanding remote execution costs
Use target/label thinking when:
- Organizing your codebase
- Managing visibility
- Refactoring build files
- Setting up monorepo structure
Pro Tips for Mastering the Mental Model
- Run with --explain=<logfile>: See exactly why Bazel rebuilt something
- Use bazel query: Explore the dependency graph directly
- Enable sandbox debugging: Understand hermeticity violations
- Check action logs: See the actual commands Bazel runs
- Visualize with bazel query --output=graph: See your dependency graph in DOT format
Quick Reference Card: Bazel Mental Model
| Concept | Definition | Key Property |
|---|---|---|
| Target | Any node in build graph | Has unique label |
| Rule | Transformation function | Creates actions |
| Action | Executable command | Has inputs/outputs |
| Label | Target identifier | //package:name format |
| Sandbox | Isolated execution env | Only declared inputs |
| Cache Key | Hash of action inputs | Content-based |
| Dependency Graph | Complete build relationships | Explicit & acyclic |
| Hermetic | No hidden dependencies | Reproducible |
Remember: L.A.E.
- Load BUILD files → Build package graph
- Analyze dependencies → Create action graph
- Execute actions → Produce outputs
The Golden Rule: Same inputs (including command and environment) → Always same outputs
Further Study
To deepen your understanding of Bazel's mental model:
Bazel Documentation - Concepts and Terminology: https://bazel.build/concepts/build-ref - Official guide to Bazel's core concepts, dependency graphs, and build phases
Bazel Build Encyclopedia: https://bazel.build/reference/be/overview - Complete reference for all built-in rules, their inputs, outputs, and behaviors
Google's Remote Build Execution Blog: https://bazel.build/remote/rbe - Deep dive into how remote execution works and why hermeticity matters at scale
By mastering this mental model, you'll not just use Bazel, you'll think in Bazel, making you far more effective at creating fast, reliable, reproducible builds. The investment in understanding these core concepts pays dividends every time you debug a build issue or optimize your build performance!