Bazel Mental Model

Understand Bazel's approach: explicit BUILD files, action-based execution, and remote caching.

Master Bazel's approach to hermetic builds. This lesson covers dependency graphs, build actions, caching mechanisms, and execution environments: the essential concepts for understanding how Bazel achieves reproducible builds at scale.

Welcome to the Bazel Mental Model 💻

Bazel isn't just another build tool—it's a complete paradigm shift in how we think about building software. Unlike traditional build systems that execute scripts sequentially, Bazel operates on a declarative model where you describe what you want to build, not how to build it. Understanding this fundamental mental model is the key to unlocking Bazel's power for hermetic, reproducible, and scalable builds.

Think of traditional build tools like Make or Maven as recipe books: you write the steps (or plugin configuration), and the tool trusts you to have listed everything each step reads. Bazel, on the other hand, is more like a smart logistics system that works out an efficient way to get from raw ingredients to finished product, caching intermediate results and parallelizing work automatically.

Core Concepts: The Building Blocks of Bazel's Mind 🧠

The Dependency Graph: Bazel's Map of Reality 🗺️

At the heart of Bazel's mental model is the dependency graph—a complete, explicit representation of every file, rule, and relationship in your build. This isn't just metadata; it's the foundation of everything Bazel does.

┌─────────────────────────────────────────────┐
│         DEPENDENCY GRAPH STRUCTURE          │
├─────────────────────────────────────────────┤
│                                             │
│           //app:binary (final)              │
│                  │                          │
│           ┌──────┴──────┐                   │
│           ▼             ▼                   │
│      //app:lib    //third_party:json        │
│           │             │                   │
│           ▼             ▼                   │
│      //utils:helpers   (external)           │
│           │                                 │
│           ▼                                 │
│      //base:core                            │
│                                             │
└─────────────────────────────────────────────┘

Key properties of Bazel's dependency graph:

Explicit: Every dependency must be declared. No hidden includes or classpath magic.
Acyclic: No circular dependencies allowed—Bazel enforces a strict directed acyclic graph (DAG).
Fine-grained: Dependencies are at the target level, not just file or module level.
Hermetic: The graph captures everything needed, including toolchains and execution requirements.

💡 Mental Model Tip: Think of Bazel's dependency graph like a restaurant's ingredient supply chain. Every dish (target) lists exactly which ingredients (dependencies) it needs, where they come from (repositories), and how they're prepared (rules). If an ingredient changes, only the dishes using it need to be remade.
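The "explicit" and "acyclic" properties can be sketched with a few lines of Python. This is an illustration of the idea only, not how Bazel stores its graph: the edges are copied from the diagram above, and `build_order` and `has_cycle` are hypothetical helper names.

```python
from graphlib import CycleError, TopologicalSorter

# Each target maps to the targets it depends on (edges from the diagram above).
GRAPH = {
    "//app:binary": {"//app:lib", "//third_party:json"},
    "//app:lib": {"//utils:helpers"},
    "//third_party:json": set(),
    "//utils:helpers": {"//base:core"},
    "//base:core": set(),
}


def build_order(deps):
    """Return targets so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps):
    """Report whether the graph violates the acyclic requirement."""
    try:
        build_order(deps)
    except CycleError:
        return True
    return False


order = build_order(GRAPH)
print(order)

# Introducing a back edge (core -> binary) makes the graph cyclic.
cyclic = {**GRAPH, "//base:core": {"//app:binary"}}
print(has_cycle(GRAPH), has_cycle(cyclic))
```

Because the whole graph is known before anything runs, a valid order always exists for an acyclic graph, and a cycle can be rejected up front rather than discovered halfway through a build.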

Targets: The Atoms of Your Build 🎯

A target is the fundamental unit in Bazel's world. Targets come in three flavors:

Target Type	Purpose	Example Label
Files	Source inputs or generated outputs	`//src:main.java`
Rules	Instructions for creating outputs from inputs	`//app:server`
Package Groups	Named sets of packages used for visibility and access control	`//app:friends` (a hypothetical `package_group`)

Every target has a unique label in the format //package/path:target_name. This label is how Bazel references and tracks everything in your build.

Example labels:

//src/main/java/com/example:app - A Java application target
//third_party/protobuf:protobuf_java - A vendored third-party dependency
//config:dev.json - A configuration file
:local_target - Relative reference within the same package
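To make the label format concrete, here is a minimal Python sketch that splits a label into its package and target name. `parse_label` and `Label` are hypothetical names for illustration, and repository prefixes such as `@repo//` are deliberately left out.

```python
from typing import NamedTuple


class Label(NamedTuple):
    package: str
    name: str


def parse_label(text, current_package=""):
    """Split a label into (package, target name).

    Handles "//pkg:name", the shorthand "//pkg" (the name defaults to the
    last package component), and the relative form ":name".
    """
    if text.startswith(":"):
        return Label(current_package, text[1:])
    if not text.startswith("//"):
        raise ValueError(f"not an absolute or relative label: {text!r}")
    body = text[2:]
    if ":" in body:
        package, name = body.split(":", 1)
    else:
        package, name = body, body.rsplit("/", 1)[-1]
    return Label(package, name)


print(parse_label("//src/main/java/com/example:app"))
print(parse_label("//third_party/guava"))
print(parse_label(":local_target", current_package="config"))
```

Note that a filesystem-style path such as `../other/lib.jar` is rejected: labels name nodes in the graph, not locations on disk.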

Rules: The Transformation Functions ⚙️

A rule is like a pure function in functional programming—it takes inputs (source files, dependencies) and produces outputs (compiled binaries, generated files) in a deterministic way.

┌─────────────────────────────────────────────┐
│              RULE EXECUTION                 │
├─────────────────────────────────────────────┤
│                                             │
│  📥 INPUTS                                  │
│  ├─ Source files (srcs)                     │
│  ├─ Dependencies (deps)                     │
│  ├─ Data files (data)                       │
│  └─ Tool dependencies (tools)               │
│                                             │
│         ↓                                   │
│  ⚙️  RULE (e.g., java_binary)               │
│         ↓                                   │
│                                             │
│  📤 OUTPUTS                                 │
│  ├─ Executable binary                       │
│  ├─ Deploy JAR                              │
│  └─ Metadata files                          │
│                                             │
└─────────────────────────────────────────────┘

Critical rule properties:

Deterministic: The same inputs should always produce the same outputs
Hermetic: Actions should not read system state, the network, or undeclared files
Parallelizable: Rules can execute concurrently if dependencies allow
Cacheable: Outputs can be stored and reused across builds and machines

Actions: The Actual Work Units 🔨

When Bazel analyzes a rule target, the rule's implementation registers one or more actions: the actual commands that run later, during the execution phase. This separation between rule (what to build) and action (how to build) is crucial to Bazel's mental model.

Action Type	Description	Example
Spawn	Execute a command	`javac -d classes Main.java`
FileWrite	Generate a text file	Write manifest or config
TemplateExpand	Substitute variables in template	Version stamps
SymlinkTree	Create directory structure	Runfiles trees

Each action has:

Input files (must be outputs of other actions or source files)
Command and environment (toolchain, flags, environment variables)
Output files (created by the action)
Mnemonic (human-readable label like "Javac" or "CppCompile")
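The four parts of an action map naturally onto a small record type. The sketch below is a simplified model, not Bazel's internal representation; `Action`, `undeclared_inputs`, and the "JavaJar" mnemonic are made up for illustration. It checks the rule that every input must be a source file or the output of another action.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    mnemonic: str   # human-readable label, e.g. "Javac"
    inputs: tuple   # files read by the command
    command: tuple  # argv of the command to run
    outputs: tuple  # files the command must create


def undeclared_inputs(actions, source_files):
    """Return inputs that are neither source files nor outputs of an action."""
    produced = {out for action in actions for out in action.outputs}
    known = produced | set(source_files)
    return sorted({inp for action in actions for inp in action.inputs} - known)


compile_action = Action(
    mnemonic="Javac",
    inputs=("Main.java",),
    command=("javac", "-d", "classes", "Main.java"),
    outputs=("classes/Main.class",),
)
jar_action = Action(
    mnemonic="JavaJar",  # hypothetical mnemonic for illustration
    inputs=("classes/Main.class", "MANIFEST.MF"),
    command=("jar", "cfm", "app.jar", "MANIFEST.MF", "-C", "classes", "."),
    outputs=("app.jar",),
)

# MANIFEST.MF is neither a source file nor produced by any action.
missing = undeclared_inputs([compile_action, jar_action], {"Main.java"})
print(missing)
```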

🤔 Did You Know? Bazel can execute hundreds of thousands of actions in a single build of a large monorepo, which is why action-level caching and parallelism matter so much.

The Sandbox: Enforcing Hermeticity 🔒

Bazel's sandbox is where the magic of hermetic builds happens. When an action executes, Bazel creates an isolated environment that only contains:

Declared inputs: Only files explicitly listed as inputs or dependencies
Toolchain binaries: The compiler, linker, or other tools needed
Output directory: Where the action can write its results

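What the sandbox is meant to keep out:

❌ Your home directory
❌ System libraries (except those supplied by a declared toolchain)
❌ Network access (allowed by default; block it with --sandbox_default_allow_network=false)
❌ Other build outputs not declared as dependencies
❌ Environment variables (except the fixed set Bazel passes to the action)

How strictly each item is enforced depends on the platform and sandbox strategy. On Linux, for example, host paths outside the workspace usually remain readable, so hermetic toolchains are still needed for full isolation.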

┌─────────────────────────────────────────────┐
│            SANDBOX ENVIRONMENT              │
├─────────────────────────────────────────────┤
│                                             │
│  ✅ Allowed:                                │
│     📁 execroot/                            │
│        ├─ declared_input1.java              │
│        ├─ declared_input2.java              │
│        └─ toolchain/                        │
│              └─ javac                       │
│                                             │
│  ❌ Forbidden:                              │
│     🚫 /usr/lib/random_library.so           │
│     🚫 /home/user/.config                   │
│     🚫 Network requests                     │
│     🚫 /tmp/cached_state                    │
│                                             │
└─────────────────────────────────────────────┘

💡 Pro Tip: If a build fails with "file not found" errors under Bazel's sandbox but works with your previous build setup, you've probably found an undeclared dependency. The sandbox is catching your hermeticity violation!
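The core idea, that an action sees only what it declared, can be imitated in plain Python. This is a toy simulation under simplifying assumptions: it copies files into a temporary directory, whereas Bazel's real sandbox relies on symlinks and operating-system isolation features. `make_sandbox` is a hypothetical helper.

```python
import shutil
import tempfile
from pathlib import Path


def make_sandbox(workspace, declared_inputs):
    """Copy only the declared inputs into a fresh directory and return it."""
    sandbox = Path(tempfile.mkdtemp(prefix="sandbox-"))
    for rel in declared_inputs:
        dest = sandbox / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(Path(workspace) / rel, dest)
    return sandbox


# A toy workspace with two files; only one is declared as an input.
workspace = Path(tempfile.mkdtemp(prefix="workspace-"))
(workspace / "src").mkdir()
(workspace / "src" / "Main.java").write_text("class Main {}\n")
(workspace / "src" / "Secret.java").write_text("class Secret {}\n")

sandbox = make_sandbox(workspace, ["src/Main.java"])
print((sandbox / "src" / "Main.java").exists())    # declared input is present
print((sandbox / "src" / "Secret.java").exists())  # undeclared file is absent
```

A command run with `sandbox` as its working directory would fail to find `Secret.java`, which is exactly the "file not found" symptom of an undeclared dependency.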

Caching: The Three-Level System 💾

Bazel's caching system is a large part of what makes it fast. It operates at three levels:

Cache Level	Scope	Key Benefit
Local action cache	Your machine	Fast rebuilds of unchanged code
Remote cache (action cache + content-addressable store)	Shared server	Team-wide artifact sharing
Remote execution	Build farm	Distributed computation that also fills the shared cache

How Bazel decides if an action can be cached:

┌─────────────────────────────────────────────┐
│          CACHE KEY COMPUTATION              │
├─────────────────────────────────────────────┤
│                                             │
│  Hash of:                                   │
│   ├─ Command line and flags                 │
│   ├─ Input file contents (SHA256)           │
│   ├─ Environment variables                  │
│   ├─ Toolchain version                      │
│   └─ Action mnemonic                        │
│                                             │
│         ↓                                   │
│   Cache Key: a7f3bc29e...                   │
│         ↓                                   │
│   Lookup in cache                           │
│         │                                   │
│    ┌────┴────┐                              │
│    ▼         ▼                              │
│  Found    Not Found                         │
│    │         │                              │
│  Reuse    Execute                           │
│  Output   & Cache                           │
│                                             │
└─────────────────────────────────────────────┘

The cache key is content-based, not timestamp-based. This means:

✅ Switching git branches back and forth reuses cached results
✅ Different developers with identical code get cache hits
✅ Build outputs are reproducible across machines and time

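⚠️ Common Pitfall: The cache key covers an action's inputs, not its outputs. An action that reads system time, random numbers, or network data still gets cached, but the cached result is arbitrary, and every machine that re-runs it produces different bytes that invalidate downstream actions. Keep actions deterministic.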
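A content-based key can be sketched as follows. This is a simplified illustration of the ingredients listed in the diagram, not Bazel's actual digest format; `action_cache_key` and its parameters are assumptions made for the example.

```python
import hashlib


def action_cache_key(mnemonic, command, env, input_contents):
    """Hash everything that can influence an action's output.

    input_contents maps a file path to its bytes. Modification times are
    intentionally not part of the key.
    """
    h = hashlib.sha256()

    def feed(*parts):
        # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
        for part in parts:
            data = part.encode()
            h.update(len(data).to_bytes(8, "big"))
            h.update(data)

    feed(mnemonic, str(len(command)), *command)
    feed(str(len(env)))
    for name in sorted(env):
        feed(name, env[name])
    for path in sorted(input_contents):
        feed(path, hashlib.sha256(input_contents[path]).hexdigest())
    return h.hexdigest()


base = dict(
    mnemonic="Javac",
    command=["javac", "-d", "classes", "Utils.java"],
    env={"LANG": "C"},
    input_contents={"Utils.java": b"class Utils {}\n"},
)
key_a = action_cache_key(**base)
key_b = action_cache_key(**base)  # identical content on "another machine"
edited = {**base, "input_contents": {"Utils.java": b"class Utils { int x; }\n"}}
key_c = action_cache_key(**edited)
print(key_a == key_b, key_a == key_c)
```

Identical content yields an identical key regardless of who computes it or when, while any change to a file's bytes, the command line, or the environment yields a new key.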

The Loading and Analysis Phases 🔍

Bazel builds happen in distinct phases, and understanding this is crucial to the mental model:

┌─────────────────────────────────────────────┐
│          BAZEL BUILD PHASES                 │
├─────────────────────────────────────────────┤
│                                             │
│  1️⃣ LOADING                                 │
│     Read BUILD files, parse syntax          │
│     Build package graph                     │
│     Time: O(packages)                       │
│            ↓                                │
│  2️⃣ ANALYSIS                                │
│     Construct action graph                  │
│     Resolve dependencies                    │
│     Configure toolchains                    │
│     Time: O(targets in dependency graph)    │
│            ↓                                │
│  3️⃣ EXECUTION                               │
│     Run actions (potentially parallel)      │
│     Check caches                            │
│     Write outputs                           │
│     Time: O(changed actions)                │
│            ↓                                │
│  ✅ BUILD COMPLETE                          │
│                                             │
└─────────────────────────────────────────────┘

Why this matters:

Loading is fast and incremental—Bazel only reloads changed BUILD files
Analysis constructs the complete action graph before any execution
Execution can be massively parallel because the graph is known upfront

🧠 Memory Aid: Think "L.A.E." - Load, Analyze, Execute. Like planning a trip: Load the map, Analyze the route, Execute the drive.
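The three phases can be mimicked with three small functions. This is a deliberately tiny model with made-up data (`BUILD_FILES`, the "Compile" mnemonic) and one simplification worth noting: it loads every package, whereas Bazel loads only the packages a build actually needs.

```python
def load(build_files):
    """Loading: turn BUILD declarations into a target table. No work is done."""
    targets = {}
    for package, rules in build_files.items():
        for rule in rules:
            targets[f"//{package}:{rule['name']}"] = rule
    return targets


def analyze(targets, root):
    """Analysis: walk dependencies from root, emitting actions deps-first."""
    actions, seen = [], set()

    def visit(label):
        if label in seen:
            return
        seen.add(label)
        for dep in targets[label].get("deps", []):
            visit(dep)
        actions.append(("Compile", label))

    visit(root)
    return actions


def execute(actions):
    """Execution: only now is anything 'run' (here, just recorded)."""
    return [f"{mnemonic} {label}" for mnemonic, label in actions]


BUILD_FILES = {
    "base": [{"name": "core"}],
    "utils": [{"name": "helpers", "deps": ["//base:core"]}],
    "app": [{"name": "lib", "deps": ["//utils:helpers"]}],
    "unrelated": [{"name": "tool"}],
}

targets = load(BUILD_FILES)
plan = analyze(targets, "//app:lib")
log = execute(plan)
print(log)
```

The complete plan exists before `execute` is called, and `//unrelated:tool` never appears in it because nothing requested depends on it.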

Real-World Examples: Seeing the Mental Model in Action 🌍

Example 1: A Simple Java Library Build

Let's trace through how Bazel thinks about building a Java library:

# //src/main/java/com/example/BUILD
java_library(
    name = "utils",
    srcs = ["Utils.java", "Helper.java"],
    deps = ["//third_party/guava"],
    visibility = ["//visibility:public"],
)

Mental model walkthrough:

Phase	What Bazel Does	Key Insight
Loading	Parse BUILD file, register `utils` target with srcs and deps	No compilation yet—just bookkeeping
Analysis	Resolve `//third_party/guava` dependency, create Javac action with inputs [Utils.java, Helper.java, guava.jar]	Action graph is now complete
Execution	Check cache for hash of [command, Utils.java content, Helper.java content, guava.jar content]. If miss, run javac in sandbox	Output: libutils.jar

What makes this hermetic:

✅ Only declared sources (Utils.java, Helper.java) are accessible
✅ Only declared dependency (guava) is on classpath
✅ Javac version comes from declared toolchain
✅ No access to system CLASSPATH or installed JDK beyond toolchain

Example 2: Incremental Build After Change

Now let's modify Helper.java:

┌─────────────────────────────────────────────┐
│       INCREMENTAL BUILD SCENARIO            │
├─────────────────────────────────────────────┤
│                                             │
│  Initial state:                             │
│    Utils.java    [hash: abc123]             │
│    Helper.java   [hash: def456]  ← CHANGED  │
│    guava.jar     [hash: ghi789]             │
│                                             │
│  Change: Edit Helper.java                   │
│    → New hash: xyz999                       │
│                                             │
│  Bazel's reasoning:                         │
│  1. Helper.java changed → invalidate        │
│     utils target                            │
│  2. Recompile utils (cache miss)            │
│  3. Targets depending on utils              │
│     must rebuild                            │
│  4. Unrelated targets SKIP execution        │
│     (cache hit!)                            │
│                                             │
└─────────────────────────────────────────────┘

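The power of the dependency graph: Bazel knows exactly which targets can be affected by the change. If you have 1000 targets and only 3 transitively depend on utils, at most those 3 rebuild, and the other 997 are never re-executed. Dependents whose own inputs turn out to be unchanged can be skipped as well.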
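Finding the affected set is a reverse-dependency walk. The Python sketch below uses a hypothetical four-target graph; on a real workspace, `bazel query 'rdeps(//..., //lib:utils)'` answers the same question.

```python
from collections import deque


def affected_targets(deps, changed):
    """Return `changed` plus every target that transitively depends on it."""
    # Invert the edges: for each target, which targets depend on it?
    rdeps = {target: set() for target in deps}
    for target, target_deps in deps.items():
        for dep in target_deps:
            rdeps.setdefault(dep, set()).add(target)

    affected, queue = {changed}, deque([changed])
    while queue:
        for parent in rdeps.get(queue.popleft(), ()):
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical graph: two targets sit above utils, one is unrelated.
TARGET_DEPS = {
    "//lib:utils": set(),
    "//app:server": {"//lib:utils"},
    "//app:cli": {"//app:server"},
    "//docs:site": set(),
}
affected = affected_targets(TARGET_DEPS, "//lib:utils")
print(sorted(affected))
```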

Example 3: Cross-Language Dependencies

Bazel's mental model shines with multi-language builds:

# //api/BUILD
proto_library(
    name = "api_proto",
    srcs = ["service.proto"],
)

java_proto_library(
    name = "api_java_proto",
    deps = [":api_proto"],
)

py_proto_library(
    name = "api_py_proto",
    deps = [":api_proto"],
)

# //server/BUILD
java_binary(
    name = "server",
    srcs = ["Server.java"],
    deps = ["//api:api_java_proto"],
)

# //client/BUILD
py_binary(
    name = "client",
    srcs = ["client.py"],
    deps = ["//api:api_py_proto"],
)

Dependency graph visualization:

                service.proto
                  (:api_proto)
                      │
              ┌───────┴───────┐
              ▼               ▼
      java_proto_library  py_proto_library
       (:api_java_proto)  (:api_py_proto)
              │               │
              ▼               ▼
         Server.java      client.py
          (:server)       (:client)

Mental model benefits:

If service.proto changes, both Java and Python code regenerate
Each language uses its own toolchain (javac, protoc, python interpreter)
Cache keys are independent—Java rebuild doesn't invalidate Python cache
Everything still hermetic—each action only sees its declared inputs

Example 4: Remote Execution Mental Model

When using remote execution, Bazel's mental model extends to a distributed system:

┌─────────────────────────────────────────────┐
│        REMOTE EXECUTION FLOW                │
├─────────────────────────────────────────────┤
│                                             │
│  📍 LOCAL MACHINE                           │
│     ├─ Load & Analyze (always local)        │
│     ├─ Compute action cache keys            │
│     └─ Check remote cache                   │
│           │                                 │
│           ├─── Cache Hit? ─────┐            │
│           │                    ▼            │
│           │              Download output    │
│           │                    │            │
│           │                  ✅ Done        │
│           │                                 │
│           └─── Cache Miss                   │
│                    │                        │
│                    ▼                        │
│  ☁️  REMOTE WORKER POOL                     │
│     ├─ Upload inputs (CAS)                  │
│     ├─ Schedule action on worker            │
│     ├─ Worker executes in sandbox           │
│     ├─ Upload outputs (CAS)                 │
│     └─ Cache result                         │
│                    │                        │
│                    ▼                        │
│  📍 LOCAL MACHINE                           │
│     └─ Download outputs                     │
│                    │                        │
│                  ✅ Done                     │
│                                             │
└─────────────────────────────────────────────┘

Key insight: From Bazel's perspective, remote execution is just another way to run actions. The mental model stays the same—inputs, command, outputs. The hermeticity guarantees make this possible.
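The hit-or-miss branch in the flow above reduces to a few lines. This sketch uses an in-memory dictionary as a stand-in for a remote cache; `RemoteCache`, `run_action`, and the key string are illustrative assumptions, not Bazel's remote API.

```python
class RemoteCache:
    """In-memory stand-in for a remote cache keyed by action digest."""

    def __init__(self):
        self._results = {}

    def get(self, key):
        return self._results.get(key)

    def put(self, key, outputs):
        self._results[key] = outputs


def run_action(key, execute, cache, stats):
    """Return cached outputs on a hit; otherwise execute and store the result."""
    outputs = cache.get(key)
    if outputs is not None:
        stats["hits"] += 1
        return outputs
    stats["executed"] += 1
    outputs = execute()
    cache.put(key, outputs)
    return outputs


def compile_utils():
    # Stands in for running javac on a worker.
    return {"libutils.jar": b"jar-bytes"}


cache = RemoteCache()
stats = {"hits": 0, "executed": 0}

# Two "machines" request the same action key; only the first one executes.
first = run_action("a7f3bc29e", compile_utils, cache, stats)
second = run_action("a7f3bc29e", compile_utils, cache, stats)
print(stats)
```

Whether `execute` runs locally or on a worker is invisible to the caller, which is the point: the action is defined by its key, not by where it runs.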

Common Mistakes: Mental Model Misalignments ⚠️

Mistake 1: Thinking Imperatively, Not Declaratively

❌ Wrong mental model: "Bazel runs my build script"

# WRONG: Trying to write imperative code
genrule(
    name = "bad_example",
    outs = ["output.txt"],
    cmd = """
        if [ -f /tmp/cache.txt ]; then
            cp /tmp/cache.txt $@
        else
            echo 'data' > $@
        fi
    """,
)

Problems:

Reads from /tmp (not hermetic!)
Non-deterministic (depends on external state)
Behaves differently inside and outside the sandbox
Can't cache reliably

✅ Correct mental model: "I declare what outputs come from what inputs"

# RIGHT: Declarative dependencies
genrule(
    name = "good_example",
    srcs = ["input.txt"],  # Declare all inputs!
    outs = ["output.txt"],
    tools = [":process"],  # Declare the tool too (":process" is a placeholder target)
    cmd = "$(location :process) $(location :input.txt) > $@",
)

Mistake 2: Assuming Timestamps Matter

❌ Wrong: "If I touch a file, Bazel will rebuild"

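Bazel may use a file's modification time (mtime) as a cheap hint that the file needs re-hashing, but rebuild decisions are based on content digests. Only content matters.

# This WON'T trigger a rebuild if the content is unchanged
touch src/Main.java
bazel build //src:app  # Nothing is re-executed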

✅ Correct: "If file content changes, Bazel rebuilds"
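The difference between a timestamp and a content digest is easy to demonstrate in Python. The sketch below imitates `touch` with `os.utime` on a temporary file; `digest` is a hypothetical helper and the file name is only an example.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def digest(path):
    """SHA-256 of the file's bytes; metadata such as mtime is ignored."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "Main.java"
    src.write_text("class Main {}\n")
    before_mtime, before_digest = src.stat().st_mtime, digest(src)

    # Equivalent of `touch`: bump the timestamps, leave the bytes alone.
    os.utime(src, (before_mtime + 60, before_mtime + 60))
    touched_mtime, touched_digest = src.stat().st_mtime, digest(src)

    # A real edit changes the bytes and therefore the digest.
    src.write_text("class Main { int x; }\n")
    edited_digest = digest(src)

print(touched_mtime != before_mtime, touched_digest == before_digest)
print(edited_digest != before_digest)
```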

Mistake 3: Hidden Dependencies

❌ Wrong mental model: "Bazel will find my includes"

# WRONG: Undeclared dependency
cc_library(
    name = "sneaky",
    srcs = ["code.cc"],
    # code.cc does: #include "other/header.h"
    # But other/header.h not in deps!
)

Symptoms:

Works locally (header found in system paths)
Fails in CI or for other developers
Sandbox violations
Non-reproducible builds

✅ Correct mental model: "Every dependency must be explicit"

# RIGHT: Declare all dependencies
cc_library(
    name = "correct",
    srcs = ["code.cc"],
    deps = ["//other:header_lib"],  # Explicit!
    hdrs = ["public_header.h"],
)

Mistake 4: Confusing Packages and Targets

❌ Wrong: Using filesystem paths instead of labels

# WRONG: Relative filesystem path
deps = ["../other/lib.jar"]  # This is not how Bazel works!

✅ Correct: Using proper Bazel labels

# RIGHT: Bazel label
deps = ["//other:lib"]  # Target label

Mental model fix: Think in terms of the dependency graph, not the file tree. Labels are nodes in the graph.

Mistake 5: Non-Deterministic Actions

❌ Wrong: Actions that produce different outputs from same inputs

genrule(
    name = "random_bad",
    outs = ["id.txt"],
    cmd = "echo $$RANDOM > $@",  # Different every time!
)

Why this breaks everything:

Cached results are arbitrary (whichever run populated the cache wins)
Different machines get different results
No reproducibility
Downstream actions miss the cache whenever the output changes

✅ Correct: Deterministic actions

genrule(
    name = "version_good",
    srcs = ["version.txt"],  # Input determines output
    outs = ["stamped_version.txt"],
    cmd = "cat $(location :version.txt) > $@",
)
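One rough way to spot a non-deterministic step is to run it several times and compare output digests. The Python sketch below applies that idea to two stand-in functions that mirror the genrules above; `is_reproducible` and both stamp functions are hypothetical, and passing this check does not prove determinism across machines.

```python
import hashlib
import random


def is_reproducible(action, runs=3):
    """Run a zero-argument action several times and compare output digests."""
    digests = {hashlib.sha256(action()).hexdigest() for _ in range(runs)}
    return len(digests) == 1


def stamp_from_input():
    version = b"1.4.2\n"  # stands in for the contents of version.txt
    return b"version=" + version


def stamp_from_random():
    # Mirrors `echo $$RANDOM`: the output ignores the declared inputs.
    return str(random.getrandbits(64)).encode()


print(is_reproducible(stamp_from_input))
print(is_reproducible(stamp_from_random))
```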

Mistake 6: Modifying Source Tree During Build

❌ Wrong mental model: "Builds can write to source directories"

Bazel assumes your source tree is read-only during builds. Actions that try to modify sources violate the mental model.

✅ Correct: All build outputs go to Bazel's output tree (reachable through the bazel-bin/ and bazel-out/ symlinks), never back into the source tree.

Key Takeaways 🎯

🧠 Core Mental Model Principles

1. Declarative, not imperative	Describe what, not how
2. Everything is a graph	Targets, actions, dependencies—all nodes and edges
3. Content-addressed, not time-based	SHA256 hashes, not timestamps
4. Hermetic by default	No hidden inputs, no system state
5. Parallelizable	Independent actions run concurrently
6. Cacheable everywhere	Local, shared, remote—same mental model

The Bazel Mindset Shift 🔄

From traditional builds:

"Run this script in sequence"
"Check if files are newer"
"Hope the environment is right"
"Rebuild everything to be safe"

To Bazel's model:

"Here's the complete dependency graph"
"Compare content hashes"
"Enforce identical environments"
"Only rebuild what changed"

When to Use Each Concept 🔧

Use dependency graph thinking when:

Designing your build structure
Debugging "why did this rebuild?"
Optimizing build performance
Setting up remote execution

Use action-level thinking when:

Writing custom rules
Debugging sandbox violations
Optimizing cache hit rates
Understanding remote execution costs

Use target/label thinking when:

Organizing your codebase
Managing visibility
Refactoring build files
Setting up monorepo structure

Pro Tips for Mastering the Mental Model 💡

Run with --explain=<file> (optionally with --verbose_explanations): See why Bazel re-executed each action
Use bazel query: Explore the dependency graph directly, e.g. bazel query 'deps(//app:server)'
Enable sandbox debugging with --sandbox_debug: Understand hermeticity violations
Check action logs with --subcommands (-s): See the actual commands Bazel runs
Visualize with bazel query --output=graph: See your dependency graph in DOT format

📋 Quick Reference Card: Bazel Mental Model

Concept	Definition	Key Property
Target	Any node in build graph	Has unique label
Rule	Transformation function	Creates actions
Action	Executable command	Has inputs/outputs
Label	Target identifier	//package:name format
Sandbox	Isolated execution env	Only declared inputs
Cache Key	Hash of action inputs	Content-based
Dependency Graph	Complete build relationships	Explicit & acyclic
Hermetic	No hidden dependencies	Reproducible

Remember: L.A.E.

Load BUILD files → Build package graph
Analyze dependencies → Create action graph
Execute actions → Produce outputs

The Golden Rule: Same inputs (including command and environment) → Always same outputs

📚 Further Study

To deepen your understanding of Bazel's mental model:

Bazel Documentation - Concepts and Terminology: https://bazel.build/concepts/build-ref - Official guide to Bazel's core concepts, dependency graphs, and build phases
Bazel Build Encyclopedia: https://bazel.build/reference/be/overview - Complete reference for all built-in rules, their inputs, outputs, and behaviors
Bazel Documentation - Remote Execution Overview: https://bazel.build/remote/rbe - Overview of how remote execution works and why hermeticity matters at scale

By mastering this mental model, you'll not just use Bazel—you'll think in Bazel, making you far more effective at creating fast, reliable, reproducible builds. The investment in understanding these core concepts pays dividends every time you debug a build issue or optimize your build performance! 🚀
