Sandboxed Execution

Explore how Bazel isolates build actions using sandboxing to enforce hermeticity.

Sandboxed Execution in Bazel

This lesson covers sandbox fundamentals, filesystem isolation strategies, network restrictions, and debugging techniques, all essential for building hermetic and reproducible builds.

Welcome to Sandboxed Execution 🎯

If you've ever wondered how Bazel ensures that your builds are truly hermetic—meaning they produce identical outputs regardless of where or when they run—sandboxed execution is the answer. Think of a sandbox as a protective bubble around each build action, preventing it from accessing files it shouldn't see, modifying things it shouldn't touch, or depending on ambient system state.

In this lesson, we'll explore how Bazel creates these isolated environments, why they're critical for reproducibility, and how to work effectively within their constraints. Whether you're debugging a failing build or optimizing for speed, understanding sandboxing will transform how you approach build engineering.

Core Concepts: Understanding Sandboxing 🔒

What is Sandboxed Execution?

Sandboxed execution is Bazel's mechanism for running build actions in isolated environments where they can only access explicitly declared inputs. Instead of letting a build action see your entire filesystem, network, or environment variables, Bazel creates a restricted workspace containing only what that action declares it needs.

┌─────────────────────────────────────────────┐
│          TRADITIONAL BUILD                  │
├─────────────────────────────────────────────┤
│                                             │
│  Build Action                               │
│       │                                     │
│       ├──→ 📁 Can access ANY file          │
│       ├──→ 🌐 Can make network calls       │
│       ├──→ 💾 Can read cached data         │
│       └──→ ⚙️ Sees all env variables       │
│                                             │
│  Result: Unreproducible! 😱                 │
└─────────────────────────────────────────────┘

┌─────────────────────────────────────────────┐
│          SANDBOXED BUILD                    │
├─────────────────────────────────────────────┤
│                                             │
│  ┌───────────────────────────────┐         │
│  │  🏖️ SANDBOX (isolated)        │         │
│  │                               │         │
│  │  Build Action                 │         │
│  │    │                          │         │
│  │    ├──→ 📁 Only declared inputs│        │
│  │    ├──→ 🚫 No network access   │        │
│  │    ├──→ 🚫 No ambient cache    │        │
│  │    └──→ ⚙️ Filtered env vars   │        │
│  │                               │         │
│  └───────────────────────────────┘         │
│                                             │
│  Result: Hermetic & Reproducible! ✅        │
└─────────────────────────────────────────────┘

Why Sandboxing Matters 🎯

Reproducibility is the foundation of reliable software builds. If a build succeeds on your machine but fails on a colleague's, or works today but breaks tomorrow, you have a non-hermetic build. Sandboxing solves this by:

Preventing undeclared dependencies: Actions can't accidentally read files that aren't in their declared inputs
Isolating side effects: One action can't pollute the environment for another
Enabling caching: When an action depends only on its declared inputs, identical inputs + identical action can safely reuse a cached output (provided the tool itself is deterministic)
Detecting hidden assumptions: Forces you to explicitly declare what your build needs
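
To make the caching point concrete, here is a minimal sketch of how a cache key can be derived purely from declared information. The function name `action_key` and the exact hashing layout are assumptions for illustration; Bazel's real action key computation is more involved.

```python
import hashlib

def digest(content: bytes) -> str:
    """Content digest of a single input file."""
    return hashlib.sha256(content).hexdigest()

def action_key(command, input_digests, env):
    """Hypothetical cache key: command line + input digests + environment."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    for path in sorted(input_digests):
        h.update(path.encode() + b"\0" + input_digests[path].encode() + b"\0")
    for name in sorted(env):
        h.update(name.encode() + b"=" + env[name].encode() + b"\0")
    return h.hexdigest()

cmd = ["gcc", "-c", "src/main.cc", "-o", "main.o"]
env = {"PATH": "/usr/bin"}
inputs = {"src/main.cc": digest(b"int main() { return 0; }")}

key_a = action_key(cmd, inputs, env)
key_b = action_key(cmd, dict(inputs), env)  # same declared inputs
changed = {**inputs, "src/main.cc": digest(b"int main() { return 1; }")}
key_c = action_key(cmd, changed, env)       # one input edited

print(key_a == key_b)  # True: safe to reuse the cached output
print(key_a == key_c)  # False: the action must run again
```

The key is only trustworthy if the action cannot read anything that was left out of it, which is the property sandboxing is meant to enforce.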

💡 Real-world analogy: Imagine a restaurant kitchen where each chef can only use ingredients explicitly listed in their recipe card. They can't grab random items from the pantry or use a sauce another chef prepared. This ensures the dish tastes the same every time, regardless of who makes it or what else is happening in the kitchen.

How Bazel Implements Sandboxing 🛠️

Bazel uses different sandbox strategies depending on your operating system:

Operating System	Primary Strategy	Mechanism
Linux	linux-sandbox	Linux namespaces (user, mount, PID, network) plus bind mounts
macOS	darwin-sandbox	sandbox-exec (Apple's Seatbelt policy)
Windows	none by default	Sandboxing is experimental at best; actions typically run with the local strategy
Any POSIX system	processwrapper-sandbox	Symlink tree of declared inputs (fallback; no kernel-level isolation)

The linux-sandbox is the most robust and commonly used strategy in production environments. Here's how it works:

┌────────────────────────────────────────────────┐
│  LINUX SANDBOX IMPLEMENTATION                  │
└────────────────────────────────────────────────┘

    Host Filesystem              Sandbox View
    ───────────────              ────────────
         
    /usr/bin/gcc ──────────────→ /usr/bin/gcc
    /lib/libc.so ──────────────→ /lib/libc.so
         
    /home/user/project/         /execroot/workspace/
      ├── src/                   ├── src/
      │   ├── main.cc     ─────→ │   ├── main.cc
      │   └── lib.h      ─────→ │   └── lib.h
      ├── BUILD                  └── (outputs writable)
      └── bazel-out/
           
    /tmp/bazel-cache/     ──────X (not mapped into the execroot)
    /home/user/.config/   ──────X (not mapped into the execroot)

Key mechanisms:

Mount namespaces: Create a private view of the filesystem
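Bind mounts: Make the sandbox's execroot, populated with only the declared inputs, available at the expected path
Read-only mounts: Expose the rest of the host filesystem read-only by default, so actions can't modify source files or system directories (undeclared host paths may still be readable unless hidden with extra flags such as --sandbox_tmpfs_path)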
PID namespaces: Isolate process visibility
Network namespaces: Optionally block network access

The Sandbox Directory Structure 📁

When Bazel runs a sandboxed action, it creates a temporary directory structure:

/tmp/bazel-sandbox-/
├── execroot/
│   └── /
│       ├── external/          (external dependencies)
│       ├── bazel-out/         (output directories)
│       └── /    (source files)
│           ├── input1.txt     (symlink to actual file)
│           └── input2.txt     (symlink to actual file)
└── sandbox.log                (debugging information)

The action runs with its working directory set to the sandbox's execroot. Only the declared inputs appear there, as symlinks or copies. Outputs are written to designated output directories and moved back into the real execroot after successful execution.
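
The symlink-tree idea can be sketched in a few lines. This is a simplified illustration, not Bazel's implementation: the helper `build_sandbox` and the directory prefixes are made up for the example, and it assumes a POSIX system where creating symlinks is permitted.

```python
import os
import tempfile

def build_sandbox(workspace, declared_inputs):
    """Create a scratch directory containing symlinks to declared inputs only."""
    sandbox = tempfile.mkdtemp(prefix="sketch-sandbox-")
    for rel in declared_inputs:
        link = os.path.join(sandbox, rel)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        os.symlink(os.path.join(workspace, rel), link)
    return sandbox

# A stand-in workspace with one declared and one undeclared file.
workspace = tempfile.mkdtemp(prefix="sketch-workspace-")
for rel in ("src/main.cc", "src/secret.h"):
    path = os.path.join(workspace, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write("// " + rel + "\n")

sandbox = build_sandbox(workspace, ["src/main.cc"])
visible = sorted(os.listdir(os.path.join(sandbox, "src")))
print(visible)  # ['main.cc']
```

An action running with `sandbox` as its working directory finds `src/main.cc` at the usual relative path, while `src/secret.h` is simply absent.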

Declaring Inputs and Outputs 📝

For sandboxing to work, you must explicitly declare all inputs and outputs in your build rules. Bazel provides several attributes for this:

Input declarations:

cc_library(
    name = "mylib",
    srcs = ["lib.cc"],           # Direct source inputs
    hdrs = ["lib.h"],            # Header files
    deps = [":other_lib"],       # Dependencies (transitive inputs)
    data = ["config.json"],      # Runtime data files
)

Output declarations:

genrule(
    name = "generate",
    srcs = ["template.txt"],
    outs = ["generated.cc"],     # Declared outputs
    cmd = "process $< > $@",
)

Tools (programs used during the build):

genrule(
    name = "process",
    srcs = ["input.txt"],
    outs = ["output.txt"],
    tools = ["//tools:processor"],  # Build tool (built for the execution platform)
    cmd = "$(location //tools:processor) $< > $@",
)

💡 Pro tip: Use bazel build --sandbox_debug to see exactly what files are visible in the sandbox. This is invaluable for debugging "file not found" errors.

Examples: Sandboxing in Action 🔬

Example 1: Detecting Undeclared Dependencies

Consider this C++ library that accidentally depends on an undeclared header:

# BUILD file
cc_library(
    name = "broken_lib",
    srcs = ["main.cc"],
    hdrs = ["public.h"],
    # Missing: deps on the library that provides "secret.h"
)

// main.cc
#include "public.h"
#include "secret.h"  // Oops! Not in deps or hdrs!

void do_something() {
    use_secret_function();
}

Without sandboxing (using --spawn_strategy=local):

The build might succeed if secret.h happens to exist somewhere the compiler searches
Build is non-hermetic and will fail on clean systems
Remote caching won't work reliably

With sandboxing (default behavior):

$ bazel build //:broken_lib
ERROR: main.cc:2:10: fatal error: secret.h: No such file or directory
 #include "secret.h"
          ^~~~~~~~~~

The sandbox immediately catches the problem! To fix it:

cc_library(
    name = "fixed_lib",
    srcs = ["main.cc"],
    hdrs = ["public.h"],
    deps = ["//other:lib_with_secret"],  # Now properly declared!
)

Example 2: Environment Variable Control

Bazel sanitizes the environment to prevent non-hermetic behavior:

# BUILD file
genrule(
    name = "env_test",
    outs = ["output.txt"],
    cmd = "echo USER=$$USER > $@; echo HOME=$$HOME >> $@",
)

What happens:

$ bazel build //:env_test
$ cat bazel-bin/output.txt
USER=
HOME=

Most environment variables are not visible to the action. Only a small allowlist of variables passes through (such as PATH and TMPDIR). To explicitly pass variables:

genrule(
    name = "env_test_fixed",
    outs = ["output.txt"],
    cmd = "echo CONFIG=$$CONFIG_VAR > $@",
    # Option 1: Use --action_env flag
    #   bazel build --action_env=CONFIG_VAR=value //:env_test_fixed
    
    # Option 2: Read from a file instead
    srcs = ["config.txt"],  # Hermetic! Tracked by Bazel
)

⚠️ Important: Using --action_env makes the variable part of the action key, so changing it invalidates caches. Prefer reading from files when possible.
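
The filtering behaviour described in this example can be modelled as a small function. The allowlist below and the name `action_environment` are assumptions chosen for illustration; the variables Bazel actually forwards depend on its flags and on the rule being run.

```python
# Illustrative allowlist only; not Bazel's actual list.
ALLOWED = ("PATH", "TMPDIR")

def action_environment(host_env, action_env=None):
    """Keep allowlisted host variables, then add explicitly passed ones."""
    env = {name: value for name, value in host_env.items() if name in ALLOWED}
    env.update(action_env or {})
    return env

host_env = {"PATH": "/usr/bin:/bin", "USER": "alice", "HOME": "/home/alice"}

plain = action_environment(host_env)
with_config = action_environment(host_env, {"CONFIG_VAR": "value"})

print(plain)        # {'PATH': '/usr/bin:/bin'}
print(with_config)  # {'PATH': '/usr/bin:/bin', 'CONFIG_VAR': 'value'}
```

USER and HOME never reach the action, while a variable passed explicitly (as with --action_env) does, and would therefore also become part of the action key.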

Example 3: Network Isolation

Network access is allowed by default, but on Linux the sandbox can block it when asked to:

genrule(
    name = "download_test",
    outs = ["data.txt"],
    cmd = "curl https://example.com/data > $@",  # This will fail!
)

Result:

$ bazel build --sandbox_default_allow_network=false //:download_test
ERROR: curl: (6) Could not resolve host: example.com

This is intentional! Network access during builds is non-hermetic because:

Remote resources can change
Network availability varies
Build results become non-cacheable

Proper solution: Use Bazel's repository rules to fetch dependencies before the build:

# WORKSPACE file
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

http_file(
    name = "external_data",
    urls = ["https://example.com/data"],
    sha256 = "abc123...",  # Ensures integrity!
)

# BUILD file
genrule(
    name = "process_data",
    srcs = ["@external_data//file"],  # Dependency on fetched file
    outs = ["processed.txt"],
    cmd = "process $< > $@",
)

Now the network fetch happens during the loading phase, not during action execution, and the SHA-256 hash ensures reproducibility.
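
What the pinned hash buys you can be shown with a short check. This is a sketch of the idea only: `verify_sha256` is a hypothetical helper, and the payload stands in for whatever the download returned.

```python
import hashlib

def verify_sha256(content: bytes, expected_hex: str) -> bytes:
    """Return the content unchanged, or fail if its digest is not the pinned one."""
    actual = hashlib.sha256(content).hexdigest()
    if actual != expected_hex:
        raise ValueError(f"checksum mismatch: expected {expected_hex}, got {actual}")
    return content

payload = b"example data\n"
# In practice the digest is computed once and committed alongside the URL.
pinned = hashlib.sha256(payload).hexdigest()

verify_sha256(payload, pinned)  # the expected content passes

try:
    verify_sha256(b"tampered data\n", pinned)
    tampered_accepted = True
except ValueError:
    tampered_accepted = False

print(tampered_accepted)  # False
```

If the remote file changes, the fetch fails loudly instead of silently feeding different bytes into the build.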

Example 4: Debugging Sandbox Issues

When a sandboxed action fails mysteriously, use these techniques:

Technique 1: Inspect the sandbox

$ bazel build --sandbox_debug --verbose_failures //:target

This preserves the sandbox directory after failure and shows its path:

Sandbox directory: /tmp/bazel-sandbox-1234567890abcdef/execroot/my_workspace

You can then explore:

$ ls -la /tmp/bazel-sandbox-1234567890abcdef/execroot/my_workspace
$ cat /tmp/bazel-sandbox-1234567890abcdef/sandbox.log

Technique 2: Run without sandboxing temporarily

$ bazel build --spawn_strategy=local //:target

If this succeeds, you have an undeclared dependency. Compare what's available:

With --spawn_strategy=local: full filesystem access
With sandboxing (default): only declared inputs

Technique 3: Use execution log

$ bazel build --execution_log_json_file=exec.log //:target
$ jq '.inputs[].path' exec.log

This shows exactly what inputs were provided to each action.
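
If jq is not available, the same inspection can be sketched in Python. This assumes the log is a stream of concatenated JSON objects, one per executed spawn, each with an `inputs` list whose entries carry a `path` field. Those field names and the sample data are assumptions, so check them against the output of your Bazel version.

```python
import json

def iter_json_objects(text):
    """Yield each object from a stream of concatenated JSON values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between objects.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

def input_paths(text):
    """List the input paths recorded for each spawn."""
    return [[f["path"] for f in spawn.get("inputs", [])]
            for spawn in iter_json_objects(text)]

# Made-up sample in the assumed shape; a real log would be read from exec.log.
sample_log = '''
{"commandArgs": ["gcc", "-c", "main.cc"], "inputs": [{"path": "main.cc"}, {"path": "public.h"}]}
{"commandArgs": ["ar", "rcs", "lib.a"], "inputs": [{"path": "main.o"}]}
'''

paths = input_paths(sample_log)
print(paths)  # [['main.cc', 'public.h'], ['main.o']]
```

A header that is missing from the first list is a header the compile action was never given.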

Common Mistakes and How to Avoid Them ⚠️

Mistake 1: Absolute Path Dependencies

❌ Wrong:

genrule(
    name = "bad_rule",
    outs = ["output.txt"],
    cmd = "cat /home/user/data.txt > $@",  # Hardcoded absolute path!
)

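Problem: Bazel doesn't track this file, so changes to it never trigger a rebuild, and the path won't exist on other machines or remote executors. Depending on the sandbox strategy, it may not be readable at all.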

✅ Correct:

genrule(
    name = "good_rule",
    srcs = ["data.txt"],  # Declare as input
    outs = ["output.txt"],
    cmd = "cat $(location data.txt) > $@",  # Use location function
)

Mistake 2: Relying on Installed Tools

❌ Wrong:

genrule(
    name = "assumes_python",
    outs = ["result.txt"],
    cmd = "python3 script.py > $@",  # Assumes python3 in PATH
)

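Problem: python3 is resolved from whatever PATH the action receives, so the result depends on what each machine has installed. script.py is also not declared as an input, so it won't be present in the sandbox.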

✅ Correct:

genrule(
    name = "explicit_python",
    srcs = ["script.py"],
    outs = ["result.txt"],
    tools = ["@python_interpreter//bin:python3"],  # Explicit tool dependency
    cmd = "$(location @python_interpreter//bin:python3) $(location script.py) > $@",
)

Or use py_binary which handles this automatically:

py_binary(
    name = "script",
    srcs = ["script.py"],
)

genrule(
    name = "run_script",
    outs = ["result.txt"],
    tools = [":script"],
    cmd = "$(location :script) > $@",
)

Mistake 3: Writing to Source Directory

❌ Wrong:

genrule(
    name = "bad_generator",
    srcs = ["template.txt"],
    outs = ["generated.txt"],
    cmd = "generator $(location template.txt); cp generated.txt $@",
    # generator writes to current directory, then we copy to output
)

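Problem: The tool writes to an undeclared location. Inside the sandbox, source files are read-only and anything not written to a declared output is discarded when the sandbox is torn down, so this pattern is fragile.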

✅ Correct:

genrule(
    name = "good_generator",
    srcs = ["template.txt"],
    outs = ["generated.txt"],
    cmd = "generator $(location template.txt) $@",
    # Write directly to the output location ($@)
)

Or use a temporary directory:

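genrule(
    name = "with_temp",
    srcs = ["template.txt"],
    outs = ["generated.txt"],
    cmd = """
        ROOT=$$PWD
        TMP=$$(mktemp -d)
        cd $$TMP
        generator $$ROOT/$(location template.txt)
        cp generated.txt $$ROOT/$@
    """,
    # $(location ...) and $@ are relative to the original working directory,
    # so anchor them to $$ROOT after changing into the temporary directory.
)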
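
Both fixes are simpler when the tool itself accepts explicit input and output paths rather than writing into its working directory. The following is a minimal sketch of such a tool; `generate` and the `@NAME@` placeholder are hypothetical and stand in for whatever the real generator does.

```python
import tempfile
from pathlib import Path

def generate(template_path, output_path):
    """Read the template and write the result to the caller-chosen path."""
    text = Path(template_path).read_text()
    Path(output_path).write_text(text.replace("@NAME@", "world"))

# Demo in a scratch directory standing in for the sandbox.
scratch = Path(tempfile.mkdtemp(prefix="sketch-gen-"))
template = scratch / "template.txt"
template.write_text("hello @NAME@\n")

output = scratch / "out" / "generated.txt"
output.parent.mkdir()
generate(template, output)

print(output.read_text(), end="")  # hello world
```

Wired to a command line, the genrule would pass `$(location template.txt)` as the first argument and `$@` as the second, and nothing is ever written outside the declared output.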

Mistake 4: Assuming Specific Sandbox Strategy

❌ Wrong:

# Assuming linux-sandbox specifics in a genrule command
genrule(
    name = "linux_only",
    outs = ["output.txt"],
    cmd = "ls /proc/self/ns > $@",  # Linux-specific!
)

Problem: Won't work on macOS or Windows.

✅ Correct: Write portable commands or use select() for platform-specific behavior:

genrule(
    name = "portable",
    outs = ["output.txt"],
    cmd = select({
        "@platforms//os:linux": "uname -s > $@",
        "@platforms//os:macos": "uname -s > $@",
        "@platforms//os:windows": "echo Windows > $@",
    }),
)

Mistake 5: Forgetting Runtime Data

❌ Wrong:

py_test(
    name = "config_test",
    srcs = ["test.py"],
    # test.py tries to open "testdata/input.json"
)

Problem: The file the test opens isn't declared in data, so it's not part of the test's runfiles and isn't visible during test execution.

✅ Correct:

py_test(
    name = "config_test",
    srcs = ["test.py"],
    data = ["testdata/input.json"],  # Include runtime data!
)

Key Takeaways 🎓

📋 Quick Reference Card: Sandboxed Execution

Core Purpose	Isolate build actions to ensure reproducibility
What Gets Isolated	Filesystem access, environment variables, network, process tree
Linux Strategy	Namespaces + bind mounts (most robust)
macOS Strategy	sandbox-exec (Seatbelt policies)
Windows Strategy	No stable sandbox (experimental only)
Debug Flag	`--sandbox_debug`
Disable Sandbox	`--spawn_strategy=local` (not for production!)
Block Network	`--sandbox_default_allow_network=false`
Required Declarations	`srcs`, `deps`, `data`, `tools`, `outs`
Use `$(location)`	Reference input/tool paths portably

Remember these principles:

🔒 Explicit is better than implicit: Declare all dependencies
🌐 No network during build: Fetch in WORKSPACE, not in actions
📝 Outputs go to declared locations: Don't write to random paths
🛠️ Tools must be declared: Never assume installed programs
🐛 Debug with --sandbox_debug: Inspect what the action actually sees
✅ Reproducibility is the goal: Same inputs → same outputs, always

💡 Mental model: Think of each build action as a pure function in functional programming. It takes explicit inputs (files, tools, environment) and produces explicit outputs. No side effects, no hidden state, no ambient dependencies.

🤔 Did you know? Google's internal build system (Blaze, which Bazel is based on) runs millions of sandboxed actions per day across thousands of machines. Sandboxing is what makes this massive scale possible—it ensures that a build tested on one machine will work identically on any other machine in the fleet.

📚 Further Study

Bazel Sandboxing Documentation: https://bazel.build/docs/sandboxing - Official guide with platform-specific details
Hermetic Builds Best Practices: https://bazel.build/basics/hermeticity - Comprehensive guide to achieving hermetic builds
Linux Namespaces Deep Dive: https://www.man7.org/linux/man-pages/man7/namespaces.7.html - Technical details on the isolation mechanisms Bazel uses

Congratulations! 🎉 You now understand how Bazel's sandboxed execution creates isolated, reproducible build environments. Next, explore how this enables powerful features like remote execution and distributed caching across your entire team.
