Sandboxed Execution

Explore how Bazel isolates build actions using sandboxing to enforce hermeticity.

Sandboxed Execution in Bazel

This lesson covers sandbox fundamentals, filesystem isolation strategies, network restrictions, and debugging techniques: the essential concepts for building hermetic, reproducible builds with Bazel.

Welcome to Sandboxed Execution 🎯

If you've ever wondered how Bazel ensures that your builds are truly hermetic (meaning they produce identical outputs regardless of where or when they run), sandboxed execution is the answer. Think of a sandbox as a protective bubble around each build action, preventing it from accessing files it shouldn't see, modifying things it shouldn't touch, or depending on ambient system state.

In this lesson, we'll explore how Bazel creates these isolated environments, why they're critical for reproducibility, and how to work effectively within their constraints. Whether you're debugging a failing build or optimizing for speed, understanding sandboxing will transform how you approach build engineering.


Core Concepts: Understanding Sandboxing 🔒

What is Sandboxed Execution?

Sandboxed execution is Bazel's mechanism for running build actions in isolated environments where they can only access explicitly declared inputs. Instead of letting a build action see your entire filesystem, network, or environment variables, Bazel creates a restricted workspace containing only what that action declares it needs.

┌─────────────────────────────────────────────┐
│          TRADITIONAL BUILD                  │
├─────────────────────────────────────────────┤
│                                             │
│  Build Action                               │
│       │                                     │
│       ├──→ 📁 Can access ANY file           │
│       ├──→ 🌐 Can make network calls        │
│       ├──→ 💾 Can read cached data          │
│       └──→ ⚙️ Sees all env variables        │
│                                             │
│  Result: Unreproducible! 😱                 │
└─────────────────────────────────────────────┘

┌─────────────────────────────────────────────┐
│          SANDBOXED BUILD                    │
├─────────────────────────────────────────────┤
│                                             │
│  ┌─────────────────────────────────┐        │
│  │  🏖️ SANDBOX (isolated)          │        │
│  │                                 │        │
│  │  Build Action                   │        │
│  │    │                            │        │
│  │    ├──→ 📁 Only declared inputs │        │
│  │    ├──→ 🚫 No network access    │        │
│  │    ├──→ 🚫 No ambient cache     │        │
│  │    └──→ ⚙️ Filtered env vars    │        │
│  │                                 │        │
│  └─────────────────────────────────┘        │
│                                             │
│  Result: Hermetic & Reproducible! ✅        │
└─────────────────────────────────────────────┘

Why Sandboxing Matters 🎯

Reproducibility is the foundation of reliable software builds. If a build succeeds on your machine but fails on a colleague's, or works today but breaks tomorrow, you have a non-hermetic build. Sandboxing solves this by:

  1. Preventing undeclared dependencies: Actions can't accidentally read files that aren't in their declared inputs
  2. Isolating side effects: One action can't pollute the environment for another
  3. Enabling caching: When every input is declared, an action with identical inputs can safely reuse a cached output (assuming the action itself is deterministic)
  4. Detecting hidden assumptions: Forces you to explicitly declare what your build needs

💡 Real-world analogy: Imagine a restaurant kitchen where each chef can only use ingredients explicitly listed in their recipe card. They can't grab random items from the pantry or use a sauce another chef prepared. This ensures the dish tastes the same every time, regardless of who makes it or what else is happening in the kitchen.
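
To make the caching point concrete, here is a minimal Python sketch of a content-based action key. It is not Bazel's actual key algorithm: the function name `action_key` and the shape of its arguments are assumptions made for illustration. The idea is only that the key depends on the command, the filtered environment, and the contents of declared inputs, so undeclared files cannot influence it.

```python
import hashlib


def action_key(command, env, inputs):
    """Return a hex digest over a command, its env, and declared input contents.

    command: list of argv strings
    env: dict of environment variables the action is allowed to see
    inputs: dict mapping declared input path -> file content (bytes)
    """
    h = hashlib.sha256()
    for arg in command:
        h.update(b"arg\0" + arg.encode() + b"\0")
    for name in sorted(env):
        h.update(b"env\0" + f"{name}={env[name]}".encode() + b"\0")
    for path in sorted(inputs):
        # Hash the path and a digest of the content, never the mtime.
        h.update(b"in\0" + path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


cmd = ["gcc", "-c", "main.cc"]
env = {"PATH": "/usr/bin"}
base = action_key(cmd, env, {"main.cc": b"int main() {}"})
same = action_key(cmd, env, {"main.cc": b"int main() {}"})
changed = action_key(cmd, env, {"main.cc": b"int main() { return 1; }"})
print(base == same, base == changed)
```

Identical declared inputs give the same key, and editing a declared input gives a different one. A cache keyed this way is only trustworthy if the action really cannot read anything outside `inputs`, which is the property the sandbox is meant to enforce.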

How Bazel Implements Sandboxing 🛠️

Bazel uses different sandbox strategies depending on your operating system:

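Operating System   Primary Strategy         Mechanism
----------------   ----------------------   ------------------------------------------------------------
Linux              linux-sandbox            Linux namespaces (user, mount, PID, network) plus bind mounts
macOS              darwin-sandbox           sandbox-exec (Apple's Seatbelt policy)
Windows            (experimental only)      No stable sandbox strategy; actions normally run unsandboxed
Any POSIX system   processwrapper-sandbox   Symlink tree of declared inputs in a temporary directory (fallback, weaker isolation)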

The linux-sandbox is the most robust and commonly used strategy in production environments. Here's how it works:

┌────────────────────────────────────────────────┐
│  LINUX SANDBOX IMPLEMENTATION                  │
└────────────────────────────────────────────────┘

    Host Filesystem              Sandbox View
    ───────────────              ────────────
         
    /usr/bin/gcc ──────────────→ /usr/bin/gcc
    /lib/libc.so ──────────────→ /lib/libc.so
         
    /home/user/project/         /execroot/workspace/
      ├── src/                   ├── src/
      │   ├── main.cc     ─────→ │   ├── main.cc
      │   └── lib.h       ─────→ │   └── lib.h
      ├── BUILD                  └── (outputs writable)
      └── bazel-out/
           
    /tmp/bazel-cache/     ──────X (not staged in the execroot)
    /home/user/.config/   ──────X (not staged in the execroot)

Key mechanisms:

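  • Mount namespaces: Create a private view of the filesystem
  • Bind mounts: Make only declared inputs visible at expected paths
  • Read-only mounts: Prevent modification of source files. By default the rest of the host filesystem is typically mounted read-only rather than hidden, so undeclared files at absolute paths may still be readable
  • PID namespaces: Isolate process visibility
  • Network namespaces: Optionally block network access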

The Sandbox Directory Structure 📁

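When Bazel runs a sandboxed action, it creates a temporary directory tree for it. The exact location and names vary by Bazel version, strategy, and flags such as --sandbox_base (typically somewhere under Bazel's output base), so treat the layout below as schematic: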

/tmp/bazel-sandbox-/
├── execroot/
│   └── /
│       ├── external/          (external dependencies)
│       ├── bazel-out/         (output directories)
│       └── /    (source files)
│           ├── input1.txt     (symlink to actual file)
│           └── input2.txt     (symlink to actual file)
└── sandbox.log                (debugging information)

The action runs with its working directory set to the sandbox's execroot. Only the declared inputs are present there (as symlinks or copies). Outputs are written to the designated output directories and moved back to the real execroot after the action succeeds.
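
The staging idea can be sketched in a few lines of Python. This is a simplified model of a symlink-based sandbox (closest in spirit to the fallback strategy), not Bazel's implementation: the helper name `run_in_symlink_sandbox`, the file names, and the directory prefixes are all hypothetical. It assumes a POSIX system where creating symlinks is permitted.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_in_symlink_sandbox(workspace, declared_inputs, declared_outputs, argv):
    """Run argv in a temp dir that contains only symlinks to declared inputs."""
    sandbox = tempfile.mkdtemp(prefix="sketch-sandbox-")
    try:
        # Stage each declared input as a symlink at the same relative path.
        for rel in declared_inputs:
            link = os.path.join(sandbox, rel)
            os.makedirs(os.path.dirname(link), exist_ok=True)
            os.symlink(os.path.join(workspace, rel), link)
        # Pre-create the directories of declared outputs.
        for rel in declared_outputs:
            os.makedirs(os.path.dirname(os.path.join(sandbox, rel)), exist_ok=True)
        result = subprocess.run(argv, cwd=sandbox, capture_output=True, text=True)
        # Move declared outputs back only if the action succeeded.
        if result.returncode == 0:
            for rel in declared_outputs:
                dest = os.path.join(workspace, rel)
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.move(os.path.join(sandbox, rel), dest)
        return result
    finally:
        shutil.rmtree(sandbox, ignore_errors=True)


workspace = tempfile.mkdtemp(prefix="sketch-workspace-")
with open(os.path.join(workspace, "declared.txt"), "w") as f:
    f.write("hello\n")
with open(os.path.join(workspace, "undeclared.txt"), "w") as f:
    f.write("secret\n")

# The "action": report whether the undeclared file is visible, then copy the input.
script = (
    "import os, shutil;"
    "print(os.path.exists('undeclared.txt'));"
    "shutil.copy('declared.txt', 'out/result.txt')"
)
res = run_in_symlink_sandbox(
    workspace, ["declared.txt"], ["out/result.txt"], [sys.executable, "-c", script]
)
with open(os.path.join(workspace, "out", "result.txt")) as f:
    produced = f.read()
print(res.stdout.strip(), produced.strip())
shutil.rmtree(workspace, ignore_errors=True)
```

The undeclared file is invisible by relative path because it was never staged. Note what this sketch does not do: an action that opens an absolute path can still escape, which is why the stronger strategies add namespaces and mount restrictions on top of the staged directory.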

Declaring Inputs and Outputs 📝

For sandboxing to work, you must explicitly declare all inputs and outputs in your build rules. Bazel provides several attributes for this:

Input declarations:

cc_library(
    name = "mylib",
    srcs = ["lib.cc"],           # Direct source inputs
    hdrs = ["lib.h"],            # Header files
    deps = [":other_lib"],       # Dependencies (transitive inputs)
    data = ["config.json"],      # Runtime data files
)

Output declarations:

genrule(
    name = "generate",
    srcs = ["template.txt"],
    outs = ["generated.cc"],     # Declared outputs
    cmd = "process $< > $@",
)

Tools (programs used during the build):

genrule(
    name = "process",
    srcs = ["input.txt"],
    outs = ["output.txt"],
    tools = ["//tools:processor"],  # Build-time tool: built for the execution platform and staged as an input
    cmd = "$(location //tools:processor) $< > $@",
)

💡 Pro tip: Use bazel build --sandbox_debug to keep the sandbox directory around and see exactly what files are visible in it. This is invaluable for debugging "file not found" errors.


Examples: Sandboxing in Action 🔬

Example 1: Detecting Undeclared Dependencies

Consider this C++ library that accidentally depends on an undeclared header:

# BUILD file
cc_library(
    name = "broken_lib",
    srcs = ["main.cc"],
    hdrs = ["public.h"],
    # Missing: deps on the library that provides "secret.h"
)

// main.cc
#include "public.h"
#include "secret.h"  // Oops! Not in deps or hdrs!

void do_something() {
    use_secret_function();
}

Without sandboxing (using --spawn_strategy=local):

  • The build might succeed if secret.h happens to exist somewhere the compiler searches
  • Build is non-hermetic and will fail on clean systems
  • Remote caching won't work reliably

With sandboxing (default behavior):

$ bazel build //:broken_lib
ERROR: main.cc:2:10: fatal error: secret.h: No such file or directory
 #include "secret.h"
          ^~~~~~~~~~

The sandbox immediately catches the problem! To fix it:

cc_library(
    name = "fixed_lib",
    srcs = ["main.cc"],
    hdrs = ["public.h"],
    deps = ["//other:lib_with_secret"],  # Now properly declared!
)
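
As a complement to letting the sandbox fail the build, the same mismatch can be spotted with a small lint-style check. The Python sketch below is an illustration only, not something Bazel runs: the function name `undeclared_includes` is hypothetical, and it only looks at quoted `#include` lines, ignoring include paths, macros, and transitive headers from deps.

```python
import re

# Matches lines such as:  #include "public.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def undeclared_includes(source_text, declared_headers):
    """Return quoted includes in source_text that are not in declared_headers."""
    declared = set(declared_headers)
    return [h for h in INCLUDE_RE.findall(source_text) if h not in declared]


main_cc = '''#include "public.h"
#include "secret.h"  // Oops! Not in deps or hdrs!

void do_something() {
    use_secret_function();
}
'''

print(undeclared_includes(main_cc, ["public.h"]))  # ['secret.h']
```

Once the header's library is added to deps, its headers would belong in the declared set and the list comes back empty.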

Example 2: Environment Variable Control

Bazel sanitizes the environment to prevent non-hermetic behavior:

# BUILD file
genrule(
    name = "env_test",
    outs = ["output.txt"],
    cmd = "echo USER=$$USER > $@; echo HOME=$$HOME >> $@",
)

What happens:

$ bazel build //:env_test
$ cat bazel-bin/output.txt
USER=
HOME=

Most environment variables are not passed to the action! Only a small allowlist passes through (such as PATH and TMPDIR; the exact set depends on flags like --incompatible_strict_action_env). To explicitly pass variables:

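# Option 1: pass the variable explicitly on the command line:
#   bazel build --action_env=CONFIG_VAR=value //:env_test_fixed
genrule(
    name = "env_test_fixed",
    outs = ["output.txt"],
    cmd = "echo CONFIG=$$CONFIG_VAR > $@",
)

# Option 2: read the value from a file instead (hermetic, tracked by Bazel)
genrule(
    name = "config_from_file",
    srcs = ["config.txt"],
    outs = ["output_from_file.txt"],
    cmd = "echo CONFIG=$$(cat $(location config.txt)) > $@",
)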

⚠️ Important: Using --action_env makes the variable part of the action key, so changing it invalidates caches. Prefer reading from files when possible.
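
The filtering behaviour can be modelled in a few lines of Python. This is a sketch of an allowlist policy under assumed names: `build_action_env` and the default `passthrough` tuple are illustrative, not Bazel's real variable list or API.

```python
def build_action_env(host_env, passthrough=("PATH", "TMPDIR"), action_env=None):
    """Return the environment an action would see under an allowlist policy."""
    # Keep only allowlisted variables from the host environment.
    env = {k: v for k, v in host_env.items() if k in passthrough}
    # Explicitly requested variables (like --action_env=NAME=value) are added on top.
    env.update(action_env or {})
    return env


host = {"PATH": "/usr/bin:/bin", "HOME": "/home/user", "USER": "user", "TMPDIR": "/tmp"}

print(build_action_env(host))
print(build_action_env(host, action_env={"CONFIG_VAR": "value"}))
```

HOME and USER never reach the action in either call. The second call shows why an explicitly passed variable has to count as an input: it changes what the action can observe, so it must also change the cache key.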

Example 3: Network Isolation

Network access is allowed by default, but on Linux the sandbox can block it for actions like this one:

genrule(
    name = "download_test",
    outs = ["data.txt"],
    cmd = "curl https://example.com/data > $@",  # This will fail!
)

Result:

$ bazel build --sandbox_default_allow_network=false //:download_test
ERROR: curl: (6) Could not resolve host: example.com

This is intentional! Network access during builds is non-hermetic because:

  • Remote resources can change
  • Network availability varies
  • Build results become non-cacheable

Proper solution: Use Bazel's repository rules to fetch dependencies before the build:

# WORKSPACE file
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")

http_file(
    name = "external_data",
    urls = ["https://example.com/data"],
    sha256 = "abc123...",  # Ensures integrity!
)

# BUILD file
genrule(
    name = "process_data",
    srcs = ["@external_data//file"],  # Dependency on fetched file
    outs = ["processed.txt"],
    cmd = "process $< > $@",
)

Now the network fetch happens when Bazel fetches the external repository, before any action executes, and the SHA-256 hash ensures the downloaded content is exactly what you pinned.
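
The integrity check behind a pinned hash is simple to sketch in Python. The helper name `verify_sha256` and the payload are made up for illustration; the point is only that a fetch is accepted if and only if its bytes hash to the pinned value.

```python
import hashlib


def verify_sha256(data, expected_hex):
    """Return data unchanged, or raise ValueError if its SHA-256 does not match."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex.lower():
        raise ValueError(f"checksum mismatch: expected {expected_hex}, got {actual}")
    return data


payload = b"example payload"                  # stands in for the downloaded bytes
pinned = hashlib.sha256(payload).hexdigest()  # in practice, the value pinned in the repository rule

verify_sha256(payload, pinned)                # accepted

try:
    verify_sha256(b"tampered payload", pinned)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Because the hash is part of the rule definition, a changed remote file shows up as a loud fetch failure rather than as a silently different build.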

Example 4: Debugging Sandbox Issues

When a sandboxed action fails mysteriously, use these techniques:

Technique 1: Inspect the sandbox

$ bazel build --sandbox_debug --verbose_failures //:target

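This preserves the sandbox directory after a failure and prints its path. The path below is illustrative; the real location depends on your Bazel version and output base: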

Sandbox directory: /tmp/bazel-sandbox-1234567890abcdef/execroot/my_workspace

You can then explore it (the log file name is illustrative and may not exist in your Bazel version):

$ ls -la /tmp/bazel-sandbox-1234567890abcdef/execroot/my_workspace
$ cat /tmp/bazel-sandbox-1234567890abcdef/sandbox.log

Technique 2: Run without sandboxing temporarily

$ bazel build --spawn_strategy=local //:target

If this succeeds, you likely have an undeclared dependency. Compare what's available:

  • With --spawn_strategy=local: the action runs directly in the execroot with full filesystem access
  • With sandboxing (default): only declared inputs are staged

Technique 3: Use execution log

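$ bazel build --execution_log_json_file=exec.log //:target
# The log is a stream of JSON objects, one per executed spawn; field names may vary by Bazel version
$ jq '{mnemonic: .mnemonic, inputs: [.inputs[]?.path]}' exec.log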

This shows exactly what inputs were provided to each action.
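
If jq is not available, the same inspection can be done with Python's standard library. This sketch assumes the log is a sequence of concatenated JSON objects, each with a `mnemonic` field and an `inputs` list whose entries carry a `path`. Those field names are assumptions to check against your own log, and the sample data is invented.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def inputs_by_action(text):
    """Map each entry's mnemonic to the list of input paths it recorded."""
    result = {}
    for entry in iter_json_objects(text):
        paths = [i.get("path") for i in entry.get("inputs", [])]
        result.setdefault(entry.get("mnemonic", "?"), []).extend(paths)
    return result


# Invented sample standing in for the contents of exec.log.
sample_log = '''
{"mnemonic": "CppCompile", "inputs": [{"path": "src/main.cc"}, {"path": "src/lib.h"}]}
{"mnemonic": "Genrule", "inputs": [{"path": "template.txt"}]}
'''

print(inputs_by_action(sample_log))
```

For a real log, read the file into a string first and pass it to `inputs_by_action`; a file the action needs but that is missing from its list is the undeclared dependency to go and declare.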


Common Mistakes and How to Avoid Them ⚠️

Mistake 1: Absolute Path Dependencies

❌ Wrong:

genrule(
    name = "bad_rule",
    outs = ["output.txt"],
    cmd = "cat /home/user/data.txt > $@",  # Hardcoded absolute path!
)

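Problem: The file is not a declared input, so Bazel does not track it: changing it will not trigger a rebuild, and the path won't exist on other machines or remote executors. Depending on the sandbox strategy the read may fail outright or, worse, succeed (linux-sandbox typically leaves the host filesystem readable) and hide the problem.

✅ Correct: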

genrule(
    name = "good_rule",
    srcs = ["data.txt"],  # Declare as input
    outs = ["output.txt"],
    cmd = "cat $(location data.txt) > $@",  # Use location function
)

Mistake 2: Relying on Installed Tools

❌ Wrong:

genrule(
    name = "assumes_python",
    outs = ["result.txt"],
    cmd = "python3 script.py > $@",  # Assumes python3 in PATH
)

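Problem: python3 comes from whatever happens to be on the host PATH, so it is an undeclared, unversioned dependency. script.py is not a declared input either, so it will not be present in the sandbox.

✅ Correct: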

genrule(
    name = "explicit_python",
    srcs = ["script.py"],
    outs = ["result.txt"],
    tools = ["@python_interpreter//bin:python3"],  # Explicit tool dependency (illustrative label)
    cmd = "$(location @python_interpreter//bin:python3) $(location script.py) > $@",
)

Or use py_binary which handles this automatically:

py_binary(
    name = "script",
    srcs = ["script.py"],
)

genrule(
    name = "run_script",
    outs = ["result.txt"],
    tools = [":script"],
    cmd = "$(location :script) > $@",
)

Mistake 3: Writing to Source Directory

❌ Wrong:

genrule(
    name = "bad_generator",
    srcs = ["template.txt"],
    outs = ["generated.txt"],
    cmd = "generator $(location template.txt); cp generated.txt $@",
    # generator writes to current directory, then we copy to output
)

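Problem: The generator writes a file that is not a declared output. Depending on the strategy, the write can fail (source trees may be read-only), or the stray file is silently discarded or left behind in the execroot. Actions should write only to their declared outputs.

✅ Correct: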

genrule(
    name = "good_generator",
    srcs = ["template.txt"],
    outs = ["generated.txt"],
    cmd = "generator $(location template.txt) $@",
    # Write directly to the output location ($@)
)

Or use a temporary directory:

genrule(
    name = "with_temp",
    srcs = ["template.txt"],
    outs = ["generated.txt"],
    # Remember the execroot before changing directory, because
    # $(location ...) and $@ are relative to it.
    cmd = """
        ROOT=$$PWD
        TMP=$$(mktemp -d)
        (cd $$TMP && generator $$ROOT/$(location template.txt))
        cp $$TMP/generated.txt $@
        rm -rf $$TMP
    """,
)

Mistake 4: Assuming Specific Sandbox Strategy

❌ Wrong:

# Assuming linux-sandbox specifics in a genrule command
genrule(
    name = "linux_only",
    outs = ["output.txt"],
    cmd = "ls /proc/self/ns > $@",  # Linux-specific!
)

Problem: Won't work on macOS or Windows.

✅ Correct: Write portable commands or use select() for platform-specific behavior:

genrule(
    name = "portable",
    outs = ["output.txt"],
    cmd = select({
        "@platforms//os:linux": "uname -s > $@",
        "@platforms//os:macos": "uname -s > $@",
        "@platforms//os:windows": "echo Windows > $@",
    }),
)

Mistake 5: Forgetting Runtime Data

❌ Wrong:

py_test(
    name = "config_test",
    srcs = ["test.py"],
    # test.py tries to open "testdata/input.json"
)

Problem: testdata/input.json is not declared in data, so it is not part of the test's runfiles and is not visible during test execution.

✅ Correct:

py_test(
    name = "config_test",
    srcs = ["test.py"],
    data = ["testdata/input.json"],  # Include runtime data!
)

Key Takeaways 🎓

📋 Quick Reference Card: Sandboxed Execution

Core Purpose            Isolate build actions to ensure reproducibility
What Gets Isolated      Filesystem access, environment variables, network, process tree
Linux Strategy          Namespaces + bind mounts (most robust)
macOS Strategy          sandbox-exec (Seatbelt policies)
Windows Strategy        No stable sandbox (experimental support only)
Debug Flag              --sandbox_debug
Disable Sandbox         --spawn_strategy=local (not for production!)
Block Network           --sandbox_default_allow_network=false (Linux)
Required Declarations   srcs, deps, data, tools, outs
Use $(location)         Reference input/tool paths portably

Remember these principles:

  1. 🔒 Explicit is better than implicit: Declare all dependencies
  2. 🌐 No network during build: Fetch in WORKSPACE, not in actions
  3. 📁 Outputs go to declared locations: Don't write to random paths
  4. 🛠️ Tools must be declared: Never assume installed programs
  5. 🐛 Debug with --sandbox_debug: Inspect what the action actually sees
  6. ✅ Reproducibility is the goal: Same inputs → same outputs, always

💡 Mental model: Think of each build action as a pure function in functional programming. It takes explicit inputs (files, tools, environment) and produces explicit outputs. No side effects, no hidden state, no ambient dependencies.

🤔 Did you know? Bazel is based on Blaze, Google's internal build system, which runs builds at very large scale across many machines. Isolating actions from the machine they run on is a large part of what makes that scale workable: a build tested on one machine should behave identically on any other machine in the fleet.


📚 Further Study

  1. Bazel Sandboxing Documentation: https://bazel.build/docs/sandboxing - Official guide with platform-specific details
  2. Hermetic Builds Best Practices: https://bazel.build/basics/hermeticity - Comprehensive guide to achieving hermetic builds
  3. Linux Namespaces Deep Dive: https://www.man7.org/linux/man-pages/man7/namespaces.7.html - Technical details on the isolation mechanisms Bazel uses

Congratulations! 🎉 You now understand how Bazel's sandboxed execution creates isolated, reproducible build environments. Next, explore how this enables powerful features like remote execution and distributed caching across your entire team.