Sandboxed Execution
Explore how Bazel isolates build actions using sandboxing to enforce hermeticity.
Sandboxed Execution in Bazel
Master sandboxed execution with free flashcards and spaced repetition practice to solidify your understanding of Bazel's isolation mechanisms. This lesson covers sandbox fundamentals, filesystem isolation strategies, network restrictions, and debugging techniques: essential concepts for building truly hermetic and reproducible builds in modern software systems.
Welcome to Sandboxed Execution
If you've ever wondered how Bazel ensures that your builds are truly hermetic, meaning they produce identical outputs regardless of where or when they run, sandboxed execution is the answer. Think of a sandbox as a protective bubble around each build action, preventing it from accessing files it shouldn't see, modifying things it shouldn't touch, or depending on ambient system state.
In this lesson, we'll explore how Bazel creates these isolated environments, why they're critical for reproducibility, and how to work effectively within their constraints. Whether you're debugging a failing build or optimizing for speed, understanding sandboxing will transform how you approach build engineering.
Core Concepts: Understanding Sandboxing
What is Sandboxed Execution?
Sandboxed execution is Bazel's mechanism for running build actions in isolated environments where they can only access explicitly declared inputs. Instead of letting a build action see your entire filesystem, network, or environment variables, Bazel creates a restricted workspace containing only what that action declares it needs.
| Aspect | Traditional build | Sandboxed build |
|---|---|---|
| Files | Can access any file | Only declared inputs |
| Network | Can make network calls | No network access |
| Cache | Can read cached data | No ambient cache |
| Environment | Sees all env variables | Filtered env vars |
| Result | Unreproducible | Hermetic and reproducible |
Why Sandboxing Matters
Reproducibility is the foundation of reliable software builds. If a build succeeds on your machine but fails on a colleague's, or works today but breaks tomorrow, you have a non-hermetic build. Sandboxing solves this by:
- Preventing undeclared dependencies: Actions can't accidentally read files that aren't in their declared inputs
- Isolating side effects: One action can't pollute the environment for another
- Enabling caching: Identical inputs plus an identical action yield the same output (as long as the tool itself is deterministic), so cached results can be reused safely
- Detecting hidden assumptions: Forces you to explicitly declare what your build needs
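The caching point can be made concrete with a small sketch. The following Python is a toy model, not Bazel's actual digest scheme: the `action_key` helper and its hashing layout are assumptions made for illustration. It shows that when a key is derived only from the command line, the declared inputs, and the declared environment, two identical actions share a key and any change to a declared input produces a new one.

```python
import hashlib


def action_key(command, inputs, env):
    """Toy cache key: hash the command, declared input contents, and declared env.

    Hypothetical helper for illustration; Bazel's real key format differs.
    """
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    for path in sorted(inputs):
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    for name in sorted(env):
        h.update(f"{name}={env[name]}".encode() + b"\0")
    return h.hexdigest()


cmd = ["gcc", "-c", "main.cc", "-o", "main.o"]
declared = {"main.cc": b"int main() { return 0; }\n"}
edited = {"main.cc": b"int main() { return 1; }\n"}

key_a = action_key(cmd, declared, {"PATH": "/usr/bin"})
key_b = action_key(cmd, declared, {"PATH": "/usr/bin"})  # same action again
key_c = action_key(cmd, edited, {"PATH": "/usr/bin"})    # declared input changed

print(key_a == key_b, key_a == key_c)
```

Files that are not declared never enter the key, which is why an action that secretly reads them can return stale cache hits. Sandboxing closes that gap by making such reads fail.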
Real-world analogy: Imagine a restaurant kitchen where each chef can only use ingredients explicitly listed in their recipe card. They can't grab random items from the pantry or use a sauce another chef prepared. This ensures the dish tastes the same every time, regardless of who makes it or what else is happening in the kitchen.
How Bazel Implements Sandboxing
Bazel uses different sandbox strategies depending on your operating system:
| Operating System | Primary Strategy | Mechanism |
|---|---|---|
| Linux | linux-sandbox | Linux namespaces (user, mount, PID, network) with bind mounts |
| macOS | darwin-sandbox | sandbox-exec (Apple's Seatbelt policy) |
| Windows | None by default | Sandboxing support is experimental; actions normally run unsandboxed |
| Any POSIX system | processwrapper-sandbox | Symlink tree in a temporary execroot, without namespace isolation (fallback) |
The linux-sandbox is the most robust and commonly used strategy in production environments. Here's how it works:
Linux sandbox implementation
Host Filesystem                  Sandbox View
/usr/bin/gcc           ──▶       /usr/bin/gcc
/lib/libc.so           ──▶       /lib/libc.so
/home/user/project/              /execroot/workspace/
├── src/                         ├── src/
│   ├── main.cc        ──▶       │   ├── main.cc
│   └── lib.h          ──▶       │   └── lib.h
├── BUILD                        ├── (outputs writable)
                                 └── bazel-out/
/tmp/bazel-cache/      ──X       (not visible!)
/home/user/.config/    ──X       (not visible!)
Key mechanisms:
- Mount namespaces: Create a private view of the filesystem
- Bind mounts: Make only declared inputs visible at expected paths
- Read-only mounts: Prevent modification of source files
- PID namespaces: Isolate process visibility
- Network namespaces: Optionally block network access
The Sandbox Directory Structure
When Bazel runs a sandboxed action, it creates a temporary directory structure:
/tmp/bazel-sandbox-/
├── execroot/
│   └── /
│       ├── external/          (external dependencies)
│       ├── bazel-out/         (output directories)
│       └── /                  (source files)
│           ├── input1.txt     (symlink to actual file)
│           └── input2.txt     (symlink to actual file)
└── sandbox.log                (debugging information)
The action runs with its working directory set to the sandbox's execroot. Only the declared inputs are visible as symlinks or copies, and outputs are written to designated output directories that get copied back after successful execution.
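The symlink-and-copy-back flow described above can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not Bazel's implementation: `run_in_sandbox` is a hypothetical helper, it relies on POSIX symlinks, and it provides none of the namespace isolation that linux-sandbox adds, so absolute host paths would still be reachable.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_in_sandbox(workspace, inputs, outputs, argv):
    """Run argv in a temp dir holding only symlinks to the declared inputs.

    Hypothetical helper: declared outputs are copied back on success, and the
    temp dir (including any undeclared scratch files) is then removed.
    """
    sandbox = tempfile.mkdtemp(prefix="sketch-sandbox-")
    try:
        for rel in inputs:
            link = os.path.join(sandbox, rel)
            os.makedirs(os.path.dirname(link), exist_ok=True)
            os.symlink(os.path.join(workspace, rel), link)
        subprocess.run(argv, cwd=sandbox, check=True)
        for rel in outputs:
            dest = os.path.join(workspace, rel)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy(os.path.join(sandbox, rel), dest)
    finally:
        shutil.rmtree(sandbox)


# A workspace with one declared input and one file the action never declares.
workspace = tempfile.mkdtemp(prefix="sketch-workspace-")
with open(os.path.join(workspace, "input1.txt"), "w") as f:
    f.write("hello\n")
with open(os.path.join(workspace, "secret.txt"), "w") as f:
    f.write("undeclared\n")

# The "action": upper-case the input and record whether secret.txt is visible.
script = (
    "import os; os.makedirs('out');"
    "data = open('input1.txt').read();"
    "open('out/result.txt', 'w').write(data.upper() + str(os.path.exists('secret.txt')))"
)
run_in_sandbox(workspace, ["input1.txt"], ["out/result.txt"], [sys.executable, "-c", script])

with open(os.path.join(workspace, "out", "result.txt")) as f:
    result = f.read()
print(repr(result))
```

The action sees `input1.txt` through its symlink but cannot find `secret.txt` by relative path, because that file was never declared and so never linked into the working directory.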
Declaring Inputs and Outputs
For sandboxing to work, you must explicitly declare all inputs and outputs in your build rules. Bazel provides several attributes for this:
Input declarations:
cc_library(
name = "mylib",
srcs = ["lib.cc"], # Direct source inputs
hdrs = ["lib.h"], # Header files
deps = [":other_lib"], # Dependencies (transitive inputs)
data = ["config.json"], # Runtime data files
)
Output declarations:
genrule(
name = "generate",
srcs = ["template.txt"],
outs = ["generated.cc"], # Declared outputs
cmd = "process $< > $@",
)
Tools (programs used during the build):
genrule(
name = "process",
srcs = ["input.txt"],
outs = ["output.txt"],
    tools = ["//tools:processor"],  # Build-time tool, built for the execution platform
cmd = "$(location //tools:processor) $< > $@",
)
Pro tip: Use bazel build --sandbox_debug to keep the sandbox directory after the action runs, so you can inspect exactly which files were visible. This is invaluable for debugging "file not found" errors.
Examples: Sandboxing in Action
Example 1: Detecting Undeclared Dependencies
Consider this C++ library that accidentally depends on an undeclared header:
# BUILD file
cc_library(
name = "broken_lib",
srcs = ["main.cc"],
hdrs = ["public.h"],
# Missing: deps on the library that provides "secret.h"
)
// main.cc
#include "public.h"
#include "secret.h" // Oops! Not in deps or hdrs!
void do_something() {
use_secret_function();
}
Without sandboxing (using --spawn_strategy=local):
- The build might succeed if secret.h happens to exist somewhere the compiler searches
- The build is non-hermetic and will fail on clean systems
- Remote caching won't work reliably
With sandboxing (default behavior):
$ bazel build //:broken_lib
ERROR: main.cc:2:10: fatal error: secret.h: No such file or directory
#include "secret.h"
^~~~~~~~~~
The sandbox immediately catches the problem! To fix it:
cc_library(
name = "fixed_lib",
srcs = ["main.cc"],
hdrs = ["public.h"],
deps = ["//other:lib_with_secret"], # Now properly declared!
)
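To catch this class of mistake before the compiler does, a small lint can compare the quoted includes in a source file against the headers a target declares. The sketch below is a toy check, not something Bazel provides: `undeclared_includes` is a hypothetical helper, and it only looks at direct `#include "..."` lines, ignoring headers supplied transitively through deps.

```python
import re

# Matches lines such as: #include "secret.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def undeclared_includes(source, declared_headers):
    """Return quoted includes that are missing from the declared header list."""
    return sorted(set(INCLUDE_RE.findall(source)) - set(declared_headers))


main_cc = '''#include "public.h"
#include "secret.h"  // Oops! Not in deps or hdrs!
void do_something() {
  use_secret_function();
}
'''

missing = undeclared_includes(main_cc, ["public.h"])
print(missing)
```

For the broken_lib example, the check reports secret.h, the same header the sandboxed compile fails on.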
Example 2: Environment Variable Control
Bazel sanitizes the environment to prevent non-hermetic behavior:
# BUILD file
genrule(
name = "env_test",
outs = ["output.txt"],
cmd = "echo USER=$$USER > $@; echo HOME=$$HOME >> $@",
)
What happens:
$ bazel build //:env_test
$ cat bazel-bin/output.txt
USER=
HOME=
Most environment variables are not visible in the sandbox! Only a whitelist of essential variables passes through (like PATH, TMPDIR). To explicitly pass variables:
genrule(
name = "env_test_fixed",
outs = ["output.txt"],
cmd = "echo CONFIG=$$CONFIG_VAR > $@",
# Option 1: Use --action_env flag
# bazel build --action_env=CONFIG_VAR=value //:env_test_fixed
# Option 2: Read from a file instead
srcs = ["config.txt"], # Hermetic! Tracked by Bazel
)
Important: Using --action_env makes the variable part of the action key, so changing it invalidates caches. Prefer reading from files when possible.
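The filtering behavior can be imitated with a plain subprocess call. This is a rough sketch, assuming a two-variable allowlist chosen only for illustration: the `ALLOWED` tuple and the `filtered_env` helper are hypothetical, and Bazel's real rules for which variables pass through are more involved.

```python
import os
import subprocess
import sys

ALLOWED = ("PATH", "TMPDIR")  # assumed allowlist, for illustration only


def filtered_env(host_env, action_env=None):
    """Keep only allowlisted host variables, then add explicitly passed ones."""
    env = {k: v for k, v in host_env.items() if k in ALLOWED}
    env.update(action_env or {})
    return env


# Pretend the host environment defines USER and HOME.
host = dict(os.environ, USER="alice", HOME="/home/alice")
probe = "import os; print(os.environ.get('USER', ''), os.environ.get('CONFIG_VAR', ''), sep='|')"

plain = subprocess.run(
    [sys.executable, "-c", probe],
    env=filtered_env(host),
    capture_output=True, text=True, check=True,
).stdout.strip()

passed = subprocess.run(
    [sys.executable, "-c", probe],
    env=filtered_env(host, {"CONFIG_VAR": "value"}),  # analogous to --action_env
    capture_output=True, text=True, check=True,
).stdout.strip()

print(plain, passed)
```

In the first run the child process sees neither USER nor CONFIG_VAR. In the second, only the explicitly passed CONFIG_VAR appears, which mirrors how an explicitly declared variable becomes a visible, tracked input.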
Example 3: Network Isolation
On Linux, the sandbox can block network access. Network access is allowed by default, so you have to opt in to blocking it:
genrule(
name = "download_test",
outs = ["data.txt"],
cmd = "curl https://example.com/data > $@", # This will fail!
)
Result:
$ bazel build --sandbox_default_allow_network=false //:download_test
ERROR: curl: (6) Could not resolve host: example.com
This is intentional! Network access during builds is non-hermetic because:
- Remote resources can change
- Network availability varies
- Build results become non-cacheable
Proper solution: Use Bazel's repository rules to fetch dependencies before the build:
# WORKSPACE file
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_file")
http_file(
name = "external_data",
urls = ["https://example.com/data"],
sha256 = "abc123...", # Ensures integrity!
)
# BUILD file
genrule(
name = "process_data",
srcs = ["@external_data//file"], # Dependency on fetched file
outs = ["processed.txt"],
cmd = "process $< > $@",
)
Now the network fetch happens during the loading phase, not during action execution, and the SHA-256 hash ensures reproducibility.
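The integrity check behind a pinned sha256 is simple to sketch. The Python below is an illustration of the idea only, not Bazel's downloader: `verify_sha256` is a hypothetical helper and the payload is made up for the example.

```python
import hashlib


def verify_sha256(data: bytes, expected_hex: str) -> None:
    """Raise ValueError if the data does not match the pinned digest."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_hex:
        raise ValueError(f"checksum mismatch: expected {expected_hex}, got {actual}")


payload = b"example payload\n"
# In practice the digest is recorded once and committed next to the URL.
pinned = hashlib.sha256(payload).hexdigest()

verify_sha256(payload, pinned)  # matching content passes silently

try:
    verify_sha256(payload + b"tampered", pinned)
    tamper_detected = False
except ValueError:
    tamper_detected = True

print(tamper_detected)
```

Because the digest is fixed in the build files, a changed remote resource fails loudly at fetch time instead of silently altering build outputs.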
Example 4: Debugging Sandbox Issues
When a sandboxed action fails mysteriously, use these techniques:
Technique 1: Inspect the sandbox
$ bazel build --sandbox_debug --verbose_failures //:target
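This preserves the sandbox directory after the action finishes and prints its path. The exact location depends on your Bazel version and output base, so treat the path below as illustrative: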
Sandbox directory: /tmp/bazel-sandbox-1234567890abcdef/execroot/my_workspace
You can then explore:
$ ls -la /tmp/bazel-sandbox-1234567890abcdef/execroot/my_workspace
$ cat /tmp/bazel-sandbox-1234567890abcdef/sandbox.log
Technique 2: Run without sandboxing temporarily
$ bazel build --spawn_strategy=local //:target
If this succeeds, you likely have an undeclared dependency. Compare what's available:
- With --spawn_strategy=local: full filesystem access
- With sandboxing (default): only declared inputs
Technique 3: Use execution log
$ bazel build --execution_log_json_file=exec.log //:target
$ jq '.inputs[]?.path' exec.log
This shows exactly what inputs were provided to each action.
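If jq is not available, the same inspection can be done in Python. This sketch assumes the log is a sequence of concatenated JSON objects, each with an optional `inputs` list whose entries carry a `path` field. The sample data and the `commandArgs` field name are assumptions for illustration, so check them against your own log before relying on this.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip whitespace between documents.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def input_paths(text):
    """Return one list of input paths per logged entry."""
    return [
        [f.get("path") for f in entry.get("inputs", [])]
        for entry in iter_json_objects(text)
    ]


# Hypothetical sample shaped like the assumed log format.
sample = '''
{"commandArgs": ["gcc", "-c", "main.cc"], "inputs": [{"path": "main.cc"}, {"path": "lib.h"}]}
{"commandArgs": ["touch", "out.txt"], "inputs": []}
'''

paths = input_paths(sample)
print(paths)
```

To use it on a real log, read the file contents with `open("exec.log").read()` and pass the string to `input_paths`.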
Common Mistakes and How to Avoid Them
Mistake 1: Absolute Path Dependencies
Wrong:
genrule(
name = "bad_rule",
outs = ["output.txt"],
cmd = "cat /home/user/data.txt > $@", # Hardcoded absolute path!
)
Problem: This path won't exist in the sandbox (or on other machines).
Correct:
genrule(
name = "good_rule",
srcs = ["data.txt"], # Declare as input
outs = ["output.txt"],
cmd = "cat $(location data.txt) > $@", # Use location function
)
Mistake 2: Relying on Installed Tools
Wrong:
genrule(
name = "assumes_python",
outs = ["result.txt"],
cmd = "python3 script.py > $@", # Assumes python3 in PATH
)
Problem: The sandbox has a restricted PATH. Python might not be visible.
Correct:
genrule(
name = "explicit_python",
srcs = ["script.py"],
outs = ["result.txt"],
tools = ["@python_interpreter//bin:python3"], # Explicit tool dependency
cmd = "$(location @python_interpreter//bin:python3) $(location script.py) > $@",
)
Or use py_binary which handles this automatically:
py_binary(
name = "script",
srcs = ["script.py"],
)
genrule(
name = "run_script",
outs = ["result.txt"],
tools = [":script"],
cmd = "$(location :script) > $@",
)
Mistake 3: Writing to Source Directory
Wrong:
genrule(
name = "bad_generator",
srcs = ["template.txt"],
outs = ["generated.txt"],
cmd = "generator $(location template.txt); cp generated.txt $@",
# generator writes to current directory, then we copy to output
)
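Problem: An action should write only to its declared outputs. Source inputs are read-only inside the sandbox, and undeclared files left in the working directory are discarded when the sandbox is cleaned up, so relying on scratch files there is fragile.
Correct: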
genrule(
name = "good_generator",
srcs = ["template.txt"],
outs = ["generated.txt"],
cmd = "generator $(location template.txt) $@",
# Write directly to the output location ($@)
)
Or use a temporary directory:
genrule(
name = "with_temp",
srcs = ["template.txt"],
outs = ["generated.txt"],
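    cmd = """
        ROOT=$$PWD                # remember the execroot before changing directory
        TMP=$$(mktemp -d)         # scratch space for the generator
        (cd $$TMP && generator $$ROOT/$(location template.txt))
        cp $$TMP/generated.txt $@ # copy the result to the declared output
    """,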
)
Mistake 4: Assuming Specific Sandbox Strategy
Wrong:
# Assuming linux-sandbox specifics in a genrule command
genrule(
name = "linux_only",
outs = ["output.txt"],
cmd = "ls /proc/self/ns > $@", # Linux-specific!
)
Problem: Won't work on macOS or Windows.
Correct: Write portable commands or use select() for platform-specific behavior:
genrule(
name = "portable",
outs = ["output.txt"],
cmd = select({
"@platforms//os:linux": "uname -s > $@",
"@platforms//os:macos": "uname -s > $@",
"@platforms//os:windows": "echo Windows > $@",
}),
)
Mistake 5: Forgetting Runtime Data
Wrong:
py_test(
name = "config_test",
srcs = ["test.py"],
# test.py tries to open "testdata/input.json"
)
Problem: The runtime data file isn't declared in data, so it isn't visible during test execution.
Correct:
py_test(
name = "config_test",
srcs = ["test.py"],
data = ["testdata/input.json"], # Include runtime data!
)
Key Takeaways
Quick Reference Card: Sandboxed Execution
| Topic | Summary |
|---|---|
| Core Purpose | Isolate build actions to ensure reproducibility |
| What Gets Isolated | Filesystem access, environment variables, network, process tree |
| Linux Strategy | Namespaces + bind mounts (most robust) |
| macOS Strategy | sandbox-exec (Seatbelt policies) |
| Windows Strategy | No sandbox by default (experimental support only) |
| Debug Flag | --sandbox_debug |
| Disable Sandbox | --spawn_strategy=local (not for production!) |
| Block Network | --sandbox_default_allow_network=false |
| Required Declarations | srcs, deps, data, tools, outs |
| Use $(location) | Reference input/tool paths portably |
Remember these principles:
- Explicit is better than implicit: Declare all dependencies
- No network during build: Fetch in WORKSPACE, not in actions
- Outputs go to declared locations: Don't write to random paths
- Tools must be declared: Never assume installed programs
- Debug with --sandbox_debug: Inspect what the action actually sees
- Reproducibility is the goal: Same inputs → same outputs, always
Mental model: Think of each build action as a pure function in functional programming. It takes explicit inputs (files, tools, environment) and produces explicit outputs. No side effects, no hidden state, no ambient dependencies.
Did you know? Google's internal build system (Blaze, which Bazel is based on) runs millions of sandboxed actions per day across thousands of machines. Sandboxing is what makes this massive scale possible: it ensures that a build tested on one machine will work identically on any other machine in the fleet.
Further Study
- Bazel Sandboxing Documentation: https://bazel.build/docs/sandboxing - Official guide with platform-specific details
- Hermetic Builds Best Practices: https://bazel.build/basics/hermeticity - Comprehensive guide to achieving hermetic builds
- Linux Namespaces Deep Dive: https://www.man7.org/linux/man-pages/man7/namespaces.7.html - Technical details on the isolation mechanisms Bazel uses
Congratulations! You now understand how Bazel's sandboxed execution creates isolated, reproducible build environments. Next, explore how this enables powerful features like remote execution and distributed caching across your entire team.