
Context Propagation Mastery

Master the hardest part of observability: maintaining causality across distributed system boundaries

Master distributed tracing and observability with free flashcards and hands-on exercises. This lesson covers context propagation patterns, trace ID management, and cross-service debugging: essential concepts for building resilient production systems in 2026.

Welcome

💻 Modern distributed systems involve dozens or hundreds of services working together. When a user request fails or slows down, how do you find which service caused the problem? Context propagation is the mechanism that threads observability data across service boundaries, allowing you to follow a single request's journey through your entire system.

Think of context propagation like a relay race 🏃‍♂️➡️🏃‍♀️. Each runner carries a baton (context) and passes it to the next runner (service). Without the baton, you can't track which team is winning. Without context propagation, you can't track which request is failing.

Core Concepts

What is Context?

Context is a collection of key-value pairs that travels with a request as it moves through your system. At minimum, it contains:

Field | Purpose | Example
--- | --- | ---
Trace ID | Unique identifier for the entire request flow | 4bf92f3577b34da6a3ce929d0e0e4736
Span ID | Unique identifier for a single operation | 00f067aa0ba902b7
Parent Span ID | Links to the calling operation | b7ad6b7169203331
Sampling Decision | Whether to record the full trace | 01 (sampled) / 00 (not sampled)

💡 Pro tip: Context can also carry business metadata like user ID, tenant ID, or feature flags (as baggage) that help filter and analyze traces.
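To make the fields concrete, here is a minimal Python sketch of a context record, assuming W3C-style IDs (a 128-bit trace ID and a 64-bit span ID, both as lowercase hex). The SpanContext class and its new_root and child methods are hypothetical names for illustration, not the OpenTelemetry API; the point is which fields a child copies and which it replaces.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanContext:
    """Illustrative context record; field names mirror the table above."""
    trace_id: str                  # 32 hex chars (128-bit), shared by every span in the request
    span_id: str                   # 16 hex chars (64-bit), unique to one operation
    parent_span_id: Optional[str]  # None for the root span
    sampled: bool                  # sampling decision, inherited by children

    @classmethod
    def new_root(cls, sampled: bool = True) -> "SpanContext":
        # A root span starts a new trace: fresh trace ID, no parent
        return cls(secrets.token_hex(16), secrets.token_hex(8), None, sampled)

    def child(self) -> "SpanContext":
        # A child keeps the trace ID and sampling decision,
        # gets its own span ID, and points back at this span
        return SpanContext(self.trace_id, secrets.token_hex(8), self.span_id, self.sampled)


root = SpanContext.new_root()
auth = root.child()
db = auth.child()
print(root.trace_id == auth.trace_id == db.trace_id)  # True
print(db.parent_span_id == auth.span_id)              # True
```

Note that child() never touches trace_id: that single shared value is what lets a backend stitch every span of a request together.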

The Propagation Journey

┌─────────────────────────────────────────────────────────┐
│           REQUEST FLOW WITH CONTEXT                     │
└─────────────────────────────────────────────────────────┘

   🌐 User Request (no trace headers yet)
      │
      ↓
  ┌──────────┐
  │ Gateway  │ ← Creates root span: trace-id=abc123, span-id=span-1
  └────┬─────┘
       │ Propagates: trace-id=abc123
       │             parent-span-id=span-1
       ↓
  ┌──────────┐
  │  Auth    │ ← Reads context, creates span-2 (parent: span-1)
  │ Service  │
  └────┬─────┘
       │ Propagates: trace-id=abc123
       │             parent-span-id=span-2
       ↓
  ┌──────────┐
  │ Business │ ← Continues chain: creates span-3 (parent: span-2)
  │  Logic   │
  └────┬─────┘
       │
       ├──→ Database (span-4, parent: span-3)
       │
       └──→ Cache (span-5, parent: span-3)

  📊 All spans linked by trace-id: abc123

Propagation Mechanisms

Context doesn't magically appear in downstream services. It must be explicitly propagated using one of these methods:

1. HTTP Headers (Most Common)

W3C Trace Context is the modern standard:

traceparent: 00-abc123-span1-01
tracestate: vendor1=value1,vendor2=value2

Breaking down traceparent:

  • 00 = version
  • abc123 = trace ID (32 hex chars in reality)
  • span1 = parent span ID (16 hex chars)
  • 01 = trace flags (sampling bit)
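A stdlib-only Python sketch of parsing and validating a version-00 traceparent, assuming the field rules from the W3C Trace Context spec (lowercase hex, all-zero IDs invalid, version ff forbidden). It is deliberately strict: future versions may append extra fields, which a production parser should tolerate. The parse_traceparent name is hypothetical; OpenTelemetry's propagators already do this work for you.

```python
import re
from typing import NamedTuple, Optional

# Version 00 layout: 2-32-16-2 lowercase hex fields separated by "-"
_TRACEPARENT = re.compile(r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


class TraceParent(NamedTuple):
    version: str
    trace_id: str
    parent_id: str
    flags: int

    @property
    def sampled(self) -> bool:
        # Bit 0 of trace-flags is the "sampled" flag
        return bool(self.flags & 0x01)


def parse_traceparent(header: Optional[str]) -> Optional[TraceParent]:
    """Return the parsed header, or None if it is missing or malformed."""
    if not header:
        return None
    m = _TRACEPARENT.match(header.strip())
    if m is None:
        return None
    version, trace_id, parent_id, flags = m.groups()
    # Version ff is forbidden; all-zero trace or parent IDs are invalid
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return TraceParent(version, trace_id, parent_id, int(flags, 16))


tp = parse_traceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
print(tp.trace_id, tp.parent_id, tp.sampled)
# 4bf92f3577b34da6a3ce929d0e0e4736 00f067aa0ba902b7 True
print(parse_traceparent("00-abc123-span1-01"))  # None: shorthand IDs have the wrong length
```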

Legacy formats you'll still encounter:

X-B3-TraceId: abc123
X-B3-SpanId: span1
X-B3-ParentSpanId: span0
X-B3-Sampled: 1
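When a system mixes B3 and W3C services, something has to translate between them. A stdlib-only Python sketch of mapping the B3 multi-header fields onto a traceparent value, assuming the common convention of left-padding 64-bit B3 trace IDs to 128 bits and treating a missing sampling header as "not sampled" (a simplification). The function name is hypothetical; OpenTelemetry ships B3 propagators that can handle this for you.

```python
from typing import Mapping, Optional

_HEX = set("0123456789abcdef")


def b3_to_traceparent(headers: Mapping[str, str]) -> Optional[str]:
    """Translate B3 multi-header context into a W3C traceparent value, or None."""
    h = {k.lower(): v.strip() for k, v in headers.items()}  # header names are case-insensitive
    trace_id = h.get("x-b3-traceid", "").lower()
    span_id = h.get("x-b3-spanid", "").lower()
    if len(trace_id) not in (16, 32) or len(span_id) != 16 or not set(trace_id + span_id) <= _HEX:
        return None
    # 64-bit B3 trace IDs are left-padded with zeros to the 128 bits W3C requires
    trace_id = trace_id.rjust(32, "0")
    # X-B3-Flags: 1 (debug) implies sampled; "true" is an older spelling of "1".
    # A missing X-B3-Sampled (deferred decision) is treated as "not sampled" here.
    sampled = h.get("x-b3-flags") == "1" or h.get("x-b3-sampled", "").lower() in ("1", "true")
    # B3's SpanId is the caller's span, which W3C calls the parent-id;
    # X-B3-ParentSpanId has no W3C equivalent and is dropped.
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


print(b3_to_traceparent({
    "X-B3-TraceId": "a3ce929d0e0e4736",
    "X-B3-SpanId": "00f067aa0ba902b7",
    "X-B3-Sampled": "1",
}))  # 00-0000000000000000a3ce929d0e0e4736-00f067aa0ba902b7-01
```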

2. Message Queue Metadata

For async systems like Kafka or RabbitMQ:

{
  "headers": {
    "traceparent": "00-abc123-span1-01",
    "user-id": "12345"
  },
  "body": { "actual": "message" }
}
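A stdlib-only Python sketch of the same envelope idea, assuming a JSON message whose headers travel alongside the body as in the example above. The publish and consume functions are hypothetical; real Kafka and RabbitMQ clients expose native message headers, and OpenTelemetry instrumentation packages for those clients typically handle the injection.

```python
import json
from typing import Dict, Optional, Tuple

# Only trace-related headers are copied into the message (an assumption of this sketch)
PROPAGATED_KEYS = ("traceparent", "tracestate", "baggage")


def publish(body: dict, trace_headers: Dict[str, str]) -> bytes:
    """Producer: embed the caller's trace headers next to the message body."""
    headers = {k: v for k, v in trace_headers.items() if k in PROPAGATED_KEYS}
    return json.dumps({"headers": headers, "body": body}).encode("utf-8")


def consume(raw: bytes) -> Tuple[dict, Optional[str]]:
    """Consumer: recover the body and the producer's traceparent (None if absent)."""
    message = json.loads(raw)
    return message["body"], message.get("headers", {}).get("traceparent")


raw = publish(
    {"actual": "message"},
    {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"},
)
body, traceparent = consume(raw)
# The consumer would now extract this traceparent and start its span under it
print(traceparent)  # 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```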

3. gRPC Metadata

For RPC frameworks:

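// import "google.golang.org/grpc/metadata"
md := metadata.Pairs(
    "traceparent", "00-abc123-span1-01",
    "tracestate", "vendor=value", // trailing comma required before a ")" on its own line
)
// In a real handler, start from the request's ctx instead of context.Background()
ctx := metadata.NewOutgoingContext(context.Background(), md)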

Context Lifecycle

┌───────────────────────────────────────────────────────┐
│         SPAN LIFECYCLE IN A SERVICE                   │
└───────────────────────────────────────────────────────┘

1️⃣ EXTRACT                    Service Boundary
   │                           │
   ↓                           ↓
  📥 Read headers         ┌─────────────┐
   trace-id=abc123   ───→ │   Service   │
   parent-span=sp1        │             │
                          │  2️⃣ CREATE  │
                          │     │       │
                          │     ↓       │
                          │  New span   │
                          │  span-id=sp2│
                          │  parent=sp1 │
                          │             │
                          │  3️⃣ USE     │
                          │     │       │
                          │     ↓       │
                          │  Execute    │
                          │  business   │
                          │  logic      │
                          │             │
                          │  4️⃣ INJECT  │
                          │     │       │
                          └─────┼───────┘
                                ↓
                           📤 Write headers
                              trace-id=abc123
                              parent-span=sp2
                                │
                                ↓
                          Downstream call
                          (creates its own span, sp3)
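The four steps can be followed end to end with a toy model. This Python sketch uses only the standard library and plain dicts as headers; the extract, create_span, and inject functions are hypothetical stand-ins for what an OpenTelemetry propagator and tracer do. Notice what actually crosses the wire: the trace ID, the current span's ID (which becomes the downstream parent), and the flags. The downstream service mints its own span ID.

```python
import secrets
from typing import Dict, Optional, Tuple


def extract(headers: Dict[str, str]) -> Optional[Tuple[str, str, str]]:
    """1. EXTRACT: read (trace_id, parent_span_id, flags) from a traceparent header."""
    parts = headers.get("traceparent", "").split("-")
    if len(parts) != 4 or len(parts[1]) != 32 or len(parts[2]) != 16:
        return None  # missing or malformed: caller starts a new trace
    return parts[1], parts[2], parts[3]


def create_span(parent: Optional[Tuple[str, str, str]]) -> dict:
    """2. CREATE: new span ID; reuse the trace ID and flags if there is a parent."""
    if parent is None:
        # New root; the "sample it" flag (01) is an assumption of this sketch
        return {"trace_id": secrets.token_hex(16), "span_id": secrets.token_hex(8),
                "parent_id": None, "flags": "01"}
    trace_id, parent_id, flags = parent
    return {"trace_id": trace_id, "span_id": secrets.token_hex(8),
            "parent_id": parent_id, "flags": flags}


def inject(span: dict, headers: Dict[str, str]) -> None:
    """4. INJECT: the current span's ID becomes the downstream parent-id."""
    headers["traceparent"] = f"00-{span['trace_id']}-{span['span_id']}-{span['flags']}"


incoming = {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
span = create_span(extract(incoming))
# 3. USE: business logic would run here, with `span` as the current span
outgoing: Dict[str, str] = {}
inject(span, outgoing)
print(outgoing["traceparent"].split("-")[1] == span["trace_id"])  # True
```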

Sampling Decisions

🔺 Not every request needs full tracing. Sampling reduces overhead:

Strategy | When to Use | Sample Rate
--- | --- | ---
Head-based | Decision at request start | Fixed % (e.g., 1%)
Tail-based | Decision after the trace completes | 100% of errors, 1% of successes
Priority-based | Critical user paths | Variable by endpoint
Adaptive | High-traffic systems | Adjusts by load

⚠️ Critical: Sampling decisions must propagate! If the root span is not sampled (trace flags 00 in traceparent), all children should honor that decision.
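A sketch of how a head-based, parent-based decision can be computed, loosely modeled on the idea behind OpenTelemetry's ParentBased and TraceIdRatioBased samplers but not their actual code. The should_sample function and its low-64-bit threshold scheme are assumptions of this example.

```python
from typing import Optional


def should_sample(trace_id: str, parent_sampled: Optional[bool], ratio: float) -> bool:
    """Head-based, parent-based sampling decision (a sketch).

    parent_sampled is None when there is no incoming context (a root span).
    """
    if parent_sampled is not None:
        # Child spans honor the decision carried in the traceparent flags
        return parent_sampled
    # Root spans: compare the low 64 bits of the trace ID against a threshold.
    # The trace ID is random and identical on every hop, so any service that
    # re-evaluates the same trace with the same ratio reaches the same answer.
    threshold = int(ratio * (1 << 64))
    return int(trace_id[16:], 16) < threshold


print(should_sample("4bf92f3577b34da6a3ce929d0e0e4736", parent_sampled=False, ratio=1.0))  # False
```

The parent check comes first on purpose: a ratio only ever applies to roots, which is exactly what keeps traces whole.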

Baggage: Carrying Extra Data

Baggage lets you propagate custom key-value pairs:

baggage: userId=12345,tier=premium,region=us-east

💡 Use cases:

  • User context: Track which user triggered errors
  • Feature flags: Propagate A/B test variants
  • Multi-tenancy: Ensure tenant ID flows everywhere
  • Debugging: Add temporary tags for investigation

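⚠️ Warning: Baggage is sent with every downstream call, so it increases payload size. Keep it under 8KB total: the W3C Baggage spec only guarantees propagation of up to 64 entries and 8,192 bytes.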
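A stdlib-only Python sketch of building and reading the baggage header, assuming the W3C Baggage format (comma-separated key=value members, percent-encoded values, optional ;property suffixes) and its 8,192-byte limit. The encode_baggage and decode_baggage names are hypothetical; the OpenTelemetry baggage propagator implements the full spec, including the key validation this sketch skips.

```python
from typing import Dict
from urllib.parse import quote, unquote

MAX_BAGGAGE_BYTES = 8192  # size the W3C Baggage spec requires propagators to support


def encode_baggage(entries: Dict[str, str]) -> str:
    """Serialize entries as a baggage header value, rejecting oversize payloads."""
    header = ",".join(f"{k}={quote(v, safe='')}" for k, v in entries.items())
    if len(header.encode("utf-8")) > MAX_BAGGAGE_BYTES:
        raise ValueError("baggage exceeds 8192 bytes; propagate IDs, not objects")
    return header


def decode_baggage(header: str) -> Dict[str, str]:
    """Parse a baggage header value, ignoring any ;properties on each member."""
    entries = {}
    for member in header.split(","):
        kv = member.split(";", 1)[0].strip()
        if "=" in kv:
            key, value = kv.split("=", 1)
            entries[key.strip()] = unquote(value.strip())
    return entries


header = encode_baggage({"userId": "12345", "tier": "premium", "region": "us-east"})
print(header)  # userId=12345,tier=premium,region=us-east
```

Percent-encoding matters because a raw comma or semicolon inside a value would otherwise be read as a member or property separator.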

Real-World Examples

Example 1: Go Service with OpenTelemetry

package main

import (
    "context"
    "log"
    "net/http"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/propagation"
)

// Until an SDK TracerProvider is registered, otel.Tracer returns a no-op tracer.
var tracer = otel.Tracer("my-service")
var propagator = propagation.TraceContext{}

// Incoming request handler
func handleRequest(w http.ResponseWriter, r *http.Request) {
    // 1️⃣ EXTRACT: Pull context from headers
    ctx := propagator.Extract(r.Context(), propagation.HeaderCarrier(r.Header))

    // 2️⃣ CREATE: Start new span (a child of the remote parent, if one was extracted)
    ctx, span := tracer.Start(ctx, "handle-request")
    defer span.End()

    // Add attributes for filtering
    span.SetAttributes(
        attribute.String("user.id", r.Header.Get("X-User-ID")),
        attribute.String("endpoint", r.URL.Path),
    )

    // 3️⃣ USE: Do business logic, passing ctx so child spans join this trace
    orderID := r.URL.Query().Get("order_id") // hypothetical query parameter
    result, err := processOrder(ctx, orderID)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, "order processing failed")
        http.Error(w, err.Error(), http.StatusInternalServerError)
        return
    }

    w.Write([]byte(result))
}

// Outgoing HTTP call
func processOrder(ctx context.Context, orderID string) (string, error) {
    // 2️⃣ CREATE: New span for downstream call
    ctx, span := tracer.Start(ctx, "call-inventory-service")
    defer span.End()

    req, err := http.NewRequestWithContext(ctx, http.MethodGet,
        "http://inventory/check", nil)
    if err != nil {
        return "", err
    }

    // 4️⃣ INJECT: Write context to headers
    propagator.Inject(ctx, propagation.HeaderCarrier(req.Header))

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        span.RecordError(err)
        return "", err
    }
    defer resp.Body.Close()

    // Process response...
    return "success", nil
}

func main() {
    http.HandleFunc("/orders", handleRequest) // hypothetical route
    log.Fatal(http.ListenAndServe(":8080", nil))
}

Key points:

  • Extract reads trace context from incoming headers
  • Start creates a child span linked to parent
  • Inject writes context into outgoing headers
  • Every span started from ctx inherits the trace ID of the parent stored in it

Example 2: Python FastAPI Middleware

from fastapi import FastAPI, Request
from opentelemetry import trace
from opentelemetry.propagate import extract, inject
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
import httpx

app = FastAPI()
tracer = trace.get_tracer(__name__)
propagator = TraceContextTextMapPropagator()

@app.middleware("http")
async def tracing_middleware(request: Request, call_next):
    # 1️⃣ EXTRACT: Get context from request headers
    ctx = propagator.extract(carrier=request.headers)
    
    # 2️⃣ CREATE: Start span with extracted context
    with tracer.start_as_current_span(
        name=f"{request.method} {request.url.path}",
        context=ctx,
        kind=trace.SpanKind.SERVER
    ) as span:
        # Add custom attributes
        span.set_attribute("http.method", request.method)
        span.set_attribute("http.url", str(request.url))
        span.set_attribute("user.id", request.headers.get("x-user-id", "anonymous"))
        
        # 3️⃣ USE: Process request
        response = await call_next(request)
        
        span.set_attribute("http.status_code", response.status_code)
        return response

@app.get("/orders/{order_id}")
async def get_order(order_id: str):
    # Current span is automatically active
    with tracer.start_as_current_span("fetch-from-database") as span:
        span.set_attribute("db.operation", "SELECT")
        # Database call here...
    
    # Make downstream call with propagation
    await call_shipping_service(order_id)
    return {"order_id": order_id, "status": "shipped"}

async def call_shipping_service(order_id: str):
    with tracer.start_as_current_span("call-shipping-service") as span:
        headers = {}
        
        # 4️⃣ INJECT: Add trace context to outgoing headers
        propagator.inject(headers)
        
        async with httpx.AsyncClient() as client:
            response = await client.get(
                f"http://shipping-service/track/{order_id}",
                headers=headers
            )
            span.set_attribute("shipping.status", response.status_code)

Key points:

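  • Middleware handles extraction once per request, for every route
  • The current span is stored in Python contextvars, so it follows the request through async code automatically
  • span.set_attribute() records span attributes, not baggage: baggage is set with opentelemetry.baggage.set_baggage() and needs a baggage propagator (TraceContextTextMapPropagator alone carries only traceparent and tracestate)
  • Injection happens before every downstream call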
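Python's OpenTelemetry context is built on the standard contextvars module, which is why the current span follows async code without being passed explicitly. A stdlib-only sketch of that behavior, with a hypothetical current_span variable standing in for the SDK's internal slot: a task created after the value is set sees it, and a change made inside a task does not leak back to the caller.

```python
import asyncio
import contextvars
from typing import Optional, Tuple

# Hypothetical stand-in for the SDK's internal "current span" slot
current_span = contextvars.ContextVar("current_span", default=None)


async def downstream() -> Optional[str]:
    # Runs in a separate task, yet sees the caller's value: create_task copies the context
    return current_span.get()


async def overwrite() -> None:
    # This change lands in the task's own copy of the context
    current_span.set("changed-in-task")


async def handler() -> Tuple[Optional[str], Optional[str]]:
    current_span.set("handle-request")
    seen_in_task = await asyncio.create_task(downstream())
    await asyncio.create_task(overwrite())
    return seen_in_task, current_span.get()


print(asyncio.run(handler()))  # ('handle-request', 'handle-request')
```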

Example 3: Node.js with Manual Propagation

const express = require('express');
const axios = require('axios');
const {
  trace,
  context,
  propagation,
  SpanKind,
  SpanStatusCode,
} = require('@opentelemetry/api');

// Assumes an OpenTelemetry SDK was started first and registered a global
// propagator; with the bare API, extract/inject are no-ops.
const tracer = trace.getTracer('order-service');

const app = express();
app.use(express.json()); // parses req.body for the route below

// Express middleware
app.use((req, res, next) => {
  // 1️⃣ EXTRACT: Parse incoming headers
  const extractedContext = propagation.extract(
    context.active(),
    req.headers
  );

  // 2️⃣ CREATE: Start server span as a child of the extracted context
  const span = tracer.startSpan(
    `${req.method} ${req.path}`,
    {
      kind: SpanKind.SERVER,
      attributes: {
        'http.method': req.method,
        'http.target': req.path,
        'user.id': req.headers['x-user-id'],
      },
    },
    extractedContext
  );

  // Build a context that carries this span (setSpan does not activate it)
  const ctxWithSpan = trace.setSpan(extractedContext, span);

  // Store in request for later access
  req.traceContext = ctxWithSpan;

  // Wrap response.end to close span
  const originalEnd = res.end;
  res.end = function(...args) {
    span.setAttributes({
      'http.status_code': res.statusCode,
    });
    span.end();
    return originalEnd.apply(res, args);
  };

  next();
});

// Route handler
app.post('/orders', async (req, res) => {
  const activeContext = req.traceContext;

  // 3️⃣ USE: Create child span
  const span = tracer.startSpan(
    'validate-order',
    { kind: SpanKind.INTERNAL },
    activeContext
  );

  const ctxWithSpan = trace.setSpan(activeContext, span);

  try {
    // Validation logic...
    span.setAttribute('order.items', req.body.items.length);

    // Call downstream service
    await callPaymentService(ctxWithSpan, req.body);

    span.setStatus({ code: SpanStatusCode.OK });
    res.json({ success: true });
  } catch (error) {
    span.recordException(error);
    span.setStatus({
      code: SpanStatusCode.ERROR,
      message: error.message
    });
    res.status(500).json({ error: error.message });
  } finally {
    span.end();
  }
});

async function callPaymentService(ctx, orderData) {
  const span = tracer.startSpan(
    'POST /payment/process',
    { kind: SpanKind.CLIENT },
    ctx
  );

  const ctxWithSpan = trace.setSpan(ctx, span);

  try {
    const headers = {};

    // 4️⃣ INJECT: Add trace headers to outgoing request
    propagation.inject(ctxWithSpan, headers);

    const response = await axios.post(
      'http://payment-service/process',
      orderData,
      { headers }
    );

    span.setAttribute('payment.status', response.data.status);
    span.setStatus({ code: SpanStatusCode.OK });

    return response.data;
  } catch (error) {
    span.recordException(error);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw error;
  } finally {
    span.end();
  }
}

app.listen(3000); // hypothetical port

Key points:

  • Manual context management gives fine-grained control
  • Must explicitly pass context through async chains
  • trace.setSpan() returns a new context carrying the span; it only becomes the active context when run inside context.with()
  • Always end spans in finally blocks

Example 4: Cross-Language Propagation (Polyglot System)

Imagine a user checkout flow:

🌐 Browser → 🟦 Node.js Gateway → 🟨 Go Auth → 🟥 Java Payment → 🟩 Python Shipping

Node.js Gateway:

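// Creates root trace (random IDs from Node's built-in crypto module)
const crypto = require('crypto');
const traceId = crypto.randomBytes(16).toString('hex'); // e.g., "4bf92f3577b34da6a3ce929d0e0e4736"
const spanId = crypto.randomBytes(8).toString('hex');   // e.g., "00f067aa0ba902b7"

// Sets W3C standard headers on the OUTGOING request to Go Auth
// (setting them on res would only send them back to the browser)
const outgoingHeaders = {
  traceparent: `00-${traceId}-${spanId}-01`,
  baggage: `userId=${userId},checkout=true`, // userId from the authenticated session
};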

Go Auth Service:

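// Extracts from Node.js (hand-parsed to show the format; production code
// should validate the header or use the OpenTelemetry propagator)
traceparent := r.Header.Get("traceparent")
parts := strings.Split(traceparent, "-")
// Parses: version=00, traceId=4bf9..., parentSpan=00f0..., flags=01
traceId, flags := parts[1], parts[3] // panics on a malformed header: check len(parts) first

// Creates child span (generateSpanId is a hypothetical helper returning 16 hex chars)
newSpanId := generateSpanId() // "e2e2e2e2e2e2e2e2"

// Propagates to Java, passing the incoming sampling flags through unchanged
req.Header.Set("traceparent", fmt.Sprintf("00-%s-%s-%s", traceId, newSpanId, flags))
req.Header.Set("baggage", r.Header.Get("baggage")) // Pass through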

Java Payment Service:

// Extracts using OpenTelemetry Java SDK
// (propagator: e.g., W3CTraceContextPropagator.getInstance();
//  HttpTextMapGetter: your own TextMapGetter for the request type)
Context extractedContext = propagator.extract(
    Context.current(),
    request,
    HttpTextMapGetter.INSTANCE
);

// Creates a span whose parent is the remote span from Go
Span span = tracer.spanBuilder("process-payment")
    .setParent(extractedContext)
    .startSpan();

// All have same trace ID: 4bf92f3577b34da6a3ce929d0e0e4736

Python Shipping Service:

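from opentelemetry import baggage
from opentelemetry.propagate import extract

# Extracts seamlessly: the global propagator (tracecontext + baggage by default)
# reads both the traceparent and baggage headers
ctx = extract(request.headers)

# Reads baggage that originated in Node.js
user_id = baggage.get_baggage("userId", ctx)  # Same user ID!
is_checkout = baggage.get_baggage("checkout", ctx) == "true"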

🌍 Real-world benefit: When a checkout fails, you see one unified trace across all four languages, with timestamps showing exactly where the delay occurred.

Common Mistakes

❌ Mistake 1: Creating New Trace Instead of Continuing

Wrong:

# Ignores incoming context!
with tracer.start_as_current_span("my-operation"):
    # This creates a ROOT span, breaking the chain
    process_request()

Right:

# Extract first!
ctx = propagator.extract(carrier=request.headers)
with tracer.start_as_current_span("my-operation", context=ctx):
    process_request()

💡 Symptom: You see disconnected traces in your observability tool instead of one unified view.
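One way to spot this mistake in exported data, sketched under the assumption that you can dump spans as records with trace_id, span_id, parent_id, and service fields (a hypothetical shape, not any backend's export format): a service that should only ever be called from inside a trace, yet shows up starting root spans, is probably skipping extraction. The helper names are illustrative.

```python
from collections import Counter
from typing import Dict, List, Optional, Set, TypedDict


class SpanRecord(TypedDict):
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks a root span
    service: str


def unexpected_roots(spans: List[SpanRecord], entry_services: Set[str]) -> Dict[str, int]:
    """Count root spans started by services that should only ever continue a trace."""
    return dict(Counter(
        s["service"] for s in spans
        if s["parent_id"] is None and s["service"] not in entry_services
    ))


def missing_parents(spans: List[SpanRecord]) -> List[str]:
    """Span IDs whose parent never showed up in the same trace."""
    known = {(s["trace_id"], s["span_id"]) for s in spans}
    return [
        s["span_id"] for s in spans
        if s["parent_id"] is not None and (s["trace_id"], s["parent_id"]) not in known
    ]


spans = [
    {"trace_id": "t1", "span_id": "a", "parent_id": None, "service": "gateway"},
    {"trace_id": "t1", "span_id": "b", "parent_id": "a", "service": "auth"},
    {"trace_id": "t2", "span_id": "c", "parent_id": None, "service": "orders"},  # chain broken here
]
print(unexpected_roots(spans, {"gateway"}))  # {'orders': 1}
print(missing_parents(spans))                # []
```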

❌ Mistake 2: Forgetting to Inject on Outbound Calls

Wrong:

req, _ := http.NewRequest("GET", "http://api/data", nil)
resp, _ := client.Do(req)
// Downstream service has no trace context!

Right:

req, _ := http.NewRequestWithContext(ctx, "GET", "http://api/data", nil)
propagator.Inject(ctx, propagation.HeaderCarrier(req.Header))
resp, _ := client.Do(req)

❌ Mistake 3: Losing Context in Goroutines/Async

Wrong:

go func() {
    // ctx is not passed! Creates orphan span
    _, span := tracer.Start(context.Background(), "async-work")
    defer span.End()
    doWork()
}()

Right:

go func(ctx context.Context) {
    // Inherits parent trace
    ctx, span := tracer.Start(ctx, "async-work")
    defer span.End()
    doWork(ctx)
}(ctx)

❌ Mistake 4: Overloading Baggage

Wrong:

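// 50KB of baggage! (propagation and context come from '@opentelemetry/api')
const heavy = propagation.createBaggage({
  userId: { value: '12345' },
  userProfile: { value: JSON.stringify(massiveObject) },
  sessionData: { value: largeString },
  preferences: { value: JSON.stringify(anotherHugeObject) },
});
// Every service call carries this overhead

Right:

// Lightweight identifiers only
const light = propagation.createBaggage({
  userId: { value: '12345' },
  tenantId: { value: 'acme-corp' },
  requestType: { value: 'checkout' },
});
const ctxWithBaggage = propagation.setBaggage(context.active(), light);
// Services fetch full data from cache/DB using IDs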

⚠️ Rule of thumb: Baggage should stay under 8KB total. Use IDs, not full objects.

❌ Mistake 5: Ignoring Sampling Decisions

Wrong:

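# Force-samples every span: ALWAYS_ON ignores the parent's decision
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ALWAYS_ON

provider = TracerProvider(sampler=ALWAYS_ON)  # Overrides parent!
with provider.get_tracer(__name__).start_as_current_span("operation"):
    process()

Right:

# ParentBased honors the parent's sampled flag; the ratio applies only to root spans
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

provider = TracerProvider(sampler=ParentBased(root=TraceIdRatioBased(0.01)))
with provider.get_tracer(__name__).start_as_current_span("operation"):
    process()
# If parent wasn't sampled, child won't be either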

❌ Mistake 6: Not Handling Missing Context Gracefully

Wrong:

String traceparent = request.getHeader("traceparent");
String[] parts = traceparent.split("-"); // NullPointerException!
String traceId = parts[1];

Right:

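String traceparent = request.getHeader("traceparent");
if (traceparent != null && traceparent.split("-").length == 4) {
    // Extract and use (the OpenTelemetry propagator also validates each field)
} else {
    // No usable context: start a new root trace explicitly
    Span span = tracer.spanBuilder("operation").setNoParent().startSpan();
}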

💡 Best practice: Always have a fallback. Not all clients send trace headers.

Key Takeaways

✅ Context propagation threads observability across services by carrying trace IDs, span IDs, and metadata through your entire system.

✅ The four-step pattern is universal: Extract incoming context → Create child span → Use it for work → Inject into outbound calls.

✅ W3C Trace Context is the modern standard (traceparent header), but legacy systems may use B3 or custom formats.

✅ Sampling decisions must propagate to prevent partial traces and unnecessary overhead.

✅ Baggage carries custom data but must stay lightweight (under 8KB) to avoid performance issues.

✅ Context must be explicitly passed through async boundaries (goroutines, promises, threads, message queues).

✅ Gracefully handle missing context by creating root traces when headers aren't present.

✅ Instrumentation libraries automate most of this (OpenTelemetry, Jaeger clients, Datadog), but understanding the mechanics helps debug propagation breaks.

📋 Quick Reference Card

Concept | Purpose | Implementation
--- | --- | ---
Trace ID | Groups all spans in a request | 32 hex chars (128-bit)
Span ID | Identifies a single operation | 16 hex chars (64-bit)
Parent Span ID | Links to the calling span | 16 hex chars (64-bit)
Extract | Read context from headers/metadata | propagator.extract(headers)
Inject | Write context to headers/metadata | propagator.inject(headers)
Baggage | Carry custom key-value pairs | baggage: key1=val1,key2=val2
Sampling | Reduce tracing overhead | Head-based (1-10%) or tail-based
traceparent | W3C standard header format | 00-traceId-spanId-flags

Common Header Formats:

# W3C Trace Context (preferred)
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
tracestate: vendor1=value1,vendor2=value2
baggage: userId=12345,tier=premium

# B3 (Zipkin legacy); X-B3-ParentSpanId is omitted when the span has no parent
X-B3-TraceId: 4bf92f3577b34da6a3ce929d0e0e4736
X-B3-SpanId: 00f067aa0ba902b7
X-B3-Sampled: 1

Propagation Checklist:

  • ☑️ Extract context on incoming requests
  • ☑️ Create child spans with extracted context as parent
  • ☑️ Add relevant attributes to spans
  • ☑️ Inject context before all outbound calls (HTTP, gRPC, queues)
  • ☑️ Pass context through async boundaries
  • ☑️ End spans in finally blocks
  • ☑️ Handle missing context gracefully

📚 Further Study

  1. W3C Trace Context Specification: https://www.w3.org/TR/trace-context/ - Official standard for context propagation headers and formats.

  2. OpenTelemetry Context Propagation Guide: https://opentelemetry.io/docs/instrumentation/go/manual/#propagators-and-context - Deep dive into propagation APIs with multi-language examples.

  3. Distributed Tracing Patterns: https://www.oreilly.com/library/view/distributed-tracing-in/9781492056621/ - O'Reilly book covering advanced propagation patterns and real-world troubleshooting.


🎯 Practice Challenge: Set up a three-service system (any languages) and use a tracing tool like Jaeger or Zipkin to visualize trace propagation. Intentionally break propagation in one service and observe the disconnected trace segments.