
Collectors and Pipelines

Design telemetry collection architecture with receivers, processors, and exporters

OpenTelemetry Collectors and Pipelines

This lesson covers collector architecture, pipeline components, and processing patterns: essential concepts for building production-grade observability systems in modern distributed environments.

Welcome

💻 Understanding OpenTelemetry Collectors and their pipeline architecture is fundamental to implementing effective observability. While instrumentation generates telemetry signals (traces, metrics, logs), collectors act as the intelligent middleware that receives, processes, and exports this data to your backend systems. Think of collectors as the postal service of observability: they handle routing, transformation, batching, and delivery of your telemetry data.

This lesson demystifies how collectors work internally, why pipelines matter, and how to configure them for real-world production scenarios. Whether you're running a microservices architecture or monitoring a monolithic application, collectors provide the flexibility and scalability you need.

Core Concepts

What is an OpenTelemetry Collector?

🔧 The OpenTelemetry Collector is a vendor-agnostic proxy that receives, processes, and exports telemetry data. It's a standalone binary that runs as a sidecar, daemon, or centralized service in your infrastructure.

Key characteristics:

  • Vendor-neutral: Works with any backend (Prometheus, Jaeger, Datadog, New Relic, etc.)
  • Language-agnostic: Accepts data from applications in any programming language
  • Configurable: Uses YAML configuration for complete pipeline customization
  • Extensible: Supports custom receivers, processors, and exporters via plugins
  • High-performance: Written in Go; properly sized deployments handle millions of spans per second

💡 Pro tip: Start with the OpenTelemetry Collector Contrib distribution, which includes 100+ components. The core distribution contains only basic components.
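
For a quick local trial of the Contrib distribution, a Docker Compose sketch like the following works; the image name is the official Docker Hub one, while the config filename and mount path are assumptions:

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otelcol/config.yaml"]    # point the binary at the mounted config
    volumes:
      - ./otel-config.yaml:/etc/otelcol/config.yaml   # your pipeline definition
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP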

Deployment Patterns

Collectors can be deployed in three primary patterns:

| Pattern | Description | Use Case | Trade-offs |
|---------|-------------|----------|------------|
| Agent Mode | Deployed on each host/node (DaemonSet in Kubernetes) | Local collection from apps on the same host | ✅ Low latency; ❌ Higher resource usage per node |
| Gateway Mode | Centralized collector receiving from multiple agents | Data aggregation, enrichment, sampling | ✅ Centralized processing; ❌ Single point of failure (mitigate with load balancing) |
| Sidecar Mode | Container deployed alongside application container | Per-service isolation, custom processing | ✅ Service-specific config; ❌ Highest resource overhead |

🌍 Real-world analogy: Agent mode is like neighborhood post offices, gateway mode is the regional distribution center, and sidecar mode is a personal assistant handling your mail.

DEPLOYMENT ARCHITECTURE

┌─────────────────────────────────────────────┐
│              APPLICATION LAYER              │
│  ┌────────┐   ┌────────┐   ┌────────┐       │
│  │ App A  │   │ App B  │   │ App C  │       │
│  └───┬────┘   └───┬────┘   └───┬────┘       │
│      │ OTLP       │ OTLP       │ OTLP       │
└──────┼────────────┼────────────┼────────────┘
       ↓            ↓            ↓
┌─────────────────────────────────────────────┐
│            COLLECTOR AGENT LAYER            │
│  ┌────────┐   ┌────────┐   ┌────────┐       │
│  │Agent 1 │   │Agent 2 │   │Agent 3 │       │
│  └───┬────┘   └───┬────┘   └───┬────┘       │
└──────┼────────────┼────────────┼────────────┘
       │            │            │
       └────────────┴────────────┘
                    ↓
        ┌───────────────────────┐
        │  COLLECTOR GATEWAY    │
        │  (Load Balanced)      │
        └───────────┬───────────┘
                    ↓
        ┌───────────────────────┐
        │   BACKEND SYSTEMS     │
        │ Jaeger | Prometheus   │
        │ Loki   | Datadog      │
        └───────────────────────┘
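
A minimal agent-mode configuration that matches this diagram might look like the sketch below; the gateway hostname (collector-gateway) is an assumption:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch:

exporters:
  otlp:
    endpoint: collector-gateway:4317   # forward everything to the central gateway
    tls:
      insecure: true                   # plaintext inside the cluster for this sketch; use TLS in production

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]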

Pipeline Architecture

🔺 A pipeline is the core abstraction in collectors. A collector configuration can define pipelines for three signal types:

  1. Traces pipeline: Processes distributed trace spans
  2. Metrics pipeline: Handles time-series measurements
  3. Logs pipeline: Manages structured log records

Each pipeline consists of three component types:

PIPELINE FLOW

┌────────────┐     ┌────────────┐     ┌────────────┐
│ RECEIVERS  │ ──→ │ PROCESSORS │ ──→ │ EXPORTERS  │
└────────────┘     └────────────┘     └────────────┘
      │                  │                  │
    Input            Transform            Output
 (OTLP, Jaeger,   (batch, filter,    (Jaeger, Prom,
  Prometheus)      sample, enrich)    OTLP, files)

1. Receivers

Receivers are the entry points for telemetry data. They listen on specific protocols and ports.

Common receivers:

  • otlp: Native OpenTelemetry protocol (gRPC or HTTP)
  • jaeger: Jaeger Thrift format
  • zipkin: Zipkin JSON v1/v2
  • prometheus: Scrapes Prometheus metrics
  • hostmetrics: Collects system metrics (CPU, memory, disk)
  • filelog: Reads logs from files

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  
  prometheus:
    config:
      scrape_configs:
        - job_name: 'my-app'
          scrape_interval: 30s
          static_configs:
            - targets: ['localhost:8080']

💡 Important: Receivers are push-based (OTLP, Jaeger) or pull-based (Prometheus scraping). Choose based on your application's export method.
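
The filelog receiver from the list above is configured with file globs rather than a network endpoint; a minimal sketch (the log path and JSON parsing are assumptions about your application):

receivers:
  filelog:
    include: [/var/log/myapp/*.log]   # files to tail (assumed path)
    start_at: end                     # only read new lines, not historical ones
    operators:
      - type: json_parser             # parse each line as JSON, if the app logs JSON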

2. Processors

Processors transform, filter, or enrich data as it flows through the pipeline. They run in sequence.

Essential processors:

| Processor | Purpose | Example Use Case |
|-----------|---------|------------------|
| batch | Groups telemetry before export | Reduce network calls (export every 10s or 8192 spans) |
| memory_limiter | Prevents OOM by applying backpressure | Limit collector to 512MB memory usage |
| resource | Adds/modifies resource attributes | Add environment=production, cluster=us-west |
| attributes | Manipulates span/metric attributes | Remove PII, add derived fields |
| filter | Drops unwanted telemetry | Exclude health check spans |
| probabilistic_sampler | Samples a percentage of traces | Keep only 10% of traces to reduce volume |
| tail_sampling | Smart sampling based on span data | Keep all error traces, sample successful ones |

processors:
  batch:
    timeout: 10s
    send_batch_size: 1024
  
  memory_limiter:
    check_interval: 1s
    limit_mib: 512
    spike_limit_mib: 128
  
  resource:
    attributes:
      - key: environment
        value: production
        action: upsert
      - key: cluster
        value: us-west-2
        action: insert
  
  filter:
    traces:
      span:
        - 'attributes["http.target"] == "/health"'

⚠️ Critical ordering: Place memory_limiter first, batch last:

processors: [memory_limiter, filter, resource, batch]

This ensures memory protection happens before processing, and batching happens right before export.

3. Exporters

Exporters send processed telemetry to backend systems. Multiple exporters can run in parallel.

Popular exporters:

  • otlp: Send to any OTLP-compatible backend
  • otlphttp: OTLP over HTTP (better for proxies/firewalls)
  • jaeger: Export to Jaeger backend
  • prometheus: Expose metrics endpoint for Prometheus to scrape
  • prometheusremotewrite: Push metrics to Prometheus remote write endpoint
  • logging: Debug exporter that prints to the console (deprecated in newer collector releases in favor of the debug exporter)
  • file: Write to local files (useful for replay/debugging)

exporters:
  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: false
      cert_file: /certs/client.crt
      key_file: /certs/client.key
  
  prometheus:
    endpoint: "0.0.0.0:8889"
    namespace: "my_app"
  
  logging:
    loglevel: debug
    sampling_initial: 5
    sampling_thereafter: 200
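
The otlphttp and file exporters from the list follow the same pattern; a brief sketch (the endpoint URL and output path are assumptions):

exporters:
  otlphttp:
    endpoint: https://otlp.example.com:4318   # OTLP over HTTP, friendlier to proxies and firewalls
  file:
    path: /var/lib/otelcol/telemetry.json     # local file for replay or debugging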

Complete Pipeline Configuration

Here's how receivers, processors, and exporters connect into pipelines:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    limit_mib: 512
  batch:
    timeout: 10s
    send_batch_size: 1024
  resource:
    attributes:
      - key: service.namespace
        value: production
        action: insert

exporters:
  otlp:
    endpoint: tempo:4317
    tls:
      insecure: true
  prometheus:
    endpoint: 0.0.0.0:8889

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlp]
    
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]

🔧 Key insight: The service section wires everything together. Components are referenced by name from their respective sections.

CONFIGURATION STRUCTURE

┌─────────────────────────────────────────┐
│         receivers: {...}                │  Define components
│         processors: {...}               │  (implementation)
│         exporters: {...}                │
└─────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────┐
│  service:                               │  Wire components
│    pipelines:                           │  into pipelines
│      traces:                            │  (configuration)
│        receivers: [otlp, jaeger]        │
│        processors: [batch, resource]    │
│        exporters: [otlp]                │
└─────────────────────────────────────────┘
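
Before deploying a new configuration, it's worth a syntax check; recent collector releases include a validate subcommand (the binary name varies by distribution, so treat this as a sketch):

./otelcol validate --config=config.yaml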

Examples

Example 1: Simple Local Development Setup

Scenario: You're developing a microservice locally and want to send traces to Jaeger running in Docker.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: localhost:4317
      http:
        endpoint: localhost:4318

processors:
  batch:
    timeout: 1s

exporters:
  logging:
    loglevel: debug
  otlp:
    # Jaeger's OTLP gRPC port, published on a different host port (here 14317)
    # so it doesn't clash with the collector's own 4317 listener
    endpoint: localhost:14317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [logging, otlp]

Why this works:

  • Accepts OTLP on both gRPC (4317) and HTTP (4318) for flexibility
  • Minimal batching (1s) for quick feedback during development
  • logging exporter prints spans to console for immediate debugging
  • Sends to the local Jaeger container's OTLP port (published on host port 14317 so it doesn't collide with the collector's own 4317 listener)
  • No memory limiter needed (low volume)

💡 Dev tip: Keep the logging exporter during development; it's invaluable for debugging instrumentation issues.
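
A companion Docker setup for this scenario might look like the sketch below; the host port mapping (14317) is an assumption chosen to keep 4317 free for the collector:

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true   # only required on older Jaeger versions
    ports:
      - "16686:16686"   # Jaeger UI
      - "14317:4317"    # OTLP gRPC, remapped so the local collector keeps 4317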

Example 2: Production Gateway with Sampling

Scenario: High-traffic production system generating millions of spans. You need cost-effective sampling while keeping all error traces.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2048
    spike_limit_mib: 512
  
  tail_sampling:
    decision_wait: 10s
    num_traces: 100000
    expected_new_traces_per_sec: 1000
    policies:
      # Keep all error traces
      - name: error-traces
        type: status_code
        status_code:
          status_codes: [ERROR]
      
      # Keep all slow traces (>2s)
      - name: slow-traces
        type: latency
        latency:
          threshold_ms: 2000
      
      # Sample 5% of successful traces
      - name: probabilistic-policy
        type: probabilistic
        probabilistic:
          sampling_percentage: 5
  
  batch:
    timeout: 10s
    send_batch_size: 8192
  
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: insert

exporters:
  otlp:
    endpoint: tempo-gateway:4317
    tls:
      insecure: false
      cert_file: /etc/collector/certs/client.crt
      key_file: /etc/collector/certs/client.key

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, tail_sampling, resource, batch]
      exporters: [otlp]
  
  telemetry:
    logs:
      level: info
    metrics:
      level: detailed
      address: 0.0.0.0:8888

Key decisions explained:

  1. Tail sampling (not probabilistic): Makes decisions after seeing the entire trace

    • Keeps 100% of errors and slow requests
    • Samples only 5% of fast successful requests
    • Waits 10s to collect all spans of a trace before deciding
  2. Memory protection: 2GB limit prevents OOM during traffic spikes

  3. Large batches: 8192 spans per batch reduces network overhead at high volume

  4. TLS: Production requires encrypted communication

  5. Collector telemetry: Exposes metrics on :8888 for monitoring the collector itself

⚠️ Production warning: Tail sampling requires significant memory (stores traces while waiting). Size num_traces and decision_wait based on your trace rate.

Example 3: Multi-Backend Fan-Out

Scenario: You want to send traces to both Jaeger (for developers) and a commercial APM vendor (for operations), while keeping metrics in Prometheus.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  hostmetrics:
    collection_interval: 30s
    scrapers:
      cpu:
      memory:
      disk:
      network:

processors:
  memory_limiter:
    limit_mib: 1024
  
  batch:
    timeout: 10s
  
  # Filter PII before sending to commercial vendor
  attributes/strip-pii:
    actions:
      - key: user.email
        action: delete
      - key: user.phone
        action: delete
      - key: credit_card
        action: delete
  
  resource:
    attributes:
      - key: environment
        value: staging
        action: insert

exporters:
  # Internal Jaeger (full data)
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true
  
  # Commercial vendor (PII stripped)
  otlp/vendor:
    endpoint: vendor-endpoint:443
    headers:
      api-key: ${VENDOR_API_KEY}
  
  prometheusremotewrite:
    endpoint: http://prometheus:9090/api/v1/write
    tls:
      insecure: true

service:
  pipelines:
    # Traces to Jaeger (internal, full data)
    traces/internal:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [otlp/jaeger]
    
    # Traces to vendor (PII stripped)
    traces/vendor:
      receivers: [otlp]
      processors: [memory_limiter, attributes/strip-pii, resource, batch]
      exporters: [otlp/vendor]
    
    # Metrics to Prometheus
    metrics:
      receivers: [otlp, hostmetrics]
      processors: [memory_limiter, batch]
      exporters: [prometheusremotewrite]

Architecture insights:

  1. Multiple pipelines: Same signal type (traces) can have multiple pipelines with different processing
  2. Named exporters: Suffix like /jaeger and /vendor creates unique exporter instances
  3. Differential processing: PII stripping only in vendor pipeline
  4. Environment variables: Use ${VENDOR_API_KEY} for secrets (pass via env vars, not hardcode)
  5. Host metrics: Collector monitors itself and the host it runs on

🌍 Real-world use: This pattern is common in enterprises with compliance requirements: keep full data internal, sanitize for external vendors.

Example 4: Kubernetes DaemonSet with Service Discovery

Scenario: Collector agents running on every Kubernetes node, automatically discovering and scraping Prometheus metrics from pods.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  
  prometheus:
    config:
      scrape_configs:
        - job_name: 'kubernetes-pods'
          kubernetes_sd_configs:
            - role: pod
              namespaces:
                names: ['production', 'staging']
          relabel_configs:
            # Only scrape pods with prometheus.io/scrape=true annotation
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: true
            # Use port from prometheus.io/port annotation
            - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
              action: replace
              target_label: __address__
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2   # $$ escapes the collector's env-var expansion
            # Add pod labels as metric labels
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)

processors:
  memory_limiter:
    limit_mib: 512
  
  batch:
    timeout: 10s
  
  k8sattributes:
    auth_type: serviceAccount
    passthrough: false
    extract:
      metadata:
        - k8s.namespace.name
        - k8s.deployment.name
        - k8s.pod.name
        - k8s.node.name
      labels:
        - tag_name: app.name
          key: app
          from: pod

exporters:
  otlp:
    endpoint: collector-gateway.observability:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp]
    
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, k8sattributes, batch]
      exporters: [otlp]

Kubernetes-specific features:

  1. Service discovery: Automatically finds pods with prometheus.io/scrape: "true" annotation
  2. k8sattributes processor: Enriches telemetry with Kubernetes metadata (namespace, pod name, labels)
  3. RBAC: Requires ServiceAccount with permissions to list/watch pods and nodes
  4. DaemonSet deployment: One collector per node ensures local collection
  5. Gateway forwarding: Agents send to central gateway for aggregation

💡 Deployment tip: Use Helm charts or the Kubernetes Operator for production deployment; they handle RBAC, ConfigMaps, and upgrades automatically.
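
If you manage RBAC yourself rather than via the Helm chart or Operator, the k8sattributes processor needs read access to pods and related objects; a minimal ClusterRole sketch (names are assumptions, and exact rules depend on which metadata you extract):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  - apiGroups: [""]
    resources: ["pods", "namespaces", "nodes"]
    verbs: ["get", "watch", "list"]
  - apiGroups: ["apps"]
    resources: ["replicasets"]        # needed to resolve k8s.deployment.name
    verbs: ["get", "watch", "list"]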

Common Mistakes

❌ Mistake 1: Not Using memory_limiter

Problem: Collector crashes with OOM under traffic spikes.

Why it happens: Without backpressure, collector accepts unlimited data, overwhelming memory.

Solution: Always configure memory_limiter as the first processor:

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512  # 80% of container limit
    spike_limit_mib: 128  # 20% buffer for spikes

⚠️ Set limit_mib to 80% of your container's memory limit, leaving headroom for Go's garbage collector.

❌ Mistake 2: Batching Too Aggressively

Problem: High latency or lost data during collector restarts.

Symptoms:

  • Traces appear 30+ seconds after generation
  • Collector restart loses thousands of spans

Why it happens: Oversized batches (e.g., timeout: 60s, send_batch_size: 100000) hold data too long.

Solution: Use reasonable batch settings:

processors:
  batch:
    timeout: 10s  # Export at least every 10s
    send_batch_size: 8192  # Or when 8192 items collected
    send_batch_max_size: 10000  # Hard limit

💡 Rule of thumb: timeout should be 5-10 seconds for production, 1-2 seconds for development.

❌ Mistake 3: Wrong Processor Order

Problem: Processors don't work as expected, or memory protection fails.

Bad order:

processors: [batch, filter, memory_limiter]  # WRONG!

Why it's wrong:

  • Batching happens before filtering (wastes memory on unwanted data)
  • Memory limiter runs last (data already consumed memory)

Correct order:

processors: [memory_limiter, filter, resource, attributes, batch]  # RIGHT!

Best practice ordering:

  1. memory_limiter (protect first)
  2. Filters (remove unwanted data early)
  3. Enrichment processors (resource, attributes, k8sattributes)
  4. Sampling (tail_sampling, probabilistic_sampler)
  5. batch (batch last before export)

❌ Mistake 4: Ignoring Collector Self-Monitoring

Problem: Collector silently drops data, no visibility into why.

Solution: Enable collector telemetry and monitor these metrics:

service:
  telemetry:
    logs:
      level: info
    metrics:
      level: detailed
      address: 0.0.0.0:8888

Key metrics to alert on:

  • otelcol_receiver_refused_spans: Backpressure from memory limiter
  • otelcol_exporter_send_failed_spans: Export failures
  • otelcol_processor_dropped_spans: Sampling or filtering drops
  • otelcol_process_memory_rss: Collector memory usage
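
To actually collect those metrics, point an existing Prometheus (or another collector's prometheus receiver) at the endpoint enabled above; a minimal scrape job sketch (the hostname is an assumption):

scrape_configs:
  - job_name: 'otel-collector'
    scrape_interval: 30s
    static_configs:
      - targets: ['otel-collector:8888']   # the collector's own telemetry endpoint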

❌ Mistake 5: Using Tail Sampling Without Understanding Cost

Problem: Collector uses 10GB+ memory, crashes randomly.

Why it happens: Tail sampling stores entire traces in memory while waiting for decision_wait period.

Cost calculation:

Memory needed = avg_trace_size × traces_per_second × decision_wait
Example: 50KB × 1000 tps × 10s = 500MB minimum

Solution: Either:

  1. Use head-based sampling (probabilistic_sampler) if you don't need smart decisions
  2. Size tail sampling carefully:
    tail_sampling:
      decision_wait: 5s  # Shorter wait
      num_traces: 50000  # Fewer buffered traces
    
  3. Deploy tail sampling only in gateway collectors (not agents)

❌ Mistake 6: Hardcoding Secrets

Problem: API keys visible in configuration files, committed to Git.

Bad:

exporters:
  otlp:
    headers:
      api-key: "sk_live_abc123..."  # NEVER DO THIS!

Good:

exporters:
  otlp:
    headers:
      api-key: ${VENDOR_API_KEY}  # Read from environment

Then pass via environment variable:

export VENDOR_API_KEY="sk_live_abc123..."
./otelcol --config=config.yaml

🔒 Security tip: Use Kubernetes Secrets, AWS Secrets Manager, or HashiCorp Vault for production secrets.

Key Takeaways

📋 Quick Reference Card: Collectors and Pipelines

| Concept | Key Points |
|---------|------------|
| Collector Role | Vendor-agnostic proxy: receives, processes, exports telemetry |
| Deployment Patterns | Agent (per-host), Gateway (centralized), Sidecar (per-service) |
| Pipeline Types | Traces, Metrics, Logs; each has receivers → processors → exporters |
| Receivers | Entry points: otlp, jaeger, prometheus, hostmetrics, filelog |
| Processors | Transform data: batch, memory_limiter, filter, resource, tail_sampling |
| Exporters | Send to backends: otlp, prometheus, jaeger, logging, file |
| Processor Order | memory_limiter → filters → enrichment → sampling → batch |
| Production Essentials | 1. memory_limiter (prevent OOM); 2. Batch reasonably (10s timeout); 3. Enable telemetry (:8888); 4. Use env vars for secrets |
| Sampling Strategies | Head-based (probabilistic) = simple, low memory; Tail-based = smart (keep errors), high memory |
| Multi-Backend | Use multiple pipelines with different processors per destination |

🧠 Mental Model: Think of collectors as intelligent routers with a three-stage pipeline:

  1. Receive (accept from multiple protocols)
  2. Process (filter, enrich, sample, batch)
  3. Export (deliver to one or more backends)

💡 Remember: Start simple (receiver → batch → exporter), then add processors as needs emerge. Premature optimization leads to complex, brittle configurations.

📚 Further Study

  1. OpenTelemetry Collector Official Docs: https://opentelemetry.io/docs/collector/ - Comprehensive reference for all components and configuration options

  2. OpenTelemetry Collector Contrib Repository: https://github.com/open-telemetry/opentelemetry-collector-contrib - Source code and documentation for 100+ community-contributed receivers, processors, and exporters

  3. Collector Performance Tuning Guide: https://opentelemetry.io/docs/collector/performance/ - Best practices for scaling collectors to millions of spans per second in production environments