Collectors and Pipelines
Design telemetry collection architecture with receivers, processors, and exporters
OpenTelemetry Collectors and Pipelines
Master OpenTelemetry collectors and pipelines with free flashcards and spaced repetition practice. This lesson covers collector architecture, pipeline components, and processing patterns: essential concepts for building production-grade observability systems in modern distributed environments.
Welcome
Understanding OpenTelemetry Collectors and their pipeline architecture is fundamental to implementing effective observability. While instrumentation generates telemetry signals (traces, metrics, logs), collectors act as the intelligent middleware that receives, processes, and exports this data to your backend systems. Think of collectors as the postal service of observability: they handle routing, transformation, batching, and delivery of your telemetry data.
This lesson demystifies how collectors work internally, why pipelines matter, and how to configure them for real-world production scenarios. Whether you're running a microservices architecture or monitoring a monolithic application, collectors provide the flexibility and scalability you need.
Core Concepts
What is an OpenTelemetry Collector?
The OpenTelemetry Collector is a vendor-agnostic proxy that receives, processes, and exports telemetry data. It's a standalone binary that runs as a sidecar, daemon, or centralized service in your infrastructure.
Key characteristics:
- Vendor-neutral: Works with any backend (Prometheus, Jaeger, Datadog, New Relic, etc.)
- Language-agnostic: Accepts data from applications in any programming language
- Configurable: Uses YAML configuration for complete pipeline customization
- Extensible: Supports custom receivers, processors, and exporters via plugins
- High-performance: Written in Go, handles millions of spans per second
Pro tip: Start with the OpenTelemetry Collector Contrib distribution, which includes 100+ components. The core distribution contains only basic components.
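To try it quickly, you can run the Contrib image with Docker; the image name below is the publicly published otel/opentelemetry-collector-contrib image, and the config mount path is an arbitrary choice for this sketch.

# Run the Contrib collector with a local config.yaml, exposing the OTLP ports.
docker run --rm \
  -v "$(pwd)/config.yaml:/etc/otelcol/config.yaml" \
  -p 4317:4317 -p 4318:4318 \
  otel/opentelemetry-collector-contrib:latest \
  --config=/etc/otelcol/config.yaml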
Deployment Patterns
Collectors can be deployed in three primary patterns:
| Pattern | Description | Use Case | Trade-offs |
|---|---|---|---|
| Agent Mode | Deployed on each host/node (DaemonSet in Kubernetes) | Local collection from apps on the same host | Pros: low latency. Cons: higher resource usage per node |
| Gateway Mode | Centralized collector receiving from multiple agents | Data aggregation, enrichment, sampling | Pros: centralized processing. Cons: single point of failure (mitigate with load balancing) |
| Sidecar Mode | Container deployed alongside the application container | Per-service isolation, custom processing | Pros: service-specific config. Cons: highest resource overhead |
Real-world analogy: Agent mode is like neighborhood post offices, gateway mode is the regional distribution center, and sidecar mode is a personal assistant handling your mail.
DEPLOYMENT ARCHITECTURE

  APPLICATION LAYER
  ┌────────┐   ┌────────┐   ┌────────┐
  │ App A  │   │ App B  │   │ App C  │
  └───┬────┘   └───┬────┘   └───┬────┘
      │ OTLP       │ OTLP       │ OTLP
  COLLECTOR AGENT LAYER
  ┌───▼────┐   ┌───▼────┐   ┌───▼────┐
  │Agent 1 │   │Agent 2 │   │Agent 3 │
  └───┬────┘   └───┬────┘   └───┬────┘
      └────────────┼────────────┘
                   │
      ┌────────────▼────────────┐
      │    COLLECTOR GATEWAY    │
      │     (Load Balanced)     │
      └────────────┬────────────┘
                   │
      ┌────────────▼────────────┐
      │     BACKEND SYSTEMS     │
      │   Jaeger | Prometheus   │
      │   Loki   | Datadog      │
      └─────────────────────────┘
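To make agent mode concrete, here is a minimal DaemonSet sketch that runs one collector per node and mounts its pipeline configuration from a ConfigMap. All names (otel-agent, observability, otel-agent-config) are placeholders, and a real deployment also needs RBAC and resource limits.

# Agent-mode sketch: one collector per node, config mounted from a ConfigMap.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-agent
  namespace: observability
spec:
  selector:
    matchLabels:
      app: otel-agent
  template:
    metadata:
      labels:
        app: otel-agent
    spec:
      containers:
        - name: otel-agent
          image: otel/opentelemetry-collector-contrib:latest
          args: ["--config=/conf/config.yaml"]
          ports:
            - containerPort: 4317   # OTLP gRPC from apps on this node
            - containerPort: 4318   # OTLP HTTP
          volumeMounts:
            - name: config
              mountPath: /conf
      volumes:
        - name: config
          configMap:
            name: otel-agent-config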
Pipeline Architecture
A pipeline is the core abstraction in the collector. A collector configuration can define pipelines for three signal types:
- Traces pipeline: Processes distributed trace spans
- Metrics pipeline: Handles time-series measurements
- Logs pipeline: Manages structured log records
Each pipeline consists of three component types:
PIPELINE FLOW

 ┌────────────┐       ┌────────────┐       ┌────────────┐
 │ RECEIVERS  │ ────► │ PROCESSORS │ ────► │ EXPORTERS  │
 └────────────┘       └────────────┘       └────────────┘
       │                    │                    │
     Input              Transform              Output
 (OTLP, Jaeger,      (batch, filter,       (Jaeger, Prom,
  Prometheus)         sample, enrich)       OTLP, files)
1. Receivers
Receivers are the entry points for telemetry data. They listen on specific protocols and ports.
Common receivers:
- otlp: Native OpenTelemetry protocol (gRPC or HTTP)
- jaeger: Jaeger Thrift format
- zipkin: Zipkin JSON v1/v2
- prometheus: Scrapes Prometheus metrics
- hostmetrics: Collects system metrics (CPU, memory, disk)
- filelog: Reads logs from files
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
prometheus:
config:
scrape_configs:
- job_name: 'my-app'
scrape_interval: 30s
static_configs:
- targets: ['localhost:8080']
Important: Receivers are push-based (OTLP, Jaeger) or pull-based (Prometheus scraping). Choose based on your application's export method.
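The snippet above shows the push-based OTLP receiver and the pull-based Prometheus receiver. For completeness, here is a sketch of the filelog and hostmetrics receivers from the list; the log path and interval are illustrative.

receivers:
  filelog:
    include: [/var/log/myapp/*.log]   # assumed application log location
    start_at: beginning               # also read content that existed before startup
  hostmetrics:
    collection_interval: 60s
    scrapers:
      cpu:
      memory:
      disk: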
2. Processors
Processors transform, filter, or enrich data as it flows through the pipeline. They run in sequence.
Essential processors:
| Processor | Purpose | Example Use Case |
|---|---|---|
| batch | Groups telemetry before export | Reduce network calls (export every 10s or 8192 spans) |
| memory_limiter | Prevents OOM by applying backpressure | Limit collector to 512MB memory usage |
| resource | Adds/modifies resource attributes | Add environment=production, cluster=us-west |
| attributes | Manipulates span/metric attributes | Remove PII, add derived fields |
| filter | Drops unwanted telemetry | Exclude health check spans |
| probabilistic_sampler | Samples a percentage of traces | Keep only 10% of traces to reduce volume |
| tail_sampling | Smart sampling based on span data | Keep all error traces, sample successful ones |
processors:
batch:
timeout: 10s
send_batch_size: 1024
memory_limiter:
check_interval: 1s
limit_mib: 512
spike_limit_mib: 128
resource:
attributes:
- key: environment
value: production
action: upsert
- key: cluster
value: us-west-2
action: insert
filter:
traces:
span:
- 'attributes["http.target"] == "/health"'
Critical ordering: Place memory_limiter first and batch last:
processors: [memory_limiter, filter, resource, batch]
This ensures memory protection happens before processing, and batching happens right before export.
3. Exporters
Exporters send processed telemetry to backend systems. Multiple exporters can run in parallel.
Popular exporters:
- otlp: Send to any OTLP-compatible backend
- otlphttp: OTLP over HTTP (better for proxies/firewalls)
- jaeger: Export to a Jaeger backend
- prometheus: Expose a metrics endpoint for Prometheus to scrape
- prometheusremotewrite: Push metrics to a Prometheus remote-write endpoint
- logging: Debug exporter (prints to the console)
- file: Write to local files (useful for replay/debugging)
exporters:
otlp:
endpoint: jaeger:4317
tls:
insecure: false
cert_file: /certs/client.crt
key_file: /certs/client.key
prometheus:
endpoint: "0.0.0.0:8889"
namespace: "my_app"
logging:
loglevel: debug
sampling_initial: 5
sampling_thereafter: 200
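Two more exporters from the list, otlphttp and file, look like this; the endpoint URL and file path are assumptions for illustration:

exporters:
  otlphttp:
    endpoint: https://otel-gateway.example.com:4318   # hypothetical HTTPS endpoint behind a proxy
  file:
    path: /var/lib/otelcol/telemetry.json             # local file for replay or debugging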
Complete Pipeline Configuration
Here's how receivers, processors, and exporters connect into pipelines:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
processors:
memory_limiter:
limit_mib: 512
batch:
timeout: 10s
send_batch_size: 1024
resource:
attributes:
- key: service.namespace
value: production
action: insert
exporters:
otlp:
endpoint: tempo:4317
tls:
insecure: true
prometheus:
endpoint: 0.0.0.0:8889
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, resource, batch]
exporters: [otlp]
metrics:
receivers: [otlp]
processors: [memory_limiter, batch]
exporters: [prometheus]
Key insight: The service section wires everything together. Components are referenced by name from their respective sections.
CONFIGURATION STRUCTURE

┌──────────────────────────────────────┐
│ receivers:  {...}                    │   Define components
│ processors: {...}                    │   (implementation)
│ exporters:  {...}                    │
└──────────────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────┐
│ service:                             │   Wire components
│   pipelines:                         │   into pipelines
│     traces:                          │   (configuration)
│       receivers:  [otlp, jaeger]     │
│       processors: [batch, resource]  │
│       exporters:  [otlp]             │
└──────────────────────────────────────┘
Examples
Example 1: Simple Local Development Setup
Scenario: You're developing a microservice locally and want to send traces to Jaeger running in Docker.
receivers:
otlp:
protocols:
grpc:
endpoint: localhost:4317
http:
endpoint: localhost:4318
processors:
batch:
timeout: 1s
exporters:
logging:
loglevel: debug
otlp:
endpoint: localhost:14317  # Jaeger's OTLP gRPC, published on a non-default host port (see the Compose sketch below)
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [batch]
exporters: [logging, otlp]
Why this works:
- Accepts OTLP on both gRPC (4317) and HTTP (4318) for flexibility
- Minimal batching (1s) for quick feedback during development
- The logging exporter prints spans to the console for immediate debugging
- Sends to Jaeger's OTLP endpoint on host port 14317 (the collector itself already occupies 4317)
- No memory limiter needed (low volume)
Dev tip: Keep the logging exporter during development; it's invaluable for debugging instrumentation issues.
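For the "Jaeger running in Docker" part of the scenario, a Compose sketch like the following pairs with the config above. Jaeger's OTLP gRPC port is published on host port 14317 so it doesn't collide with the collector's own 4317; the image tag and the COLLECTOR_OTLP_ENABLED flag are assumptions to verify against your Jaeger version.

# docker-compose.yaml: Jaeger only; the collector runs on the host with the config above.
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      COLLECTOR_OTLP_ENABLED: "true"   # required on some older all-in-one versions
    ports:
      - "16686:16686"   # Jaeger UI
      - "14317:4317"    # Jaeger's OTLP gRPC, remapped to avoid the collector's 4317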
Example 2: Production Gateway with Sampling
Scenario: High-traffic production system generating millions of spans. You need cost-effective sampling while keeping all error traces.
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
processors:
memory_limiter:
check_interval: 1s
limit_mib: 2048
spike_limit_mib: 512
tail_sampling:
decision_wait: 10s
num_traces: 100000
expected_new_traces_per_sec: 1000
policies:
# Keep all error traces
- name: error-traces
type: status_code
status_code:
status_codes: [ERROR]
# Keep all slow traces (>2s)
- name: slow-traces
type: latency
latency:
threshold_ms: 2000
# Sample 5% of successful traces
- name: probabilistic-policy
type: probabilistic
probabilistic:
sampling_percentage: 5
batch:
timeout: 10s
send_batch_size: 8192
resource:
attributes:
- key: deployment.environment
value: production
action: insert
exporters:
otlp:
endpoint: tempo-gateway:4317
tls:
insecure: false
cert_file: /etc/collector/certs/client.crt
key_file: /etc/collector/certs/client.key
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, tail_sampling, resource, batch]
exporters: [otlp]
telemetry:
logs:
level: info
metrics:
level: detailed
address: 0.0.0.0:8888
Key decisions explained:
Tail sampling (not probabilistic): Makes decisions after seeing the entire trace
- Keeps 100% of errors and slow requests
- Samples only 5% of fast successful requests
- Waits 10s to collect all spans of a trace before deciding
Memory protection: 2GB limit prevents OOM during traffic spikes
Large batches: 8192 spans per batch reduces network overhead at high volume
TLS: Production requires encrypted communication
Collector telemetry: Exposes metrics on :8888 for monitoring the collector itself
Production warning: Tail sampling requires significant memory (it stores traces while waiting). Size num_traces and decision_wait based on your trace rate.
Example 3: Multi-Backend Fan-Out
Scenario: You want to send traces to both Jaeger (for developers) and a commercial APM vendor (for operations), while keeping metrics in Prometheus.
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
hostmetrics:
collection_interval: 30s
scrapers:
cpu:
memory:
disk:
network:
processors:
memory_limiter:
limit_mib: 1024
batch:
timeout: 10s
# Filter PII before sending to commercial vendor
attributes/strip-pii:
actions:
- key: user.email
action: delete
- key: user.phone
action: delete
- key: credit_card
action: delete
resource:
attributes:
- key: environment
value: staging
action: insert
exporters:
# Internal Jaeger (full data)
otlp/jaeger:
endpoint: jaeger:4317
tls:
insecure: true
# Commercial vendor (PII stripped)
otlp/vendor:
endpoint: vendor-endpoint:443
headers:
api-key: ${VENDOR_API_KEY}
prometheusremotewrite:
endpoint: http://prometheus:9090/api/v1/write
tls:
insecure: true
service:
pipelines:
# Traces to Jaeger (internal, full data)
traces/internal:
receivers: [otlp]
processors: [memory_limiter, resource, batch]
exporters: [otlp/jaeger]
# Traces to vendor (PII stripped)
traces/vendor:
receivers: [otlp]
processors: [memory_limiter, attributes/strip-pii, resource, batch]
exporters: [otlp/vendor]
# Metrics to Prometheus
metrics:
receivers: [otlp, hostmetrics]
processors: [memory_limiter, batch]
exporters: [prometheusremotewrite]
Architecture insights:
- Multiple pipelines: Same signal type (traces) can have multiple pipelines with different processing
- Named exporters: A suffix like /jaeger or /vendor creates a distinct exporter instance
- Differential processing: PII stripping happens only in the vendor pipeline
- Environment variables: Use ${VENDOR_API_KEY} for secrets (pass via env vars, never hardcode)
- Host metrics: The collector monitors itself and the host it runs on
Real-world use: This pattern is common in enterprises with compliance requirements; keep full data internal, sanitize for external vendors.
Example 4: Kubernetes DaemonSet with Service Discovery
Scenario: Collector agents running on every Kubernetes node, automatically discovering and scraping Prometheus metrics from pods.
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
prometheus:
config:
scrape_configs:
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
namespaces:
names: ['production', 'staging']
relabel_configs:
# Only scrape pods with prometheus.io/scrape=true annotation
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
# Use port from prometheus.io/port annotation
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
# Add pod labels as metric labels
- action: labelmap
regex: __meta_kubernetes_pod_label_(.+)
processors:
memory_limiter:
limit_mib: 512
batch:
timeout: 10s
k8sattributes:
auth_type: serviceAccount
passthrough: false
extract:
metadata:
- k8s.namespace.name
- k8s.deployment.name
- k8s.pod.name
- k8s.node.name
labels:
- tag_name: app.name
key: app
from: pod
exporters:
otlp:
endpoint: collector-gateway.observability:4317
tls:
insecure: true
service:
pipelines:
traces:
receivers: [otlp]
processors: [memory_limiter, k8sattributes, batch]
exporters: [otlp]
metrics:
receivers: [prometheus]
processors: [memory_limiter, k8sattributes, batch]
exporters: [otlp]
Kubernetes-specific features:
- Service discovery: Automatically finds pods with the prometheus.io/scrape: "true" annotation
- k8sattributes processor: Enriches telemetry with Kubernetes metadata (namespace, pod name, labels)
- RBAC: Requires ServiceAccount with permissions to list/watch pods and nodes
- DaemonSet deployment: One collector per node ensures local collection
- Gateway forwarding: Agents send to central gateway for aggregation
Deployment tip: Use the community Helm charts or the OpenTelemetry Operator for production deployment; they handle RBAC, ConfigMaps, and upgrades automatically.
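As a rough sketch, installing an agent-mode collector with the community Helm chart looks like this; the repo URL and chart name come from the open-telemetry Helm charts project, but the release name, namespace, and value keys (mode, image.repository) should be verified against the chart's current documentation.

# Install the collector chart in daemonset (agent) mode; names are illustrative.
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm install otel-agent open-telemetry/opentelemetry-collector \
  --namespace observability --create-namespace \
  --set mode=daemonset \
  --set image.repository=otel/opentelemetry-collector-contrib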
Common Mistakes
Mistake 1: Not Using memory_limiter
Problem: Collector crashes with OOM under traffic spikes.
Why it happens: Without backpressure, collector accepts unlimited data, overwhelming memory.
Solution: Always configure memory_limiter as the first processor:
processors:
memory_limiter:
check_interval: 1s
limit_mib: 512 # 80% of container limit
spike_limit_mib: 128 # 20% buffer for spikes
Set limit_mib to about 80% of your container's memory limit, leaving headroom for Go's garbage collector.
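For example, pairing the limiter above with a Kubernetes memory limit could look like this; the 640Mi figure is illustrative, the point being that limit_mib is roughly 80% of the container limit.

# Collector container resources paired with memory_limiter (limit_mib: 512, spike_limit_mib: 128).
resources:
  requests:
    memory: 640Mi
  limits:
    memory: 640Mi   # 512 MiB limit plus 128 MiB spike buffer fits under this ceiling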
Mistake 2: Batching Too Aggressively
Problem: High latency or lost data during collector restarts.
Symptoms:
- Traces appear 30+ seconds after generation
- Collector restart loses thousands of spans
Why it happens: Oversized batches (e.g., timeout: 60s, send_batch_size: 100000) hold data too long.
Solution: Use reasonable batch settings:
processors:
batch:
timeout: 10s # Export at least every 10s
send_batch_size: 8192 # Or when 8192 items collected
send_batch_max_size: 10000 # Hard limit
Rule of thumb: timeout should be 5-10 seconds in production and 1-2 seconds in development.
Mistake 3: Wrong Processor Order
Problem: Processors don't work as expected, or memory protection fails.
Bad order:
processors: [batch, filter, memory_limiter] # WRONG!
Why it's wrong:
- Batching happens before filtering (wastes memory on unwanted data)
- Memory limiter runs last (data already consumed memory)
Correct order:
processors: [memory_limiter, filter, resource, attributes, batch] # RIGHT!
Best practice ordering:
1. memory_limiter (protect first)
2. Filters (remove unwanted data early)
3. Enrichment processors (resource, attributes, k8sattributes)
4. Sampling (tail_sampling, probabilistic_sampler)
5. batch (batch last before export)
Mistake 4: Ignoring Collector Self-Monitoring
Problem: Collector silently drops data, no visibility into why.
Solution: Enable collector telemetry and monitor these metrics:
service:
telemetry:
logs:
level: info
metrics:
level: detailed
address: 0.0.0.0:8888
Key metrics to alert on:
- otelcol_receiver_refused_spans: Backpressure from the memory limiter
- otelcol_exporter_send_failed_spans: Export failures
- otelcol_processor_dropped_spans: Sampling or filtering drops
- otelcol_process_memory_rss: Collector memory usage
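A sketch of Prometheus alerting rules on two of these metrics might look like the following; exact metric names can differ slightly between collector versions (some builds append a _total suffix), so confirm them against your collector's /metrics endpoint.

groups:
  - name: otel-collector
    rules:
      - alert: OtelCollectorExportFailures
        expr: rate(otelcol_exporter_send_failed_spans[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Collector is failing to export spans"
      - alert: OtelCollectorRefusedSpans
        expr: rate(otelcol_receiver_refused_spans[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Collector is refusing spans (memory limiter backpressure)"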
Mistake 5: Using Tail Sampling Without Understanding Its Cost
Problem: Collector uses 10GB+ memory, crashes randomly.
Why it happens: Tail sampling stores entire traces in memory while waiting for decision_wait period.
Cost calculation:
Memory needed = avg_trace_size × traces_per_second × decision_wait
Example: 50 KB × 1000 traces/sec × 10 s = 500 MB minimum
Solution: Choose one of the following:
- Use head-based sampling (probabilistic_sampler) if you don't need smart decisions
- Size tail sampling carefully:
  tail_sampling:
    decision_wait: 5s    # Shorter wait
    num_traces: 50000    # Fewer buffered traces
- Deploy tail sampling only in gateway collectors (not agents)
Mistake 6: Hardcoding Secrets
Problem: API keys visible in configuration files, committed to Git.
Bad:
exporters:
otlp:
headers:
api-key: "sk_live_abc123..." # NEVER DO THIS!
Good:
exporters:
otlp:
headers:
api-key: ${VENDOR_API_KEY} # Read from environment
Then pass via environment variable:
export VENDOR_API_KEY="sk_live_abc123..."
./otelcol --config=config.yaml
Security tip: Use Kubernetes Secrets, AWS Secrets Manager, or HashiCorp Vault for production secrets.
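In Kubernetes, the same idea is a Secret plus an env entry on the collector container; the names below (vendor-credentials, VENDOR_API_KEY) are placeholders.

# Secret created out-of-band (never committed to Git).
apiVersion: v1
kind: Secret
metadata:
  name: vendor-credentials
stringData:
  VENDOR_API_KEY: "<paste-real-key-here>"
---
# Collector container spec fragment: expose the key as an environment variable.
env:
  - name: VENDOR_API_KEY
    valueFrom:
      secretKeyRef:
        name: vendor-credentials
        key: VENDOR_API_KEY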
Key Takeaways
Quick Reference Card: Collectors and Pipelines
| Concept | Key Points |
|---|---|
| Collector Role | Vendor-agnostic proxy: receives, processes, exports telemetry |
| Deployment Patterns | Agent (per-host), Gateway (centralized), Sidecar (per-service) |
| Pipeline Types | Traces, Metrics, Logs; each has receivers → processors → exporters |
| Receivers | Entry points: otlp, jaeger, prometheus, hostmetrics, filelog |
| Processors | Transform data: batch, memory_limiter, filter, resource, tail_sampling |
| Exporters | Send to backends: otlp, prometheus, jaeger, logging, file |
| Processor Order | memory_limiter → filters → enrichment → sampling → batch |
| Production Essentials | 1. memory_limiter (prevent OOM) 2. Batch reasonably (10s timeout) 3. Enable telemetry (:8888) 4. Use env vars for secrets |
| Sampling Strategies | Head-based (probabilistic): simple, low memory. Tail-based: smart (keep errors), high memory |
| Multi-Backend | Use multiple pipelines with different processors per destination |
Mental Model: Think of collectors as intelligent routers with a three-stage pipeline:
- Receive (accept from multiple protocols)
- Process (filter, enrich, sample, batch)
- Export (deliver to one or more backends)
Remember: Start simple (receiver → batch → exporter), then add processors as needs emerge. Premature optimization leads to complex, brittle configurations.
Further Study
OpenTelemetry Collector Official Docs: https://opentelemetry.io/docs/collector/ - Comprehensive reference for all components and configuration options
OpenTelemetry Collector Contrib Repository: https://github.com/open-telemetry/opentelemetry-collector-contrib - Source code and documentation for 100+ community-contributed receivers, processors, and exporters
Collector Performance Tuning Guide: https://opentelemetry.io/docs/collector/performance/ - Best practices for scaling collectors to millions of spans per second in production environments