Delivery Semantics & Schema Strategy
Understand at-most-once, at-least-once, and effectively-once. Pair delivery semantics with a real schema strategy — not random JSON and vibes.
Why Delivery Semantics and Schema Design Are Inseparable
Imagine you've just deployed a Kafka consumer that processes payment confirmation events. Your broker configuration guarantees at-least-once delivery — if the network hiccups, the message will be retried. You feel confident. Then a teammate adds a new field to the event payload: currency_code, a string that your existing consumer never knew to expect. The message arrives, your consumer silently deserializes it, drops the unknown field, and proceeds. The funds transfer completes. The currency is wrong. Nobody noticed at deserialization time because nothing failed — the contract was never enforced.
This is the core tension in distributed messaging, and it's the reason delivery semantics and schema strategy must be designed together. A Kafka message is only as useful as the contract it carries. Guaranteeing delivery is a solved problem; guaranteeing meaning at delivery is the harder one.
The Two-Dimensional Producer-Consumer Contract
When a producer and consumer interact over Kafka, they're making an implicit contract along two independent axes:
┌─────────────────────────────────────────────────────────┐
│ The Producer-Consumer Contract │
├──────────────────────────┬──────────────────────────────┤
│ AXIS 1: DELIVERY │ AXIS 2: MEANING │
│ "Will this message │ "Will it mean the same │
│ arrive?" │ thing when it does?" │
├──────────────────────────┼──────────────────────────────┤
│ Controlled by: │ Controlled by: │
│ • acks configuration │ • Schema definition │
│ • Offset commit timing │ • Schema registry │
│ • Retry settings │ • Compatibility rules │
│ • Idempotent producers │ • Serialization format │
│ • Transactions │ • Versioning discipline │
└──────────────────────────┴──────────────────────────────┘
Most teams spend the majority of their Kafka configuration effort on the left column. The right column gets treated as an implementation detail — pick JSON, move fast, fix it later. The cost of that asymmetry shows up months into production, not in the initial build.
🎯 Key Principle: Both axes must be answered together — because the choice on one axis constrains what's practical on the other. A delivery guarantee that produces duplicates requires a schema that can carry a stable identity field. A schema that silently drops unknown fields undermines the guarantee that messages were processed correctly.
Three Delivery Modes, Three Different Schema Requirements
The three delivery modes — at-most-once, at-least-once, and effectively-once — are not just configuration knobs. Each mode makes a different implicit demand on your message schema.
At-most-once delivery commits offsets before processing completes. If the consumer crashes mid-handler, the message is gone. You'll never see the same message twice, so you don't need duplicate detection in your payload — but you also lose the ability to replay, which means your schema must be rich enough to act on a single delivery with no second chance.
At-least-once delivery is where schema design gets load-bearing. When the broker or consumer can replay a message after a crash or rebalance, your consumer will occasionally process the same logical event more than once. A schema that carries no stable identity — no event_id, no idempotency_key, no natural key — gives the consumer no way to detect that it has already acted on this event. The delivery guarantee hands the consumer a problem that only the schema can solve.
Effectively-once delivery uses idempotent producers and Kafka transactions to eliminate duplicates at the broker level. This offloads duplicate detection away from the consumer, but it introduces a new schema concern: consumers using IsolationLevel.ReadCommitted will not see messages from open transactions, which changes the observable ordering of events. A schema that assumes strict arrival order must account for the fact that effectively-once delivery gives you consistency within a transaction, not across all partitions simultaneously.
The specific mechanics of each mode are covered in depth in the next section. The principle to hold here is that each mode opens a different set of failure scenarios, and schema design is what determines whether your consumer can handle those scenarios gracefully.
📋 Delivery Mode × Schema Requirement
| Delivery Mode | Schema Must Carry | Consumer Responsibility |
|------------------|--------------------------------|---------------------------------|
| At-most-once | Sequence/timestamp (loss | Detect gaps, not dupes |
| | detection only) | |
| At-least-once | Stable EventId (dupe | Check EventId before processing |
| | detection) | |
| Effectively-once | EventId (audit/trace only) | None (broker handles it) |
Choosing Raw JSON Is a Decision, Not a Default
There's a temptation — especially early in a project — to reach for System.Text.Json, serialize your C# objects to UTF-8 bytes, and push them onto the wire. It works immediately. Here's a minimal example:
// Producer side — no schema registry, no contract
var message = new OrderPlaced
{
OrderId = Guid.NewGuid(),
CustomerId = "cust-42",
TotalAmount = 149.99m
};
var bytes = JsonSerializer.SerializeToUtf8Bytes(message);
await producer.ProduceAsync("orders", new Message<string, byte[]>
{
Key = message.OrderId.ToString(),
Value = bytes
});
This code works. The problem is what it doesn't do: it establishes no shared contract between the producer and any consumer. The OrderPlaced class on the producer and the OrderPlaced class on the consumer are just two C# files that happen to share a name. When they diverge — and they will — the consumer has no mechanism to detect the mismatch.
Compare this to a producer that publishes a schema to a registry before producing:
// Producer side — schema registry enforces the contract
var config = new ProducerConfig { BootstrapServers = "localhost:9092" };
var schemaRegistryConfig = new SchemaRegistryConfig { Url = "http://localhost:8081" };
using var schemaRegistry = new CachedSchemaRegistryClient(schemaRegistryConfig);
using var producer = new ProducerBuilder<string, OrderPlaced>(config)
.SetValueSerializer(
new AvroSerializer<OrderPlaced>(schemaRegistry) // enforces compatibility at runtime
)
.Build();
await producer.ProduceAsync("orders", new Message<string, OrderPlaced>
{
Key = message.OrderId.ToString(),
Value = message
});
When the producer calls ProduceAsync, the serializer checks the schema registry to confirm that the OrderPlaced schema is compatible with any schema already registered under that topic's subject. If a field was removed or renamed in a way that violates the compatibility rules, the produce call fails — before a single consumer is affected.
⚠️ Common Mistake: Treating the absence of a schema registry as "no schema" — in reality, the schema still exists; it just lives exclusively in your C# class files, unversioned and unenforced. Every deserialization is a silent hope that both sides have the same definition in mind.
Schema Drift Is a Delivery Semantics Problem
Schema drift — the gradual divergence between what producers write and what consumers expect — is not just a data quality problem. It interacts directly with delivery semantics. Consider at-least-once delivery, where replaying old messages is part of the normal failure-recovery path. Without a registry enforcing backward compatibility — meaning new consumers can read old messages — replay becomes a deserialization minefield.
Time →
Week 1: Producer emits OrderPlaced { order_id, customer_id, total }
Consumer commits offsets cleanly
Week 3: Schema changes — 'total' renamed to 'total_amount'
Producer deploys first
Consumer redeploys second (rolling deploy, 10-minute gap)
During the gap:
┌─────────────────────────────────────────────────────┐
│ Old consumer reads new messages with 'total_amount' │
│ 'total' field is null — silently zero or default │
│ Orders processed with $0.00 total │
└─────────────────────────────────────────────────────┘
A schema registry with backward compatibility configured would have rejected the rename on the producer side — before it deployed — because renaming a field is not a backward-compatible change. The registry catches the mismatch at write time, not at consumer read time.
Schema compatibility rules are not bureaucratic overhead — they are the mechanism by which replay safety (a delivery semantics concern) and schema evolution (a data modeling concern) are kept in sync. With that framing established, the next section examines the three delivery modes in full mechanical detail.
The Three Delivery Modes: Guarantees, Trade-offs, and What They Actually Cost
Every message you send through Kafka will be delivered under exactly one of three behavioral contracts: at most once, at least once, or effectively once. These are not just philosophical stances — each mode maps to specific configuration knobs, specific failure scenarios, and specific costs that show up in latency, throughput, or code complexity. Understanding which mode you are actually operating in (rather than which mode you assume you are in) is the difference between a system that behaves predictably under failure and one that silently corrupts state.
At-Most-Once: When Loss Is Cheaper Than Duplication
At-most-once delivery means a message is processed zero or one times — never more than once, but possibly never at all. This guarantee is achieved by committing the offset before processing the message. If your consumer crashes after the commit but before processing completes, the broker considers that message consumed and will not redeliver it.
In Confluent's .NET client, you achieve this by calling StoreOffset before your processing logic runs:
// At-most-once pattern: commit offset before processing
using var consumer = new ConsumerBuilder<string, string>(consumerConfig)
.Build();
consumer.Subscribe("telemetry-events");
while (!cancellationToken.IsCancellationRequested)
{
var result = consumer.Consume(cancellationToken);
// Offset stored BEFORE the handler runs.
// A crash after this line = message is lost permanently.
consumer.StoreOffset(result);
// Any exception here means this message is gone.
await HandleTelemetryEventAsync(result.Message.Value);
}
The practical use case for at-most-once is genuinely narrow: high-throughput telemetry streams, real-time metrics, or any scenario where losing occasional data points is acceptable and processing the same reading twice would be harmful (such as triggering duplicate alerts). For most business-critical data flows, this is not the mode you want.
⚠️ Common Mistake: Assuming that EnableAutoCommit = true gives you at-least-once. With EnableAutoOffsetStore = true (the default when auto-commit is on), the .NET client internally stores the offset as soon as Consume() returns a result — before your handler has done anything. This is at-most-once behavior wearing an at-least-once costume. The async variant of this trap is covered in detail in Common Misconfiguration Patterns and How to Catch Them.
At-Least-Once: The Practical Default and Its Hidden Contract
At-least-once delivery means every message will be processed one or more times — you will never lose a message, but you might handle it more than once. This is the mode most Kafka consumers operate in, and it is the right default for the majority of use cases. The guarantee is achieved by committing offsets after successful processing:
// At-least-once pattern: commit after successful processing
var consumerConfig = new ConsumerConfig
{
BootstrapServers = "localhost:9092",
GroupId = "order-processor",
AutoOffsetReset = AutoOffsetReset.Earliest,
EnableAutoCommit = false,
EnableAutoOffsetStore = false
};
using var consumer = new ConsumerBuilder<string, OrderEvent>(consumerConfig)
.SetValueDeserializer(registeredDeserializer) // Schema-aware deserializer
.Build();
consumer.Subscribe("order-events");
while (!cancellationToken.IsCancellationRequested)
{
var result = consumer.Consume(cancellationToken);
try
{
await ProcessOrderAsync(result.Message.Value);
// Only store offset after success.
// A crash between ProcessOrder and here = redelivery on restart.
consumer.StoreOffset(result);
}
catch (Exception ex)
{
logger.LogError(ex, "Processing failed for offset {Offset}", result.Offset);
throw; // or apply a retry/dead-letter strategy
}
}
The failure scenario that produces duplicates is straightforward: your consumer processes a message successfully, then crashes before committing the offset. On restart — or after a consumer group rebalance, where partition ownership is reassigned — the broker sees the last committed offset and redelivers from there.
At-least-once failure scenario:
Partition: [offset 41] [offset 42] [offset 43]
^
last committed offset
Consumer processes offset 44, 45, 46...
Consumer applies offset 47 to database...
CRASH before StoreOffset(47) runs
On restart:
Consumer group reads: "last committed = 43"
Broker redelivers: 44, 45, 46, 47, ...
^^
processed again!
🎯 Key Principle: At-least-once is not a weakness to tolerate — it is a contract that makes the consumer responsible for idempotency. If your handler is naturally idempotent (writing to a database with an upsert keyed on event ID, for example), you get the safety for free. If it is not, you need to build it. As we will examine in the next section, including a stable event ID in your message schema is the most practical mechanism for detecting and skipping duplicates.
Effectively-Once: What It Actually Means and What It Costs
Effectively-once is the most misunderstood term in Kafka's vocabulary. Kafka does not use the phrase "exactly-once" casually — the broker cannot prevent a message from arriving at a consumer more than once across all possible failure modes. What Kafka can guarantee, through a combination of idempotent producers and transactions, is that the observable effect of processing a message happens exactly once. Not "the message was delivered once" but "the write resulting from that message appears exactly once in the output."
Idempotent Producers: Sequence Numbers and Broker-Side Deduplication
When you set EnableIdempotence = true on a .NET producer, the broker assigns each producer instance a Producer ID (PID) and tracks a monotonically increasing sequence number per partition. If a network failure causes the producer to retry a send, the broker recognizes the duplicate sequence number and silently discards it rather than writing a second copy.
var producerConfig = new ProducerConfig
{
BootstrapServers = "localhost:9092",
EnableIdempotence = true
// Setting EnableIdempotence = true automatically configures:
// Acks = All (all in-sync replicas must acknowledge)
// MessageSendMaxRetries = int.MaxValue
// MaxInFlight = 5 (up to 5 in-flight requests per connection)
//
// Do NOT manually set conflicting values — e.g., Acks = Leader
// or MaxInFlight > 5 will throw a configuration exception.
};
using var producer = new ProducerBuilder<string, OrderEvent>(producerConfig)
.SetValueSerializer(registeredSerializer)
.Build();
The automatic defaults that EnableIdempotence = true sets deserve careful attention. Acks = All means the leader waits for all in-sync replicas to acknowledge before confirming. MaxInFlight = 5 caps the number of unacknowledged requests per connection; values above 5 could allow out-of-order delivery under retry, so the broker enforces this ceiling when idempotence is enabled. If you manually set MaxInFlight to anything above 5 in the same config, the client throws a ConfigException at construction time.
Transactions: Extending the Guarantee Across Partitions
Idempotent producers deduplicate retries on a single partition. Transactional writes extend this to atomic, all-or-nothing commits across multiple partitions — the critical pattern for stream processing (read from input, write to output, commit input offset, atomically).
var producerConfig = new ProducerConfig
{
BootstrapServers = "localhost:9092",
EnableIdempotence = true,
TransactionalId = "order-processor-instance-1" // Must be unique per producer instance
};
using var producer = new ProducerBuilder<string, ProcessedOrder>(producerConfig)
.SetValueSerializer(registeredSerializer)
.Build();
producer.InitTransactions(TimeSpan.FromSeconds(10));
try
{
producer.BeginTransaction();
await producer.ProduceAsync("processed-orders", new Message<string, ProcessedOrder>
{
Key = order.OrderId,
Value = processedOrder
});
// Atomically commit the input consumer's offset as part of this transaction.
producer.SendOffsetsToTransaction(
consumedOffsets,
consumer.ConsumerGroupMetadata,
TimeSpan.FromSeconds(5)
);
producer.CommitTransaction(TimeSpan.FromSeconds(10));
}
catch (KafkaException ex) when (ex.Error.IsTransactionAbortable)
{
producer.AbortTransaction(TimeSpan.FromSeconds(5));
// The input offset is not advanced; Kafka will redeliver.
}
This pattern guarantees that the output write and the offset advance either both happen or neither happens. If the producer crashes after ProduceAsync but before CommitTransaction, the transaction is aborted, the output messages are invisible to readers, and the input offset is not advanced.
The Consumer Side: ReadCommitted
Transactional guarantees on the producer side are only meaningful if consumers respect them. A consumer with the default IsolationLevel = ReadUncommitted will see messages from in-progress transactions before they commit — and may read messages from transactions that are subsequently aborted. To exclude these:
var consumerConfig = new ConsumerConfig
{
BootstrapServers = "localhost:9092",
GroupId = "downstream-consumer",
IsolationLevel = IsolationLevel.ReadCommitted
};
⚠️ Common Mistake: Running a ReadCommitted consumer against a topic that has slow-committing transactional producers can cause the consumer's high-watermark to appear to stall. The consumer will not advance past an open transaction boundary, even if non-transactional messages arrive after it. This is correct behavior — not a bug — but it can look like a stuck consumer in monitoring dashboards.
The Real Cost of Effectively-Once
📋 Delivery Mode Trade-offs
+-----------------+-------------------+-------------------+----------------------+
| Mode | Offset Commit | Throughput | Extra Config |
+-----------------+-------------------+-------------------+----------------------+
| At-most-once | Before processing | Highest | None |
| At-least-once | After processing | High | Idempotent handlers |
| Effectively-once| Transactional | Lower | TransactionalId, |
| | | | ReadCommitted |
+-----------------+-------------------+-------------------+----------------------+
Each transactional commit requires the broker to write to the transaction log and coordinate with all participating partitions — the round-trip cost is real. TransactionalId must be unique per producer instance and stable across restarts; if two instances share a TransactionalId, the broker fences the older one. And every downstream consumer of a transactionally-produced topic must set IsolationLevel = ReadCommitted — a single consumer left at the default will silently see aborted messages.
💡 Mental Model: Think of effectively-once as a database transaction projected onto a distributed log. The strength of the guarantee comes from the same source as its cost: coordination across brokers and partitions is never free.
The right mode is not the strongest one — it is the one whose costs match the value of the guarantee. At-most-once is a deliberate choice for loss-tolerant streams; at-least-once is the practical default for most business data; effectively-once is the correct choice when neither loss nor duplication is acceptable, such as financial ledgers or inventory mutations. How these modes pair with a schema strategy is where we turn next.
Pairing Delivery Semantics with a Schema Strategy
Delivery guarantees and schema design answer different questions, but they impose constraints on each other that become visible the moment something goes wrong. At-least-once delivery — the practical default — means your consumer will occasionally process the same message twice. A schema that carries no stable identity field gives that consumer no way to recognize a duplicate without reaching for external state. The schema is not decoration on top of the delivery guarantee; it is what makes the guarantee actionable.
Why Delivery Mode Shapes Schema Requirements
Idempotency can be achieved in two ways: through external state (a database record keyed on something unique) or through the message itself carrying a stable identity. External state introduces a network round-trip and a dependency on that store's availability. A schema-level EventId field sidesteps that — the consumer can check whether it has already processed this event without a remote call.
🎯 Key Principle: An EventId field in your schema is not a nice-to-have. Under at-least-once delivery, it is the minimum information a consumer needs to detect duplicates without external coordination. Note that a local in-memory deduplication cache only survives the lifetime of the consumer process — for long retention windows or multi-instance consumer groups, a durable deduplication store is still needed. The EventId field is necessary but not always sufficient; it just ensures the schema contract supports idempotent handling at all.
At-most-once delivery inverts this. Because offsets are committed before processing, duplicates are not a concern — but loss is. Schemas for at-most-once streams benefit from fields that support downstream reconstruction if a gap is detected, such as sequence numbers or timestamps, rather than identity fields for deduplication.
Effectively-once delivery, achieved through idempotent producers and transactions, eliminates the duplicate at the broker level. The schema still benefits from an event ID for application-level auditability, but the consumer is no longer responsible for deduplication logic.
Schema Evolution and Message Replay
Schema evolution is the process of changing a schema over time — adding fields, removing fields, or changing types — while keeping producers and consumers deployable independently. The three standard compatibility modes are backward compatibility (new schema can read old messages), forward compatibility (old schema can read new messages), and full compatibility (both directions).
This intersects directly with delivery semantics: when old messages can be replayed — a natural consequence of at-least-once delivery and Kafka's durable log — a redeployed consumer will encounter messages produced under a previous schema version. If your schema evolution strategy is not enforced, that consumer may silently misinterpret those older messages.
💡 Real-World Example: A consumer group is redeployed after a schema change that adds a CustomerId field. Kafka replays messages from the last committed offset — some produced before the deployment. If the new consumer cannot read messages without CustomerId, it either throws, or silently reads a default value of null and processes incorrect data. Backward-compatible schema evolution would have made CustomerId optional with a defined default, allowing the new consumer to read both old and new messages safely.
This is where a schema registry earns its cost. A registry enforces compatibility rules at produce time, before a bad schema version ever reaches the log:
Producer registers
new schema version
│
▼
┌───────────────────┐ PASS ┌─────────────────┐
│ Schema Registry │────────────►│ Kafka Topic │
│ (compatibility │ │ (messages flow) │
│ check) │ └─────────────────┘
└───────────────────┘
│
│ FAIL (incompatible change)
▼
Registration error
returned to producer
(message never sent)
Without a registry, compatibility is enforced only by convention and code review — which means it is enforced inconsistently.
Registered Serializer vs. Raw JSON: Where Enforcement Enters the Call Chain
The most concrete way to see the difference is in the producer code itself.
Producer with a Registered Serializer
var schemaRegistryConfig = new SchemaRegistryConfig
{
Url = "http://localhost:8081"
};
var producerConfig = new ProducerConfig
{
BootstrapServers = "localhost:9092",
EnableIdempotence = true
};
using var schemaRegistry = new CachedSchemaRegistryClient(schemaRegistryConfig);
// AvroSerializer validates the schema against the registry at the first Produce call.
// An incompatible schema change throws SchemaRegistryException before any message is sent.
using var producer = new ProducerBuilder<string, OrderPlaced>(producerConfig)
.SetValueSerializer(new AvroSerializer<OrderPlaced>(schemaRegistry))
.Build();
var orderPlaced = new OrderPlaced
{
EventId = Guid.NewGuid().ToString(), // Stable identity for at-least-once consumers
OrderId = "ORD-001",
CustomerId = "CUST-42",
PlacedAtUtc = DateTime.UtcNow
};
await producer.ProduceAsync(
"orders.placed",
new Message<string, OrderPlaced>
{
Key = orderPlaced.OrderId,
Value = orderPlaced
});
Schema enforcement here is part of the serialization step. An incompatible schema change — say, a new non-nullable field added without a default — causes the produce call to throw before any message reaches the broker.
Producer with Raw JSON (No Registry)
var producerConfig = new ProducerConfig
{
BootstrapServers = "localhost:9092",
EnableIdempotence = true
};
using var producer = new ProducerBuilder<string, byte[]>(producerConfig).Build();
var orderPlaced = new OrderPlaced
{
EventId = Guid.NewGuid().ToString(),
OrderId = "ORD-001",
CustomerId = "CUST-42",
PlacedAtUtc = DateTime.UtcNow
};
// No compatibility check occurs. If OrderPlaced gains a new field tomorrow,
// old consumers deserializing with System.Text.Json will silently ignore it
// (unknown fields are dropped by default unless JsonUnmappedMemberHandling
// is set to Disallow).
var bytes = JsonSerializer.SerializeToUtf8Bytes(orderPlaced);
await producer.ProduceAsync(
"orders.placed",
new Message<string, byte[]>
{
Key = orderPlaced.OrderId,
Value = bytes
});
The raw JSON path is not wrong by definition — it is a deliberate trade-off. The problem is the failure mode: when the schema drifts, the consumer does not know. Under at-least-once delivery, where old messages can be replayed, this silent data loss compounds across every replayed message.
⚠️ Common Mistake: Assuming that [JsonRequired] or JsonUnmappedMemberHandling.Disallow on the POCO solves the schema drift problem. These shift the failure to deserialization time — better than silent corruption — but they still require every consumer to be updated simultaneously when the schema changes, and they provide no compatibility guarantees across message versions already in the log.
Serialization Format vs. Schema Registry: Two Separate Decisions
A common conflation is treating the wire format and the registry as a single choice. They are separable.
Wire format (Avro, Protobuf, JSON Schema) determines how bytes are laid out on the wire, what native types are supported, and how evolution is expressed structurally.
Schema registry determines whether compatibility rules are enforced, how schemas are versioned, and how consumers resolve the schema for a given message at runtime.
You can use JSON Schema as your wire format while still registering schemas and enforcing compatibility through a registry. Conversely, you can use Avro's binary encoding without a registry — though you lose the runtime schema resolution that makes independent producer/consumer deployment safe.
📋 Quick Reference Card: Wire Format vs. Schema Registry
| Wire Format | Schema Registry | |
|---|---|---|
| Controls | Binary layout, type system, evolution syntax | Compatibility enforcement, schema versioning |
| Examples | Avro, Protobuf, JSON Schema | Confluent Schema Registry (and compatible) |
| Failure without it | N/A — you always have a format | Silent schema drift, consumers break on deploy |
| Required for Kafka? | Yes (bytes must be serialized somehow) | No, but at-least-once delivery strongly benefits |
🤔 Did you know? The Confluent Schema Registry wire format prepends a 5-byte magic header to each message payload — one magic byte (0x00) followed by a 4-byte schema ID. This means a consumer can fetch the exact schema version used to produce a specific message, enabling safe deserialization of messages produced weeks or months earlier. This is what makes backward-compatibility replay safe in practice, not just in theory.
The choice of format matters for .NET specifically because the code-generation story differs. Avro typically requires a build-time code generation step or a generic record approach. Protobuf has mature .NET tooling through protoc and the Grpc.Tools package. JSON Schema is the most familiar but sacrifices the binary efficiency and stricter typing of the other two. None of these choices is universally correct — the registry enforces compatibility rules regardless of which you pick.
Common Misconfiguration Patterns and How to Catch Them
Configuration mistakes with Kafka tend to fail silently or fail late. Unlike a compilation error, a misconfigured producer will often start successfully, process thousands of messages, and only reveal the problem under specific runtime conditions — a rebalance, a first deployment, a topic with compaction enabled. The five patterns below are worth examining because each one looks reasonable at a glance and only reveals its flaw when you know what to look for.
Mistake 1: Idempotence + Too Many In-Flight Requests
When you set EnableIdempotence = true on a Confluent .NET producer, the client enforces a hard limit: MaxInFlight must not exceed 5. This preserves message ordering during retries — with more than five in-flight requests, a retry on a failed batch could overtake already-acknowledged batches, corrupting the partition's sequence.
// ⚠️ This configuration will throw at runtime, not at compile time
var config = new ProducerConfig
{
BootstrapServers = "localhost:9092",
EnableIdempotence = true,
MaxInFlight = 10 // throws: KafkaException — incompatible with idempotence
};
using var producer = new ProducerBuilder<string, string>(config).Build();
// Exception raised here, during producer construction
The failure happens during ProducerBuilder.Build(). In a hosted service this surfaces as a startup exception — traceable, but confusing if you don't already know the constraint exists. The corrected form is to let EnableIdempotence control the in-flight limit, or set MaxInFlight explicitly to any value ≤ 5.
The underlying history matters: documentation examples written before the idempotence defaults stabilized often showed higher in-flight values as performance tuning advice. If your team's producer configuration originated from copied snippets of uncertain age, this is worth auditing directly.
Mistake 2: The Fire-and-Forget Offset Commit
This is the most operationally dangerous pattern in this section because it produces at-most-once delivery semantics while every surface indicator suggests at-least-once. The code looks like responsible async consumer code. It compiles. It passes basic testing. Under normal operation it even works.
// ⚠️ This is at-most-once, not at-least-once
while (!cancellationToken.IsCancellationRequested)
{
var result = consumer.Consume(cancellationToken);
_ = ProcessMessageAsync(result.Message); // NOT awaited
// StoreOffset runs immediately, before ProcessMessageAsync completes
consumer.StoreOffset(result);
}
StoreOffset records the offset in the client's local store, and the next commit (manual or auto) will include it. Because ProcessMessageAsync is not awaited, the commit can occur before the handler has finished or even started meaningful work.
Timeline of the bug:
t=0 Consume message at offset 47
t=1 Launch ProcessMessageAsync (unawaited)
t=2 StoreOffset(47) called immediately
t=3 Auto-commit fires: offset 48 is committed to broker
t=4 ProcessMessageAsync throws / process crashes
t=5 Consumer restarts, reads from offset 48
Message 47 is permanently skipped
The corrected pattern awaits the handler before storing the offset:
// ✅ Correct: offset is stored only after handler confirms success
while (!cancellationToken.IsCancellationRequested)
{
var result = consumer.Consume(cancellationToken);
try
{
await ProcessMessageAsync(result.Message);
consumer.StoreOffset(result);
}
catch (Exception ex)
{
_logger.LogError(ex, "Handler failed for offset {Offset}", result.Offset);
throw; // or route to DLQ before rethrowing
}
}
⚠️ Common Mistake: Code review tools and static analyzers will not catch this by default. The discard _ = someTask pattern is valid C# for intentional fire-and-forget scenarios. The Kafka consumer loop is one of the places where it causes silent data loss.
🎯 Key Principle: If your offset commit is not causally downstream of your handler completing successfully, you do not have at-least-once delivery — regardless of what your AutoOffsetReset or EnableAutoCommit settings say.
Mistake 3: auto.offset.reset=latest on a New Consumer Group
auto.offset.reset controls what a consumer does when it joins a partition with no committed offset on record. The latest setting means: start reading from the end of the log as of the moment the consumer first connects. For a consumer group that has never committed, that is every partition.
Topic: order-events (10,000 messages accumulated before deployment)
Consumer group: order-processor-v1 (first run, no committed offsets)
auto.offset.reset = latest
Result:
Partitions all reset to their current end offset
All 10,000 historical messages are silently skipped
Consumer begins reading only messages produced after it started
No error is logged. No exception is raised. A schema registry does not help here — the data was structurally valid, correctly serialized, and entirely reachable. The consumer chose not to read it.
latest is a reasonable default for a log-shipping or metrics consumer where historical data is irrelevant. It becomes a trap when copied into an event-processing service where the historical record carries business state. For services where processing historical messages on first run is required, set auto.offset.reset=earliest. For services where it genuinely should not matter, document the choice explicitly so the next developer does not assume latest was a default rather than a decision.
💡 Real-World Example: This pattern commonly surfaces after blue-green deployments where a new consumer group name is introduced for the green deployment. The green group has no committed offsets. With latest, the green group starts clean and silently misses messages produced while the deployment was in progress.
Mistake 4: Keys That Don't Carry Identity in Compacted Topics
Kafka log compaction retains only the most recent message for each unique key within a partition. This means the key is the identity of the entity being represented — not merely a routing label.
❌ Key used purely for partitioning:
Key: "order" <- all orders go to same partition, no per-entity identity
Value: { orderId: 123, status: "shipped" }
Result after compaction:
Only ONE message with key "order" is retained
All other order events are compacted away
✅ Key carries entity identity:
Key: "order-123" <- unique per order, compaction is meaningful
Value: { orderId: 123, status: "shipped" }
Result after compaction:
Latest state for each orderId is retained
The topic behaves as an order-state changelog
The schema design implication: your key schema must be stable and sufficiently granular for compaction semantics to hold. If you are using Avro or Protobuf with a schema registry, the key schema should be registered separately from the value schema. Changing the key schema after a compacted topic is in production requires careful coordination, because historical keys under the old schema will not be superseded by new keys under a changed schema that produces different bytes for the same logical identity.
🤔 Did you know? Kafka's compaction process runs asynchronously and does not compact the "head" of the log (recent, uncompacted segments). A key identity mistake in a new compacted topic will not be obvious until the log ages enough for compaction to run — often hours or days after deployment.
Mistake 5: Local POCO Deserialization Without Schema Registry Resolution
Deserializing Kafka messages into a locally defined C# class using raw bytes is the path of least resistance when starting out. It becomes a coupling mechanism at scale. The issue is not that local deserialization is wrong in isolation — it is what it eliminates: independent deployability.
When your consumer deserializes by matching bytes against a locally compiled POCO, the POCO definition and the message bytes must agree at the moment of deserialization. There is no intermediary that can negotiate compatibility between what the producer wrote and what the consumer expects.
Local POCO approach — tight coupling:
Producer (v1 schema) ──writes──▶ Kafka ──reads──▶ Consumer (must have v1 POCO)
Adding a field to the schema:
→ Must redeploy ALL consumers before updating ANY producers
→ Or: risk deserialization failure on unknown fields
Schema-registry approach — loose coupling:
Producer (v2 schema) ──writes──▶ Kafka ──reads──▶ Consumer (v1 reader schema)
│
Registry resolves
v1 ←→ v2 compatibility
Missing fields use defaults
With a registry-backed deserializer using backward or full compatibility rules, the consumer can be running the old reader schema when the producer starts writing the new writer schema. The registry confirms they are compatible, and the consumer reads the new field as a default value — no coordinated deployment required.
📋 Quick Reference Card: Local POCO vs. Registry-Resolved Deserialization
| Local POCO | Registry-Resolved | |
|---|---|---|
| Deployment coupling | High — must coordinate | Low — compatibility enforced by registry |
| Schema drift detection | Silent field drop | Caught at deserialization or registration |
| Evolution strategy | Manual | Backward / Forward / Full |
| Setup cost | Low | Moderate (registry infrastructure) |
| Best fit | Prototyping, internal tools | Production, multi-team consumers |
A practical first step toward registry resolution without a full migration is to begin registering your existing schemas as documentation. Even before switching deserializers, a registry entry establishes a compatibility baseline that makes future changes auditable.
Catching These in Practice
Each of these five patterns shares a common property: they are invisible until they matter. The idempotence conflict throws at startup — catch it in CI with a configuration validation step that constructs the producer in a test context. The fire-and-forget commit passes all unit tests and fails only under handler failure — a targeted code review rule or Roslyn analyzer flagging _ = someTask patterns inside consumer loop methods helps here. The offset reset issue is catchable by including first-run integration tests that seed a topic before starting the consumer group and asserting that historical messages were processed. The key-identity and POCO-coupling issues surface during schema review — most effective when schema registration is part of your PR process rather than a post-deployment activity.
Key Takeaways and What Comes Next
By this point you've moved from the surface-level question — "how do I get messages from A to B?" — to the deeper question that actually governs production system design: "what does it mean for a message to arrive, and what must be true about its shape for that arrival to matter?" Those two questions are not independent, and this section organizes what you now understand into a structure you can carry into real decisions.
The Decision Matrix
Every Kafka producer and consumer configuration you write reflects a position on two axes: delivery mode and schema strategy. Neither axis has a universally correct setting. What matters is that the two choices are made together, deliberately, with awareness of the behavioral consequences each combination produces.
📋 Quick Reference Card: Delivery Mode × Schema Strategy
| At-Most-Once | At-Least-Once | Effectively-Once | |
|---|---|---|---|
| Behavioral guarantee | Message may be lost on crash | Delivered one or more times | Observable effect happens exactly once per partition sequence |
| Producer config | acks=0 or acks=1 |
acks=all, retries enabled |
EnableIdempotence=true, TransactionalId set |
| Consumer config | Commit before processing | Commit after processing; handle duplicates | IsolationLevel=ReadCommitted |
| Schema requirement | Low — schema drift hurts less when loss is tolerable | Stable event ID field required for deduplication | Strict schema evolution rules; field identity critical |
| Typical use cases | Metrics aggregation, high-volume telemetry | Domain events, notifications, audit trails | Financial ledger entries, inventory adjustments, order state changes |
| Cost of getting it wrong | Occasional silent loss — usually acceptable | Duplicate processing without idempotent handler | Uncommitted reads if ReadCommitted is omitted |
Minimum Viable Configurations
The three code blocks below show the minimum viable configuration for each delivery mode using the Confluent .NET client. The child lessons cover the full transaction API and offset management patterns in depth; what matters here is seeing where each decision surfaces in actual configuration code.
At-Most-Once Producer
// At-most-once: fire and forget
// Acks.Leader means the leader acknowledges but does not wait for follower replication.
// Appropriate when throughput matters more than loss tolerance — e.g., high-volume metrics.
var config = new ProducerConfig
{
BootstrapServers = "localhost:9092",
Acks = Acks.Leader,
MessageSendMaxRetries = 0 // no retries — loss is acceptable, duplicates are not
};
using var producer = new ProducerBuilder<string, string>(config).Build();
producer.Produce("metrics-topic", new Message<string, string>
{
Key = "cpu-utilization",
Value = "{\"host\":\"web-01\",\"value\":0.73}"
});
At-Least-Once Consumer with Idempotent Handler
// At-least-once: commit after processing, handle duplicates via event ID.
// The schema for OrderPlacedEvent must include a stable EventId field.
var config = new ConsumerConfig
{
BootstrapServers = "localhost:9092",
GroupId = "order-processor",
AutoOffsetReset = AutoOffsetReset.Earliest,
EnableAutoCommit = false
};
using var consumer = new ConsumerBuilder<string, OrderPlacedEvent>(config)
.SetValueDeserializer(schemaRegistryDeserializer) // registry-backed — not raw JSON
.Build();
consumer.Subscribe("orders-topic");
while (!cancellationToken.IsCancellationRequested)
{
var result = consumer.Consume(cancellationToken);
if (!await idempotencyStore.IsAlreadyProcessed(result.Message.Value.EventId))
{
await orderService.Handle(result.Message.Value);
await idempotencyStore.MarkProcessed(result.Message.Value.EventId);
}
consumer.Commit(result);
}
The schemaRegistryDeserializer is not a detail to defer — it is what makes EventId reliable across schema versions. If the field is renamed in a future schema version, the registry's compatibility rules catch the incompatibility before the consumer sees a redelivered message with a null ID.
Effectively-Once: Transactional Producer Sketch
// Effectively-once: transactional producer with a unique TransactionalId.
// EnableIdempotence is automatically set to true by TransactionalId.
var config = new ProducerConfig
{
BootstrapServers = "localhost:9092",
TransactionalId = "inventory-adjustment-producer-01"
};
using var producer = new ProducerBuilder<string, InventoryAdjustedEvent>(config)
.SetValueSerializer(schemaRegistrySerializer)
.Build();
producer.InitTransactions(timeout: TimeSpan.FromSeconds(30));
try
{
producer.BeginTransaction();
await producer.ProduceAsync("inventory-events", new Message<string, InventoryAdjustedEvent>
{
Key = adjustment.SkuId,
Value = adjustment
});
// In a read-process-write pattern, SendOffsetsToTransaction goes here
// to atomically commit the consumed offset alongside the produced message.
// That pattern is covered fully in the Delivery Semantics child lesson.
producer.CommitTransaction();
}
catch (KafkaException)
{
producer.AbortTransaction();
throw;
}
Summary
Delivery mode is a configuration decision with behavioral consequences. Setting EnableAutoCommit=true and calling your handler asynchronously without awaiting it isn't a performance optimization — it's a silent adoption of at-most-once semantics inside an at-least-once configuration.
Schema strategy is what makes delivery guarantees load-bearing. A guarantee that a message arrives is only as useful as the guarantee that it means the same thing it meant when it was produced. A schema registry with compatibility rules is the enforcement point for that second guarantee.
The three delivery modes impose different schema requirements, not just different Kafka configurations. At-most-once tolerates schema drift because lost messages reduce the blast radius of misinterpretation. At-least-once demands stable identity fields because duplicates must be detectable. Effectively-once demands strict field evolution rules because transactional correctness depends on consumers interpreting committed data consistently across deployments.
The decision matrix is a checklist, not a preference. If the business consequence of duplication is a double charge or an incorrect inventory count, effectively-once is a correctness requirement. If the business consequence of loss is a missing data point in a dashboard that refreshes every second, at-most-once is appropriately calibrated.
⚠️ The single most common error in applying this material: choosing at-least-once delivery without designing the schema to support it. At-least-once without a stable event ID field and an idempotency store is at-least-once in configuration and at-most-once in correctness — you're accepting the overhead of redelivery without capturing its safety benefit.
🧠 Mnemonic — DID: Delivery mode, Identity field in schema, Duplication handling in consumer. If all three align, your at-least-once system is actually at-least-once.
Where to Go Next
Child Lesson: Delivery Semantics covers the mechanics this lesson gestured at: offset management in detail (manual commit patterns, StoreOffset vs. Commit, the EnableAutoOffsetStore interaction), the full transaction API including SendOffsetsToTransaction for read-process-write loops, and consumer group rebalancing — specifically, how a rebalance mid-transaction can leave offsets in an inconsistent state and what the Confluent .NET client exposes to handle it.
Child Lesson: Schema & Serialization covers the registry side: standing up a Confluent Schema Registry instance, registering Avro and Protobuf schemas from .NET, configuring backward and forward compatibility rules, and wiring IAsyncSerializer<T> / IAsyncDeserializer<T> into your producer and consumer builders. It also covers the practical question of which wire format to choose and why that choice is independent of the registry itself.