Consumer Groups & Offsets
One partition is consumed by exactly one group member at a time. Offsets are your real checkpoints — not message acknowledgements.
Why Consumer Groups Exist: Scaling Reads Beyond a Single Consumer
Imagine an order-processing service reading from a Kafka topic that receives a steady stream of new orders. At launch, one consumer instance handles the load comfortably. Then a promotion goes live, order volume triples, and the same consumer starts falling further behind with every poll — the backlog grows, downstream fulfillment slows, and nobody touched the code. The topic didn't change shape; the read side simply couldn't keep pace. Why does adding more consumers fix this when the data lives in one place? And why does Kafka split the work by partition instead of handing out individual messages the way a traditional task queue would? Those two questions are what this section answers, by introducing the consumer group — the mechanism Kafka uses to let multiple consumers cooperatively read a topic without duplicating or losing work.
The Single-Consumer Bottleneck
A Kafka topic is not one continuous stream — it is split into partitions, each an independently ordered, append-only log. A single consumer instance that subscribes to a topic will, by default, be handed every partition and must read them all itself. Internally, the client still processes work sequentially relative to what your application code can handle: it polls for records, your handler runs, and only then does the loop ask for more. If message volume outpaces how fast that loop can process each record, the consumer's read position falls further and further behind the producer's write position. That gap is often called consumer lag, and it is the direct symptom of a throughput mismatch between one reader and a topic that's being written to faster than it's being drained.
The order-processing example makes this concrete. Suppose the orders topic has eight partitions and, during a peak sales event, producers write far more orders per second than the single consumer instance can validate, persist, and forward to a fulfillment queue. Vertical scaling — giving that one process more CPU or memory — helps only if the bottleneck is raw compute, and even then it hits a ceiling: a single OS thread pumping through a poll loop can only do so much sequential work per second. The structural fix is to have more than one consumer instance sharing the workload, each responsible for a subset of the partitions. That's the problem consumer groups exist to solve.
Defining the Consumer Group
A consumer group is a named set of consumer instances that coordinate to divide the partitions of one or more topics among themselves, so that the group as a whole reads the full topic while each member reads only a slice of it. Instances declare their membership by configuring the same GroupId; Kafka uses that shared identifier to know which consumers should split ownership together.
🎯 Key Principle: A consumer group parallelizes reads by dividing partitions, not by dividing individual messages. This is the single most important structural fact to internalize before anything else about groups makes sense.
That distinction matters because it's easy to picture a group the way you'd picture a thread pool pulling tasks off a shared queue — any idle worker grabs the next available item. Kafka consumer groups don't work that way. Ownership is assigned at the partition level: a given partition is handed to exactly one consumer instance in the group, and that instance reads every message in that partition in order, from the first message to the last, for as long as it holds that assignment. The exact rule governing this assignment, and what happens when the number of consumers doesn't match the number of partitions, is the focus of the next section, "The Partition Ownership Rule: One Partition, One Consumer" — this section only needs you to understand that partitions, not messages, are the unit that gets distributed.
💡 Mental Model: Think of a topic's partitions as a stack of ordered logbooks, and a consumer group as a small team assigned to read the whole stack. Instead of the team members fighting over which page to read next, each person is handed entire logbooks to read cover to cover. Scaling the team means handing out logbooks differently, not handing out pages.
Scaling the Order-Processing Example
Returning to the order-processing scenario: with eight partitions on the orders topic, a single consumer instance owns all eight. If that team places three more instances into the same consumer group (same GroupId, subscribed to the same topic), Kafka's group coordination redistributes the eight partitions across the four instances — for example, two partitions per instance. Each instance now only has to keep pace with a quarter of the traffic that a single reader previously had to absorb on its own, which is why adding instances to a group is the standard scaling lever for topic-level read throughput.
Here is a minimal illustration of that configuration in .NET using the Confluent.Kafka client. Building a full poll loop is the subject of "Building a Minimal Consumer with Confluent.Kafka" later in this lesson; this snippet only shows the piece relevant here — the shared GroupId that makes multiple instances cooperate as one group:
using Confluent.Kafka;
// Every instance of the order-processing service uses this same GroupId.
// That shared value is what tells Kafka these instances belong to one group
// and should split partition ownership rather than each reading everything.
var config = new ConsumerConfig
{
BootstrapServers = "localhost:9092",
GroupId = "order-processing-service",
AutoOffsetReset = AutoOffsetReset.Earliest
};
using var consumer = new ConsumerBuilder<string, string>(config).Build();
consumer.Subscribe("orders");
// Each running instance of this same code, pointed at the same GroupId,
// receives a subset of the "orders" topic's partitions.
If you deploy four copies of that process, you have a four-member consumer group. Kafka's group coordination (detailed in the forthcoming Consumer Group Mechanics lesson) assigns partitions across those four members automatically — your application code doesn't choose which partitions it gets. What matters here is the shape of the outcome: eight partitions split roughly evenly across four readers rather than one reader draining all eight serially.
🤔 Did you know? A consumer group's GroupId is just a string, and Kafka treats every distinct string as a separate, independent group. Two services can subscribe to the exact same topic under different GroupId values, and each group will independently receive a full copy of every message on that topic — the groups don't compete with or affect each other at all. That property is what lets, say, an order-processing group and a fraud-detection-analytics group both read the same orders topic in full, each at their own pace, without interfering with one another.
What This Lesson Owns, and What Comes Next
It's worth being explicit about the boundary this lesson draws, since "consumer groups" is a big enough topic that a full treatment spans several dedicated lessons. This lesson establishes the mental model you need before touching any of the deeper mechanics: the partition-ownership rule that governs how work is divided (next section), and the distinction between an offset as a checkpoint versus a per-message acknowledgement (the section after that), followed by a minimal working .NET consumer and the mistakes that commonly break it.
What this lesson does not cover in depth — because each has enough nuance to deserve its own space — is the internal group protocol that Kafka's broker-side coordinator runs to actually assign partitions to members, the mechanics of where and how committed offsets are stored, and the triggers and consequences of a rebalance (the process of reassigning partitions when group membership changes). Those are the subjects of the Consumer Group Mechanics, Offset Management, and Rebalancing & Backpressure lessons that build on the foundation laid here. Trying to absorb all of that alongside the basic ownership model at once tends to blur the very distinction that makes consumer groups tractable to reason about: first understand what gets divided and why it scales reads, then learn how Kafka performs that division under the hood.
💡 Real-World Example: A payment-notification service reading from a single-partition topic will not benefit at all from adding a second consumer instance to its group — with only one partition, one instance holds it and the second sits idle. This is a preview of the ownership rule the next section covers in full, but it's worth flagging now: scaling a consumer group only helps up to the number of partitions the topic has, so partition count is a capacity decision made well before you write any consumer code.
Why Partitions, Not Messages
It's worth pausing on why Kafka chose partition-level division instead of message-level distribution, since the alternative feels more intuitive if you've worked with traditional message queues where any free worker can grab the next message off a shared queue. A partition is an ordered log, and Kafka guarantees ordering only within a partition, never across partitions. If individual messages were handed out to whichever consumer happened to be free, two messages originally written next to each other in the same partition could be processed by two different instances in an unpredictable order — one might finish before the other started. For many use cases that's tolerable, but for anything where sequence matters within a logical unit (all events for one order ID, for instance, if they're keyed to the same partition), losing that ordering guarantee would silently change your application's correctness. By assigning whole partitions to single consumers, Kafka preserves in-partition ordering as a hard guarantee even while parallelizing across the group.
⚠️ Common Mistake: Assuming that adding consumer instances always increases throughput linearly. Mistake: a team doubles their order-processing fleet from four instances to eight, expecting roughly double the read throughput, but the orders topic only has eight partitions — going from four to eight instances does help (each now owns one partition instead of two), but going further would leave the extra instances with nothing assigned. The ceiling is set by partition count, a trade-off examined in depth in "The Partition Ownership Rule: One Partition, One Consumer."
This section's job was narrow: show why a single consumer stalls against real throughput, define the consumer group as a named set of consumers splitting partition ownership, and make clear that the thing being split is partitions rather than messages. Holding onto that mental picture — a fixed number of ordered logs handed out whole to group members — is what makes every subsequent detail about assignment rules, rebalances, and offset commits click into place instead of feeling like arbitrary broker behavior.
The Partition Ownership Rule: One Partition, One Consumer
Once a group exists, something has to decide who reads what. Kafka's answer is a single, unbending rule that shapes every capacity decision you'll make around a consumer group.
The Core Rule
🎯 Key Principle: Within a single consumer group, each partition is assigned to exactly one consumer instance at a time. Two members of the same group will never be handed the same partition simultaneously.
This is an ownership assignment, not a load-balancing suggestion. If topic orders has 6 partitions and your group order-processor has 3 running instances, Kafka's group coordinator hands out those 6 partitions across the 3 consumers — commonly 2 apiece, though the exact split depends on the assignment strategy in use. What never happens is two instances both pulling from partition 3 at the same time. That guarantee is what lets you reason about ordering: since a single partition is an ordered log and only one consumer ever reads it at once, message order within that partition is preserved from the consumer's point of view. Break that guarantee and you'd lose the one thing partitions promise you.
Here's the assignment for that example, visualized:
Topic: orders (6 partitions)
Group: order-processor (3 consumer instances)
Consumer A ← partition 0
Consumer A ← partition 1
Consumer B ← partition 2
Consumer B ← partition 3
Consumer C ← partition 4
Consumer C ← partition 5
Each arrow is an exclusive ownership link for as long as the group's membership stays stable. Exactly how the coordinator decides which consumer gets which partitions, and what happens the moment membership changes, is the internal protocol covered in "Consumer Group Mechanics" — for this section, treat the assignment as a given and focus on what it implies for scaling.
Over-Provisioning: Idle Consumers
The rule cuts both ways, and the first consequence surprises teams who assume that adding consumer instances always adds throughput. If you scale order-processor from 3 instances to 8, but the topic still has only 6 partitions, two of those 8 consumers get zero partitions assigned. They're fully connected, fully healthy, sitting in the group — and doing nothing.
Topic: orders (6 partitions)
Group: order-processor (8 consumer instances)
Consumer A ← partition 0
Consumer B ← partition 1
Consumer C ← partition 2
Consumer D ← partition 3
Consumer E ← partition 4
Consumer F ← partition 5
Consumer G ← (idle, no partitions)
Consumer H ← (idle, no partitions)
⚠️ Common Mistake: Assuming more pods means more throughput. A team running an order-processing service behind a Kubernetes horizontal pod autoscaler might scale from 6 replicas to 12 under load, expecting consumption to speed up proportionally. If the topic has 6 partitions, the extra 6 replicas just idle in the group, burning compute and connection overhead while contributing nothing to throughput. The fix isn't more consumers — it's more partitions, which is a topic-level change with its own tradeoffs (rebalancing cost, ordering scope) that fall outside this lesson's ownership model.
Those idle consumers aren't harmless bystanders, either — they still participate in group membership, which means they still send heartbeats and still factor into rebalances when they join or leave. You're paying coordination overhead for zero consumption capacity.
Under-Provisioning: One Consumer, Many Partitions
The opposite imbalance is more forgiving. If you run order-processor with only 2 instances against those same 6 partitions, each consumer simply owns more than one: Consumer A might get partitions 0, 1, and 2, while Consumer B gets 3, 4, and 5.
A single consumer instance handling multiple partitions doesn't need multiple threads or any special code to do it — the poll loop handles it transparently. Each call to Consume() (or ConsumeAsync in some client wrappers) can return a message from any of the partitions that consumer owns, interleaved in whatever order they arrive. Building that poll loop in detail is the job of "Building a Minimal Consumer with Confluent.Kafka"; what matters here is the ownership shape it operates under.
// A single consumer instance can own several partitions at once.
// The poll loop doesn't change — Consume() transparently returns
// messages from whichever owned partition has data ready.
while (!cancellationToken.IsCancellationRequested)
{
var result = consumer.Consume(cancellationToken);
// result.Partition tells you which of this consumer's
// owned partitions the message came from.
Console.WriteLine(
$"Partition {result.Partition.Value}, " +
$"Offset {result.Offset.Value}: {result.Message.Value}");
}
The tradeoff is throughput per partition, not correctness: Consumer A is now serially working through three partitions' worth of traffic instead of one. If message handling is CPU- or I/O-bound and slow, a consumer owning three partitions processes each one more slowly in wall-clock terms than it would if it owned just one, simply because it's dividing its attention. This is a real capacity constraint, but it's a controlled degradation — nothing is lost or duplicated, throughput just narrows.
Groups Are Independent: Full-Topic Fan-Out
So far every example has stayed inside one group, where partitions are divided up so each message is handled once. That's easy to confuse with a traditional competing-consumers queue pattern, where any worker can dequeue any message and the pool competes for a shared backlog. Kafka consumer groups look similar at first glance but behave differently once you add a second group.
❌ Wrong thinking: "If I add a second consumer group reading the same topic, the two groups will split the partitions between them, so each group processes half the messages."
✅ Correct thinking: Each consumer group maintains its own independent set of partition assignments and its own committed offsets. A second group reading orders doesn't compete with the first group for partitions at all — it gets its own full copy of the ownership assignment across its own members, and reads every message in the topic from the beginning of its own offset position.
Concretely: if order-processor (3 instances) and fraud-detector (2 instances) both subscribe to the 6-partition orders topic, you get two completely separate ownership maps:
Group: order-processor (3 instances) — reads all 6 partitions
Consumer A ← partitions 0, 1
Consumer B ← partitions 2, 3
Consumer C ← partitions 4, 5
Group: fraud-detector (2 instances) — reads all 6 partitions, independently
Consumer X ← partitions 0, 1, 2
Consumer Y ← partitions 3, 4, 5
Every message published to orders is delivered once to order-processor's collective assignment and once, separately, to fraud-detector's. Neither group knows the other exists. This is what makes Kafka useful for fan-out patterns like feeding both an order-fulfillment pipeline and a fraud-analysis pipeline from the same event stream, without duplicating the topic or the producer.
💡 Mental Model: Think of a topic as a broadcast, and each consumer group as an independent tuned-in receiver. Within a receiver (a group), the six partitions of programming are split among however many radios (consumers) that receiver owns — no two radios in the same receiver play the same channel. But a second receiver across the room hears the entire broadcast too, on its own set of radios, completely unaware of the first.
⚠️ Common Mistake: Mistaking group isolation for load-sharing across groups. It's tempting to assume that adding a second group to a busy topic will relieve load on the first — it won't, because the two groups don't share offsets, assignments, or a partition pool at all. This is different from a mistyped GroupId accidentally splitting one intended group into several unintentional full-topic readers, which is a specific implementation pitfall covered in "Common Pitfalls When Consuming with Confluent.Kafka."
The Capacity-Planning Trade-off
Put the over- and under-provisioning consequences together and you get the practical rule for capacity planning: partition count sets the hard ceiling on parallelism achievable within a single consumer group. No matter how many instances you deploy, a group can never have more actively working consumers than the topic has partitions — the arithmetic simply doesn't allow it.
This has a direct implication for how you design topics up front. If you anticipate that orders might eventually need 20 parallel consumer instances to keep up with peak load, but you provision the topic with only 4 partitions, you've capped your own future scaling regardless of how much compute you're willing to throw at it. Increasing partition count later is possible on most Kafka deployments, but it's not a live, transparent operation — it can shift the mapping of keys to partitions, which affects ordering guarantees for anything relying on key-based partition affinity, and it doesn't retroactively redistribute already-written data. That's a topic-design concern that sits upstream of consumer group behavior, but it's worth internalizing here because it's this ownership rule that makes the ceiling real.
| 🔧 Scenario | 🎯 Partition:Consumer Ratio | 📚 Outcome |
|---|---|---|
| 🔒 Balanced | 6 partitions, 6 consumers | Each consumer owns exactly 1 partition — maximum parallelism reached |
| 🔒 Under-provisioned | 6 partitions, 2 consumers | Each consumer owns 3 partitions — correctness preserved, throughput narrowed |
| 🔒 Over-provisioned | 6 partitions, 8 consumers | 2 consumers sit idle — wasted capacity, no throughput gain |
💡 Pro Tip: When sizing a consumer group, treat partition count as the true capacity number and consumer instance count as a variable you tune up to that ceiling — not past it. A reasonable starting heuristic is to provision enough partitions for the parallelism you expect at peak, then scale consumer instances up or down within that range as load changes, rather than repeatedly resizing the topic itself.
Offsets Are Checkpoints, Not Per-Message Acknowledgements
It's tempting to picture a Kafka consumer working like a message queue with acknowledgements: fetch a message, process it, "ack" it, move to the next one. That mental model is wrong in a way that has real consequences the first time your consumer crashes mid-batch. Kafka doesn't track the status of individual messages at all. It tracks a single number per partition, per consumer group, and understanding exactly what that number means — and when it moves — is the difference between predicting your system's failure behavior and being surprised by it.
Two Different Numbers Called "Offset"
The word "offset" gets used for two related but distinct things, and conflating them is where confusion starts.
The log offset is a message's permanent address inside a partition. Every record appended to a partition gets the next integer in sequence — 0, 1, 2, 3, and so on. This number is immutable and belongs to the message itself; it never changes no matter which consumer reads it or how many times.
The committed offset belongs not to a message but to a consumer group. It's a single integer, stored per partition, that answers one question: "where should this group resume reading if it starts fresh right now?" A committed offset of 47 means "this group has finished with everything up through offset 46; hand it offset 47 next."
Partition 0 log:
offset: 0 1 2 3 4 5 6 7
message: [A] [B] [C] [D] [E] [F] [G] [H]
^
committed offset = 5
(group resumes at message F)
The critical thing to notice: the committed offset is a bookmark, not a ledger of which individual messages succeeded or failed. There's no record anywhere that says "message D was processed correctly, message C threw an exception." There's only the bookmark, sitting at position 5, silently implying that everything before it — A, B, C, D — is considered done, whether or not that's actually true.
🎯 Key Principle: One Bookmark, Not Many Receipts
A traditional message queue often gives you per-message acknowledgement: you ack message 17 specifically, independent of whether message 16 was acked. Kafka's commit model doesn't work that way. Committing an offset advances one shared checkpoint, and that checkpoint only moves forward. You cannot commit "message 16 succeeded, message 17 failed" — you can only say "advance the bookmark to 18," which implicitly claims 16 and 17 both succeeded, or you can not advance it at all, which implicitly claims nothing past the last checkpoint succeeded.
This matters because a batch of messages processed by a single Consume() loop rarely fails atomically. Suppose your consumer pulls ten messages, successfully writes eight of them to a database, and then throws an exception on the ninth. There is no Kafka mechanism to say "commit the first eight, retry the ninth." Your code has to decide, at the granularity of whatever offset value it chooses to commit, where the bookmark goes — and Kafka will trust that decision completely. It doesn't inspect your business logic or verify your database writes; it just stores the number you gave it.
💡 Mental Model: Think of the committed offset like a bookmark in a paperback novel, not like a set of highlighted sentences. You can't mark "I definitely read page 40 but skipped page 41." You place the bookmark after the last page you're confident you finished, and next time you open the book, you start from there — rereading anything you'd actually gotten through past that point if your memory of "where you stopped" was imprecise.
Why Commit Timing Is the Whole Game
Because the commit is just a number and Kafka enforces no relationship between that number and actual processing outcomes, when your code chooses to advance it determines your delivery guarantees.
If you commit the offset before processing the message, you're telling Kafka "I'm done with this" before you actually know that. If the consumer then crashes while doing the real work — writing to a database, calling an API, updating a cache — that message is never processed, but on restart the consumer resumes from the committed offset, which already skipped past it. The message is silently lost from this consumer group's perspective, even though it's still sitting untouched in the log. This produces at-most-once delivery: each message is delivered zero or one times, never more, but occasionally zero.
If you commit the offset after processing completes, you avoid that loss, but you introduce the opposite risk. If the consumer crashes after finishing the work but before the commit call completes — or before the interval-based auto-commit fires — the next consumer to take that partition will resume from the older, still-uncommitted offset and reprocess messages that were already handled. This produces at-least-once delivery: every message is delivered one or more times, never zero, but occasionally more than once.
Commit BEFORE processing:
commit(offset) → process(message) → [crash here] → message lost
Commit AFTER processing:
process(message) → [crash here] → commit(offset) never runs → message reprocessed on restart
There is no timing choice that eliminates both risks simultaneously using offsets alone — that would require the processing step and the commit to be a single atomic operation spanning two independent systems (your business logic's side effects and Kafka's offset store), which is why genuinely exactly-once semantics require additional machinery layered on top rather than clever commit placement. The commit APIs, auto-commit interval configuration, and where committed offsets are physically stored are covered in the dedicated Offset Management lesson — what matters here is the shape of the trade-off, not the syntax for controlling it.
🤔 Did you know? Kafka Defaults to At-Least-Once
Out of the box, most consumer configurations commit after processing (or on an interval that lands after messages have already been handed to your code), which biases the default toward at-least-once rather than at-most-once. This is a deliberate design choice: losing data silently is usually worse than occasionally reprocessing it, so Kafka's ergonomics nudge you toward duplication risk over loss risk unless you explicitly configure otherwise.
Worked Scenario: Crash Mid-Batch
Walk through a concrete case to see the mechanics land. A consumer in group order-processing is assigned partition 0, currently sitting at committed offset 100. It calls Consume() in a loop and pulls a batch of five messages: offsets 100 through 104. Its processing pattern is "handle the message, then commit" — the at-least-once pattern described above, applied one message at a time for simplicity.
Starting state: committed offset = 100
Step 1: process offset 100 → success → commit(101) [committed = 101]
Step 2: process offset 101 → success → commit(102) [committed = 102]
Step 3: process offset 102 → success → CRASH before commit(103) runs
The consumer process dies right after finishing the work for offset 102 — say it wrote a row to a database — but before the commit call for offset 103 reaches the broker. The committed offset for this partition is frozen at 102, even though offset 102's message was fully handled.
When the consumer (or a replacement instance in the same group) restarts and rejoins, it asks the broker for the last committed offset on partition 0, gets back 102, and resumes fetching from offset 102 onward. It will process message 102 again — the very message it finished just before crashing — followed by 103 and 104, which it never got to the first time.
Restart: fetch resumes at committed offset = 102
Step 1 (again): process offset 102 → reprocessed (duplicate)
Step 2: process offset 103 → first time
Step 3: process offset 104 → first time
Notice what did not happen: Kafka did not lose any messages, and it did not need to know that offset 102 had "already succeeded." It just replayed everything from its last known bookmark forward. The duplicate processing of message 102 is the direct, mechanical consequence of the commit happening after processing but the crash landing in the gap between finishing the work and recording that fact. If this consumer's downstream effect — say, charging a payment or sending an email — isn't safe to run twice, this replay is where real damage happens, which is why idempotent processing logic is a standard companion to at-least-once consumption rather than an optional nicety.
⚠️ Common Mistake: Assuming "the message was consumed" means "the message was committed." A message can be fetched, handed to your code, fully processed, and even used to produce side effects — and still count as unconsumed from Kafka's perspective if the offset commit never happened. Consume() returning a message and the group's checkpoint advancing are two separate events, and the gap between them is exactly where crashes cause reprocessing.
Seeing the Two Offsets in Code
The distinction between a message's own offset and the group's committed offset is visible directly in the Confluent.Kafka API surface. Here's a minimal illustration — not the full poll loop, which belongs to "Building a Minimal Consumer with Confluent.Kafka" — focused just on where each number comes from:
// consumeResult.Offset is the LOG offset: this message's fixed position
// in the partition. It never changes and belongs to the message.
ConsumeResult<string, string> consumeResult = consumer.Consume(cancellationToken);
Console.WriteLine($"Log offset of this message: {consumeResult.Offset}");
// Process the message first...
ProcessOrder(consumeResult.Message.Value);
// ...then advance the GROUP's committed offset. This call updates the
// bookmark, not a per-message flag. Everything up to and including
// this message is now implicitly "done" as far as the group is concerned.
consumer.Commit(consumeResult);
And here's how you'd read back the group's current bookmark independent of any single message — useful for reasoning about where a restart will resume from:
// Ask the broker for this group's committed offset on a specific partition.
// This returns the checkpoint, not any information about individual
// messages that were or weren't successfully handled before it.
var topicPartition = new TopicPartition("orders", partition: 0);
var committedOffsets = consumer.Committed(
new[] { topicPartition },
timeout: TimeSpan.FromSeconds(5));
Console.WriteLine($"Group will resume at offset: {committedOffsets[0].Offset}");
Both snippets are simplified to isolate the offset concepts — real consumer code needs exception handling around Consume() and a clean shutdown path, which "Building a Minimal Consumer with Confluent.Kafka" covers in full.
❌ Wrong thinking: "I called Commit(), so Kafka now knows message X succeeded."
✅ Correct thinking: "I called Commit(), so the group's resume point moved forward. Kafka has no opinion on whether message X — or anything before it — actually succeeded."
Why This Distinction Outlasts Any Specific API
It's worth dwelling on why this bookmark model exists rather than per-message acknowledgement, because it shapes everything downstream in this lesson path. Tracking a single integer per partition is cheap: the broker doesn't need a growing table of acknowledgement flags for every message ever produced, and a consumer rejoining after a rebalance only needs to ask "where do I resume?" rather than reconcile a detailed status history. That efficiency is precisely why the trade-off exists — you gain a lightweight, scalable checkpoint mechanism, and you give up the ability to selectively acknowledge within a batch. Kafka's ordering-by-partition guarantee reinforces this: because messages within a partition are strictly ordered and a single consumer processes them in that order, a single "resume from here" number is sufficient to describe progress. That same property is what makes the crash-and-resume scenario above deterministic rather than ambiguous — the replacement consumer knows exactly which messages come after the committed offset, even though it can't know which of them were already handled by the crashed instance.
Building a Minimal Consumer with Confluent.Kafka
Everything discussed so far — partition ownership, offset semantics — is abstract until you write the code that actually joins a group and pulls messages off a partition. This section builds that code piece by piece, using the official Confluent.Kafka client for .NET. The goal isn't a production-ready service; it's a correct, minimal skeleton that you'll extend in later lessons on rebalancing, offset commits, and error handling.
Configuring the Consumer
Every consumer starts with a ConsumerConfig object, which is a strongly typed wrapper around the dozens of settings the underlying Kafka client library accepts. Three of them matter before you write a single line of consuming logic.
BootstrapServers is the initial contact point — one or more broker addresses the client uses to discover the rest of the cluster. It doesn't need to list every broker; the client learns the full cluster metadata (topics, partitions, leader assignments) after the first successful connection. Think of it as a phone number to dial, not the whole address book.
GroupId is the string that determines which consumer group this instance joins. This single field is what turns an isolated client into a member of a coordinated group — every consumer sharing the same GroupId against the same topic will have partitions divided among them following the one-partition-one-consumer rule from earlier. Get this value wrong — a typo, a copy-paste error, an environment-specific suffix left in by accident — and you silently create a second group that reads the entire topic independently rather than sharing it; that failure mode gets its own treatment in "Common Pitfalls When Consuming with Confluent.Kafka."
AutoOffsetReset controls what happens the very first time this group ID reads a given partition — when there is no committed offset yet for the group to resume from. Setting it to Earliest tells the consumer to start from the beginning of the partition's retained log; Latest tells it to start from whatever is produced from this moment forward, ignoring history. This setting only applies when no valid committed offset exists; it does not override an existing checkpoint. A new consumer group pointed at a topic that's been running for a while, with AutoOffsetReset left at its default of Latest, will silently skip every message already sitting in the log — a common surprise for anyone expecting to "replay from the start."
using Confluent.Kafka;
var config = new ConsumerConfig
{
BootstrapServers = "localhost:9092",
GroupId = "order-processing-group",
AutoOffsetReset = AutoOffsetReset.Earliest,
EnableAutoCommit = true // default behavior; commit mechanics covered in Offset Management
};
💡 Pro Tip: Treat GroupId as a deployment-time configuration value, not a hardcoded literal. Two services with slightly different GroupId strings pointed at the same topic will each receive their own full copy of the data — which is sometimes exactly what you want (two independent groups each reading everything) and sometimes a bug (one logical service accidentally split into two groups).
Constructing the Client and Subscribing
With configuration in hand, the client itself is built through ConsumerBuilder<TKey, TValue>, a generic builder that also lets you specify how message keys and values should be deserialized. For plain string payloads, the built-in StringDeserializer (referenced via Deserializers.Utf8) is often enough; typed or serialized payloads (Avro, JSON, Protobuf) plug into the same builder but are outside this section's scope.
using var consumer = new ConsumerBuilder<Ignore, string>(config)
.Build();
consumer.Subscribe("order-events");
Here the key type is Ignore, meaning the code doesn't care about message keys — only the value matters for this example. Calling Subscribe does not connect to a partition immediately; it registers the consumer's interest in the topic and hands control of partition assignment to the group coordination protocol running on the broker side. That protocol — how the group decides who gets which partition, and what happens when membership changes — is the subject of the dedicated Consumer Group Mechanics lesson; for this section, it's enough to know that Subscribe is a declaration of intent, and the actual assignment arrives asynchronously once you start calling Consume.
⚠️ Common Mistake: Calling Subscribe with a list of topics you don't actually intend to process together, or calling it repeatedly. Subscribe replaces the current subscription set entirely rather than adding to it — if you need multiple topics, pass them all in one call: consumer.Subscribe(new[] { "order-events", "order-events-retry" }).
The Poll Loop
Kafka consumers are pull-based: nothing is pushed to your process, so you must actively ask for the next message by calling Consume. This is the poll loop — a while loop that repeatedly calls consumer.Consume(cancellationToken), blocking until a message is available (or the loop is cancelled), and hands back a ConsumeResult<TKey, TValue> containing the message, its topic-partition, and its offset.
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (_, e) =>
{
e.Cancel = true; // prevent immediate process kill
cts.Cancel();
};
try
{
while (!cts.IsCancellationRequested)
{
try
{
var result = consumer.Consume(cts.Token);
Console.WriteLine(
$"Partition {result.Partition.Value}, " +
$"Offset {result.Offset.Value}: {result.Message.Value}");
// Business logic goes here. Keep it fast — a slow handler
// delays the next poll and can trigger a rebalance,
// covered in "Common Pitfalls When Consuming with Confluent.Kafka."
}
catch (ConsumeException ex)
{
Console.WriteLine($"Consume error: {ex.Error.Reason}");
// Log and continue — one malformed message shouldn't kill the loop.
}
}
}
finally
{
consumer.Close();
}
Walk through what this loop is actually doing. Each call to Consume returns one ConsumeResult at a time — there is no batch API surfaced here, even though internally the client fetches messages from the broker in batches for efficiency. The Offset.Value printed alongside each message is the log offset — its fixed position in the partition — and is distinct from any committed offset the group has recorded, a distinction the earlier section on offsets as checkpoints covers in depth. With EnableAutoCommit left at its default true, the client periodically advances the committed offset on your behalf in the background; the timing and risk trade-offs of that behavior belong to the Offset Management lesson, not here.
Poll loop lifecycle, one iteration:
consumer.Consume(token) blocks
↓
broker returns next message (or token cancelled)
↓
ConsumeResult returned: { Partition, Offset, Message }
↓
handler code runs on the message
↓
loop repeats
Handling ConsumeException Without Crashing
Notice that Consume itself is wrapped in an inner try/catch targeting ConsumeException, separate from the outer try/finally that guards shutdown. This split matters. ConsumeException is thrown by the client when it receives a message it cannot properly deserialize or when a lower-level protocol error occurs on that particular fetch — for example, a message whose value doesn't match the configured deserializer. If that exception escapes the loop entirely, the process terminates and every partition this consumer owned needs to be picked up by another group member (or sits unprocessed until this instance restarts) — a disruption for what might be a single corrupt record. Catching ConsumeException at the point of the Consume call keeps the loop alive so the next message on the next iteration is still attempted.
⚠️ Common Mistake: This inner catch block is deliberately narrow and visible — it logs the error and moves on to the next message. It is easy to widen this into a blanket catch (Exception) that also swallows business-logic failures inside your handler, silently discarding messages that legitimately needed attention (a dead-letter queue, an alert, a retry). That broader failure mode — catching too much and hiding poison messages — is covered as its own pitfall in "Common Pitfalls When Consuming with Confluent.Kafka"; the pattern shown here is intentionally scoped to just the exception the client itself raises during deserialization or protocol handling.
Shutting Down Cleanly
The last piece of the skeleton is graceful shutdown, and it matters more than it looks. A CancellationTokenSource is wired to the process's cancel signal (Console.CancelKeyPress in a console app, or a hosted service's stopping token in a longer-lived process), and cancelling that token causes the blocking Consume call to throw an OperationCanceledException, which exits the while loop naturally. The critical step happens in the finally block: calling consumer.Close().
Close() does two things that matter for the group as a whole. It commits any pending offsets one last time (subject to the auto-commit configuration), and — more importantly for the topics owned by this lesson — it sends an explicit "leaving group" notification to the broker-side group coordinator. That notification lets the coordinator immediately reassign this consumer's partitions to remaining group members. Skip Close() — say, by letting the process exit without a finally block, or by killing it forcefully — and the coordinator has no way to know the member is gone until its session timeout elapses, leaving those partitions unread for that entire window. That failure mode, and how to guard against it more robustly (including Dispose() semantics), is picked up again in "Common Pitfalls When Consuming with Confluent.Kafka"; for now, the pattern to internalize is simple: pair a cancellation token with a finally block that calls Close(), every time.
🎯 Key Principle: A poll loop is only "minimal" in terms of business logic — its structural requirements (bounded config, a cancellable loop, narrow exception handling around Consume, and a guaranteed Close()) are not optional extras layered on top later. Every consumer you build from here forward, no matter how much retry logic, deserialization, or batching gets added, sits on top of exactly this skeleton.
Putting the Pieces Together
Assembled end to end, the full flow looks like this: build a ConsumerConfig with the three startup essentials, construct the client via ConsumerBuilder, subscribe to the topic, run the poll loop with per-message exception handling, and guarantee Close() runs on the way out.
using Confluent.Kafka;
var config = new ConsumerConfig
{
BootstrapServers = "localhost:9092",
GroupId = "order-processing-group",
AutoOffsetReset = AutoOffsetReset.Earliest
};
using var consumer = new ConsumerBuilder<Ignore, string>(config).Build();
consumer.Subscribe("order-events");
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };
try
{
while (!cts.IsCancellationRequested)
{
try
{
var result = consumer.Consume(cts.Token);
Console.WriteLine($"Offset {result.Offset.Value}: {result.Message.Value}");
}
catch (ConsumeException ex)
{
Console.WriteLine($"Consume error: {ex.Error.Reason}");
}
}
}
finally
{
consumer.Close();
}
This is deliberately incomplete as a production artifact — it has no dead-letter routing, no explicit commit strategy beyond the auto-commit default, and no protection against a slow handler starving the loop. (Those gaps are intentional simplifications for this section; each is addressed by name in the pitfalls and offset-management lessons that follow.) What it does give you is a correct scaffold: every consumer instance built this way will register with the group named by GroupId, receive a subset of the topic's partitions once the group's assignment protocol runs, and leave that group cleanly when shut down — which is exactly the behavior the ownership and offset rules from earlier in this lesson assume is in place.
Common Pitfalls When Consuming with Confluent.Kafka
The minimal consumer you just built works fine in a demo. It reads its while loop, prints messages, and shuts down cleanly when you hit Ctrl+C. Production traffic exposes a different set of failure modes than the ones covered by a try/catch around Consume(). Most of the damage in real Confluent.Kafka deployments comes from a handful of implementation habits that look harmless in isolation but corrupt consumer state, stall processing, or quietly drop data. This section walks through five of the most common ones and how to recognize them before they reach production.
Pitfall 1: Sharing One IConsumer Across Threads
It's tempting to treat IConsumer<TKey,TValue> like any other injectable service — register it once, hand it to multiple worker threads, and let them all call Consume() to parallelize processing. This breaks immediately, because the Confluent.Kafka client is not thread-safe. The underlying librdkafka handle maintains internal state — its assignment table, its poll cursor, its heartbeat scheduling — that assumes a single thread is driving it.
// ❌ Wrong thinking: "more threads calling Consume() means faster throughput"
var consumer = new ConsumerBuilder<string, string>(config).Build();
consumer.Subscribe("orders");
// Spinning up several threads against the SAME consumer instance
for (int i = 0; i < 4; i++)
{
Task.Run(() =>
{
while (!cts.IsCancellationRequested)
{
var result = consumer.Consume(cts.Token); // corrupts internal state
Process(result);
}
});
}
Running this doesn't throw a clean, obvious exception every time — that's what makes it dangerous. You'll see intermittent KafkaException failures, partitions that appear to stop delivering messages, or assignment state that no longer matches what the broker thinks this consumer owns. Because the corruption is timing-dependent, it often passes local testing and only surfaces under real load.
✅ Correct thinking: one IConsumer instance belongs to exactly one thread's poll loop. If you want to parallelize work across CPU cores, do it in one of two ways: run multiple consumer instances (each with its own IConsumer, all sharing the same GroupId so the group's rebalance protocol splits the partitions between them), or keep a single-threaded poll loop that hands off each ConsumeResult to a bounded worker pool for processing while the loop itself keeps polling.
// ✅ One consumer, one poll-loop thread — parallelism happens downstream
var channel = Channel.CreateBounded<ConsumeResult<string, string>>(capacity: 100);
// Single thread owns the consumer and only calls Consume() here
_ = Task.Run(async () =>
{
using var consumer = new ConsumerBuilder<string, string>(config).Build();
consumer.Subscribe("orders");
while (!cts.IsCancellationRequested)
{
var result = consumer.Consume(cts.Token);
await channel.Writer.WriteAsync(result, cts.Token);
}
});
// Separate worker tasks pull from the channel and do the actual processing
for (int i = 0; i < 4; i++)
{
_ = Task.Run(async () =>
{
await foreach (var result in channel.Reader.ReadAllAsync(cts.Token))
{
Process(result); // safe: consumer object itself is untouched
}
});
}
The rule to internalize: the IConsumer object itself is single-threaded property, but the messages it hands you are plain data and safe to fan out however you like once they're off the poll loop.
Pitfall 2: Long-Running Handlers Blocking the Poll Loop
Even with a correctly single-threaded consumer, it's easy to write a poll loop where each iteration does too much work before looping back to call Consume() again:
while (!cts.IsCancellationRequested)
{
var result = consumer.Consume(cts.Token);
await CallDownstreamApiAsync(result.Message.Value); // could take seconds
await SaveToDatabase(result.Message.Value);
}
The group coordination protocol expects the consumer to call back into the client at a reasonable cadence — that's how the broker knows the member is still alive. A handler that blocks on a slow downstream API or a large database write delays that next call, and the broker can conclude the consumer has died. The precise heartbeat and session-timeout mechanics that make this happen — and the tuning knobs like max.poll.interval.ms — are covered in depth in the Rebalancing & Backpressure lesson, but the practical takeaway here is: keep your poll loop's per-iteration work short, and if a message requires slow processing, hand it off to a background worker (as in the channel pattern above) rather than doing the slow work inline between Consume() calls.
⚠️ Common Mistake: teams often discover this only after processing volume grows — a handler that took milliseconds against test data takes seconds against a real downstream service, and what used to be a smoothly running consumer group starts rebalancing under load, right when it can least afford the disruption.
Pitfall 3: Skipping Close()/Dispose() on Shutdown
When a consumer process exits, there are two very different ways the group finds out. If the consumer calls consumer.Close() before exiting, it sends an explicit "I'm leaving" notification to the group coordinator, and the broker triggers a rebalance immediately, reassigning that consumer's partitions to the remaining members with minimal delay. If the process instead exits abruptly — crashes, gets killed, or simply never calls Close() — the broker has no way to know the consumer is gone until it stops seeing heartbeats, at which point it waits out the session timeout before declaring the member dead and rebalancing its partitions.
// ❌ No Close() — broker only notices via session timeout
var consumer = new ConsumerBuilder<string, string>(config).Build();
consumer.Subscribe("orders");
try
{
while (!cts.IsCancellationRequested)
{
var result = consumer.Consume(cts.Token);
Process(result);
}
}
catch (OperationCanceledException)
{
// process just exits here — partitions sit unclaimed until timeout
}
// ✅ Explicit Close() in a finally block
var consumer = new ConsumerBuilder<string, string>(config).Build();
consumer.Subscribe("orders");
try
{
while (!cts.IsCancellationRequested)
{
var result = consumer.Consume(cts.Token);
Process(result);
}
}
catch (OperationCanceledException)
{
// expected on graceful shutdown
}
finally
{
consumer.Close(); // leaves the group cleanly, triggers immediate rebalance
}
The practical cost of skipping Close() is a gap during which the partitions that consumer owned are read by no one — messages queue up unprocessed until the session timeout expires and the group reassigns them. In a rolling deployment where instances are restarted one at a time, this pattern repeated across every restart adds up to real, avoidable processing latency. Dispose() on the consumer will also release the underlying client handle, but it does not send the same explicit leave-group notification that Close() does, so for a controlled shutdown you want Close() called explicitly before disposal.
Pitfall 4: Mistyped or Copy-Pasted GroupId Values
The GroupId string is the only thing that tells the broker which consumers should share a topic's partitions. Because it's just a string in configuration, it's exactly the kind of value that gets copy-pasted between services, typo'd in an environment variable, or accidentally left as a placeholder from a tutorial.
// Service A
var configA = new ConsumerConfig
{
BootstrapServers = "broker:9092",
GroupId = "order-processor",
AutoOffsetReset = AutoOffsetReset.Earliest
};
// Service B — meant to join the same group, but has a trailing typo
var configB = new ConsumerConfig
{
BootstrapServers = "broker:9092",
GroupId = "order-processor ", // trailing space — a DIFFERENT group to the broker
AutoOffsetReset = AutoOffsetReset.Earliest
};
The broker treats "order-processor" and "order-processor " as two entirely distinct groups. Instead of splitting the topic's partitions between the two instances as intended, each one forms its own single-member group and reads every partition of the topic independently. There's no exception thrown anywhere — both consumers appear to be working, throughput even looks fine on each instance individually — but downstream you get every message processed twice, and if the workload was supposed to be split for capacity reasons, neither instance is actually relieved of any load.
💡 Pro Tip: because this failure produces no errors, the most reliable way to catch it is to check group membership directly rather than trust the config file — the Consumer Group Mechanics lesson covers the tooling for inspecting which consumers the broker actually considers part of a given group, which is the fastest way to confirm your instances are sharing partitions as intended.
Pitfall 5: Swallowing ConsumeException Broadly
The minimal consumer pattern wraps Consume() in a try/catch (ConsumeException) so a single malformed message doesn't crash the whole process. That's a reasonable baseline, but it's easy to take the defensive instinct too far and catch the exception only to log it and move on, with no record of which message caused it and no path for it to be dealt with later.
// ❌ Wrong thinking: "as long as we don't crash, we're fine"
while (!cts.IsCancellationRequested)
{
try
{
var result = consumer.Consume(cts.Token);
Process(result);
}
catch (ConsumeException e)
{
Console.WriteLine($"Error: {e.Error.Reason}"); // and then... nothing
}
}
The problem here isn't that the process survives — it's that the message causing the exception, a poison message (one that's malformed, corrupted, or otherwise fails to deserialize), simply vanishes from view. Depending on where in the flow the exception occurs, the consumer's offset may still advance past that message, meaning the group will never see it again; there is no automatic retry and no automatic record of what went wrong. Anyone debugging a missing order later has no trail to follow.
// ✅ Correct thinking: capture context and route to a dead-letter path
while (!cts.IsCancellationRequested)
{
try
{
var result = consumer.Consume(cts.Token);
Process(result);
}
catch (ConsumeException e)
{
// Preserve enough detail to diagnose and reprocess later
await deadLetterSink.SendAsync(new
{
e.Error.Reason,
Partition = e.ConsumerRecord?.Partition.Value,
Offset = e.ConsumerRecord?.Offset.Value,
Timestamp = DateTimeOffset.UtcNow
}, cts.Token);
}
}
(This is a simplified illustration — a production dead-letter path typically needs its own topic or durable store plus a way to correlate the failed record back to its original key and headers, but the core habit — capture and route, rather than log and discard — is what matters here.)
⚠️ Common Mistake: catching ConsumeException broadly and treating every occurrence the same way. A deserialization failure on one malformed record is a very different situation from a broker connectivity error that will resolve itself — collapsing both into a single "log and continue" branch means transient infrastructure issues get silently absorbed right alongside permanently bad data, and neither gets the response it actually needs.
Why These Pitfalls Compound
None of these five mistakes are exotic — each is a small, easy-to-make decision in otherwise ordinary consumer code. What makes them worth studying together is that they tend to fail silently rather than loudly: a shared IConsumer doesn't throw "you violated thread safety"; a mistyped GroupId doesn't refuse to start; a swallowed ConsumeException doesn't page anyone. Each one erodes a guarantee the group protocol is supposed to provide — exclusive partition ownership, timely rebalancing, or a durable record of every message — without announcing that it's doing so. Treat the diagnosis for all of them the same way: verify actual group membership and partition assignment against what you intend, rather than trusting that the configuration you wrote matches the behavior you're getting.
Symptom you observe Likely pitfall to check first
─────────────────────────────────────────────────────────
Duplicate processing GroupId mismatch (Pitfall 4)
Unexplained rebalances Slow handler in poll loop (Pitfall 2)
Corrupted/erratic state Shared IConsumer across threads (Pitfall 1)
Delayed reassignment on Missing Close()/Dispose() (Pitfall 3)
deploy or restart
Messages disappearing Broad catch on ConsumeException (Pitfall 5)
with no trace
Keeping this mapping in mind turns a vague "the consumer is acting weird" investigation into a targeted check of one specific piece of code, which is usually where the actual fix ends up living.
Key Takeaways: Consumer Groups and Offsets
You've now walked through why consumer groups exist, the ownership rule that governs how they divide work, the difference between a message offset and a committed offset, and a working Confluent.Kafka consumer loop. Before moving into the mechanics that explain how Kafka enforces all of this behind the scenes, it's worth consolidating the mental model into something you can check your own code against. That's what this section does — it's a review pass, not new material, so treat it as a checklist rather than a fresh lesson.
Recap: Ownership Caps Parallelism
The rule that shapes everything else in this lesson is simple to state: within one consumer group, a partition is assigned to exactly one consumer instance at a time. Kafka never splits a single partition's messages across two members of the same group, and it never lets two members read the same partition concurrently. That's what makes consumer groups different from a generic fan-out queue — the group divides partitions, not individual messages.
The practical consequence is a hard ceiling: if a topic has 6 partitions, a single consumer group can usefully run at most 6 active consumer instances. A 7th instance joins the group but sits idle, holding no partitions, until one of the other six leaves. Conversely, a group running with only 2 consumers against those same 6 partitions simply means each consumer owns 3 partitions and interleaves work across them in its poll loop. Neither situation is an error — it's a direct, mechanical consequence of the ownership rule, and it's why partition count is the number you plan around before you plan instance count.
🎯 Key Principle: Consumer group scaling is bounded by partition count, not by however many instances you're willing to deploy. Adding consumers past that ceiling adds no throughput — it only adds idle processes waiting for a rebalance.
Recap: Offsets Checkpoint Progress, They Don't Acknowledge Messages
The second load-bearing idea in this lesson is the distinction between a message's log offset — its fixed position in the partition — and the group's committed offset — the group's saved bookmark for where to resume. A commit doesn't mark individual messages as "done"; it moves one number forward, and Kafka treats everything before that number as handled, regardless of whether every message in between was actually processed successfully.
This is why commit timing, not the existence of a commit, is what determines your delivery semantics:
- Commit before processing → if the process crashes mid-handler, the crash point's messages are skipped on restart because the offset already moved past them. This risks message loss.
- Commit after processing → if the process crashes after processing but before the commit lands, the same messages get replayed on restart. This risks duplicate processing, which is the foundation of at-least-once delivery.
Neither pattern gives you exactly-once delivery for free — that requires idempotent handling or transactional writes, which sit outside what committed offsets alone can guarantee. The concrete mechanics of how to control that timing — manual commits, auto-commit intervals, EnableAutoCommit, and where offsets are actually stored — belong to the dedicated Offset Management lesson; what matters here is that you recognize commit timing as a design decision, not an implementation detail you can ignore.
Recap: The Minimal .NET Consumer Shape
Stripped to its essentials, a working Confluent.Kafka consumer needs exactly four pieces: a GroupId in the configuration, a Subscribe() call naming the topic, a Consume() loop that runs until told to stop, and a Close() on shutdown so the broker learns about departure immediately instead of waiting out a session timeout. Here's that shape again, condensed to the pieces that matter for this recap:
var config = new ConsumerConfig
{
BootstrapServers = "localhost:9092",
GroupId = "order-processing-group", // ties this instance to a specific group
AutoOffsetReset = AutoOffsetReset.Earliest
};
using var consumer = new ConsumerBuilder<string, string>(config).Build();
consumer.Subscribe("orders");
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (_, e) =>
{
e.Cancel = true; // let the finally block run Close() instead of killing the process
cts.Cancel();
};
try
{
while (!cts.IsCancellationRequested)
{
var result = consumer.Consume(cts.Token);
// handler logic goes here
}
}
finally
{
consumer.Close(); // leaves the group cleanly, triggers an immediate rebalance rather than a timeout
}
Every one of these four pieces maps directly back to a concept earlier in the lesson: GroupId determines which ownership set this instance joins, Subscribe opts the instance into partition assignment, the Consume loop is where offsets advance as messages are read, and Close() is what lets the group's ownership model react immediately rather than waiting on a failure detection window. Miss any one of the four and you get a consumer that runs but misbehaves in a way that's easy to misdiagnose as a Kafka problem rather than a code problem.
💡 Mental Model: Think of the four pieces as answering four separate questions — which group am I in (GroupId), what am I reading (Subscribe), how do I get messages (the Consume loop), and how do I leave politely (Close). A consumer missing any one of these answers is incomplete, not just suboptimal.
Review Checklist
Before treating a consumer as production-ready, it's worth walking through four checks that map directly to failure modes covered in Common Pitfalls When Consuming with Confluent.Kafka. This isn't a re-explanation of those pitfalls — it's the condensed checklist version, meant for a quick pass over your own code:
- Thread-safety — is a single
IConsumer<TKey,TValue>instance ever touched from more than one thread? The client is not thread-safe internally, so sharing one instance across threads corrupts its state rather than throwing an obvious exception. - Handler duration — does the code inside the
Consumeloop do slow, blocking work (synchronous I/O, unbounded retries, large synchronous computations) before looping back to callConsume()again? A slow handler delays the next poll, which delays heartbeats. - Shutdown handling — does every exit path, including exceptions, still reach a
Close()/Dispose()call? A process that's killed or that throws past thefinallyblock leaves the broker waiting out the session timeout before it reassigns those partitions. - GroupId correctness — is the
GroupIdvalue identical, byte-for-byte, across every instance that's meant to share the topic's partitions? A typo or a copy-paste slip silently creates a second group instead of failing loudly.
| 🔧 Check | 🎯 What it verifies | ⚠️ Symptom if wrong |
|---|---|---|
| 🔒 Thread-safety | Single instance, single thread | Corrupted internal state, unpredictable errors |
| ⏱️ Handler duration | Fast return to the poll loop | Delayed heartbeats, unwanted rebalances |
| 🚪 Shutdown handling | Close()/Dispose() on every exit path | Broker waits out session timeout to notice departure |
| 🏷️ GroupId correctness | Identical value across instances | Each instance reads the full topic instead of sharing it |
⚠️ Common Mistake: Treating this checklist as a one-time setup review rather than something to re-check after refactors. A handler that was fast at launch can quietly grow slow after a new dependency call is added inside the loop, reintroducing the exact heartbeat-delay problem the checklist was meant to catch.
What You Now Understand That You Didn't Before
At the start of this lesson, a single consumer reading a topic serially was the obvious approach and its throughput ceiling wasn't obviously a design problem. You now have the vocabulary and the mental model to reason about why that ceiling exists, how a consumer group removes it up to the limit set by partition count, and why an offset commit is a bookmark rather than a receipt for a specific message. Those two ideas — ownership is partition-scoped, and commits are checkpoints — are the load-bearing concepts that every later lesson in this path assumes you already have.
Where This Goes Next
Three follow-up lessons build directly on the model you've just consolidated, each drilling into a piece this lesson intentionally left at the surface level:
- Consumer Group Mechanics opens up the protocol behind partition assignment itself — how the group coordinator decides who owns what, and what a group's internal state actually looks like.
- Offset Management goes deep on the commit APIs, auto-commit intervals, and offset storage this section only gestured at, giving you the tools to deliberately choose between at-least-once and at-most-once behavior instead of getting whichever one your commit timing accidentally produces.
- Rebalancing & Backpressure explains what actually happens when a consumer is slow enough, or absent long enough, to trigger a reassignment — the scenario the handler-duration and shutdown-handling checklist items above were both quietly pointing toward.
💡 Pro Tip: As you move into those three lessons, keep re-anchoring new detail back to the two recap ideas above — partition-scoped ownership and offsets-as-checkpoints. Every mechanism you're about to learn (assignment strategies, commit APIs, rebalance triggers) is an elaboration of one of those two ideas, not a replacement for them.