
Topics, Partitions & Keys

Ordering is per partition, not per topic. Keys decide partition assignment, which decides ordering locality. Learn this until it's instinct.

Introduction: The Topic-Partition-Key Mental Model

If you've built messaging into a .NET system before, you already have a mental model for what a "queue" does: a producer drops a message in, a consumer picks it up, and once it's processed, it's gone. MSMQ works this way. So does RabbitMQ. So does Azure Service Bus. That model is so ingrained that most developers carry it, unexamined, into their first Kafka project, and then get confused when consumer offsets, retention windows, and partition counts don't behave the way a queue depth or a dead-letter path would. Kafka's central abstraction is called a "topic," a word that sounds queue-adjacent (Azure Service Bus uses it too), but its internal structure is genuinely different, and that difference shapes how you write producers and consumers from the very first line of code.

So before touching ProducerBuilder or ConsumerBuilder, it's worth asking three questions that will keep resurfacing throughout this lesson: What actually gets stored when you publish a message? What decides which part of the system a given message ends up on? And why should a .NET developer, used to thinking in terms of queues and topics as interchangeable ideas, care about the distinction? The answers hinge on three words — topic, partition, and key — and getting the relationship between them into instinct is the entire point of this lesson.

Kafka Topics Are Logs, Not Queues

A Kafka topic is not a container that holds messages until they're consumed. It's a named, append-only commit log — records are written to the end of the log and stay there, in the order they arrived, identified by a numeric position called an offset. Crucially, a topic isn't stored as one single log file. It's split into one or more partitions, each of which is its own independent, ordered, append-only log, and those partitions are spread across the brokers in your cluster. When people say "Kafka topic," they usually mean this whole distributed structure — a logical stream of records that is physically realized as a set of partitions living on different machines.

This matters immediately for one reason: retention. In MSMQ, RabbitMQ, or Azure Service Bus, a message is fundamentally transient: it exists to be delivered, and once a consumer acknowledges it (or once it expires), it's removed from the broker's storage. Kafka's model inverts this. A record written to a partition stays there, at its offset, for as long as the topic's retention policy allows, regardless of whether zero consumers or a hundred have read it. A consumer doesn't "take" a message off a partition; it reads the record and then advances an offset that Kafka tracks for its consumer group. Another consumer group can come along later and read the exact same records from the beginning. This is a different storage philosophy, not just a different API surface, and it's why this lesson describes Kafka's core primitive as a log rather than a queue.
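To make the log-versus-queue difference concrete without a broker, here is a minimal in-memory sketch. It assumes a single partition modeled as a List<string> and a dictionary of per-group offsets; both are hypothetical stand-ins, not Kafka internals. Polling copies records out and advances only that group's offset; nothing is ever removed from the log.

```csharp
using System;
using System.Collections.Generic;

// Toy in-memory model of ONE partition: an append-only list of records.
// Reading never removes anything; each consumer group only remembers the
// next offset it will read. (Illustration only, not how brokers store data.)
var partitionLog = new List<string>();

int Append(string record)
{
    partitionLog.Add(record);
    return partitionLog.Count - 1; // the offset assigned to the new record
}

// Per-group position: "next offset to read". Kafka tracks this per consumer group.
var groupOffsets = new Dictionary<string, int>();

List<string> Poll(string groupId, int maxRecords)
{
    int next = groupOffsets.TryGetValue(groupId, out var saved) ? saved : 0;
    var batch = new List<string>();
    while (batch.Count < maxRecords && next < partitionLog.Count)
    {
        batch.Add(partitionLog[next]); // copy out; the log itself is untouched
        next++;
    }
    groupOffsets[groupId] = next; // advancing the offset is the only side effect
    return batch;
}

Append("OrderCreated(1042)");
Append("OrderPaid(1042)");
Append("OrderShipped(1042)");

var billing = Poll("billing", maxRecords: 10);    // reads all three records
var analytics = Poll("analytics", maxRecords: 2); // reads the first two, independently

Console.WriteLine($"Log still holds {partitionLog.Count} records");
Console.WriteLine($"billing next offset: {groupOffsets["billing"]}, analytics next offset: {groupOffsets["analytics"]}");
```

Because billing and analytics keep separate offsets, analytics can poll again later and pick up OrderShipped(1042) even though billing has already moved past it; neither group's reads affect what the other sees.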

💡 Mental Model: Think of a queue as a checkout line — once you're served, you leave and the line forgets you were there. Think of a Kafka partition as a sequentially numbered ledger — every entry stays on the page at its line number, and "reading" just means someone glancing at line 47 without erasing it. Whether that ledger eventually gets trimmed, and by what rule, is a retention policy question — covered in its own dedicated lesson later in this roadmap — not a property of the log itself.

Why the Partition, Not the Topic, Is the Real Unit

Here's the detail that trips up most newcomers: when people talk about a Kafka topic's behavior — its ordering, its parallelism, its throughput — they're almost always actually talking about partition behavior. The topic is a convenient logical name you subscribe to and produce against, but nearly everything operationally significant happens at the partition level:

🧠 Storage: each partition is a physically separate log on disk (replicated across brokers), not a structure shared with the other partitions in the topic.
📚 Ordering: offsets are assigned per partition, so "record 10 comes after record 9" is only a guarantee within a single partition, never across the topic as a whole. (The precise guarantees this gives you, and where they break down, are the subject of the dedicated ordering lesson; for now, just internalize that ordering is a partition-scoped property.)
🔧 Parallelism: within a consumer group, each partition is assigned to at most one consumer at a time, so the number of partitions caps how many consumers in that group can do useful work simultaneously.
🎯 Routing: every record a producer sends is placed into exactly one partition, and the component that decides which one is the partitioner, driven primarily by the record's key.

That last point is the hinge the rest of this lesson turns on. If two records carry the same key, Kafka's default routing sends them to the same partition (as long as the topic's partition count doesn't change), which means they land in the same ordered log, which means their relative order is preserved. If they carry different keys (or no key at all), there's no such guarantee. So the question "will my messages be processed in order?" is really the question "did my messages land on the same partition?", and that is a direct function of key choice and the partitioning mechanism, which we'll dissect in "How the Partitioner Assigns Keys to Partitions."
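The same-key, same-partition, same-order chain can be simulated in a few lines. This sketch assumes a 3-partition topic and uses 32-bit FNV-1a as a stand-in hash (real clients use their own hash functions, so the partition numbers won't match a real cluster). It produces interleaved events for three orders and then reads one order's events back from its partition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

const int PartitionCount = 3;

// Stand-in stable hash: 32-bit FNV-1a over the UTF-8 key bytes.
// Assumption: real clients use a different hash, so actual partition numbers differ.
uint StableHash(string key)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(key))
    {
        hash ^= b;
        hash *= 16777619;
    }
    return hash;
}

int PartitionFor(string key) => (int)(StableHash(key) % PartitionCount);

// Each partition is an independent ordered list of (key, event) records.
var partitions = Enumerable.Range(0, PartitionCount)
    .Select(_ => new List<(string Key, string Event)>())
    .ToArray();

// Events for three orders, produced interleaved.
var produced = new[]
{
    ("ORD-1", "Created"), ("ORD-2", "Created"), ("ORD-1", "Paid"),
    ("ORD-3", "Created"), ("ORD-2", "Paid"), ("ORD-1", "Shipped"),
};

foreach (var (key, evt) in produced)
    partitions[PartitionFor(key)].Add((key, evt));

// Within its partition, ORD-1's events keep their produced order.
var ord1Sequence = partitions[PartitionFor("ORD-1")]
    .Where(r => r.Key == "ORD-1")
    .Select(r => r.Event)
    .ToList();

Console.WriteLine(string.Join(" -> ", ord1Sequence)); // Created -> Paid -> Shipped
```

What the sketch cannot promise is how ORD-1's events interleave with ORD-2's or ORD-3's when a consumer reads several partitions: that depends on which partitions the keys hashed to, which is exactly the "same partition" condition above.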

🎯 Key Principle: Kafka does not order a topic. It orders each partition independently. Any correctness argument that depends on message ordering has to be phrased in terms of "same partition," not "same topic."

A Running Example: Order Events

To keep this concrete rather than abstract, every code sample in this lesson builds against the same scenario: an e-commerce system publishing order events (OrderCreated, OrderPaid, OrderShipped) to a topic named order-events. Each event is produced with the order's ID as the record key. The example is chosen so you can see the partitioner do something observable: because every event for a given order shares the same key, those events consistently land on the same partition, and a consumer reading that partition sees them in the order they were produced. Whether OrderId is the best key in every scenario (versus, say, CustomerId) is a strategic question we're deliberately deferring to the "Key Selection Strategy" lesson. Here, the order-events example exists purely to make the routing mechanism tangible.

Here's the shape of the record we'll be producing throughout the lesson — nothing executable yet, just the plain C# type that our worked examples will serialize:

// The event payload our producer will serialize as the message value.
// The OrderId will separately be used as the record's Kafka key.
public record OrderEvent(
    string OrderId,
    string EventType,   // e.g. "Created", "Paid", "Shipped"
    DateTimeOffset Timestamp,
    decimal Amount
);

And here's a preview of the shape of a produce call using Confluent.Kafka's IProducer<TKey, TValue> — full setup, configuration, and delivery-result handling are covered in "Producing and Consuming with Confluent.Kafka: A Worked Example," but seeing the key/value split here early makes the topic/partition/key relationship concrete rather than theoretical:

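using System.Text.Json;
using Confluent.Kafka;

// Illustrative shape only: producer construction and config are covered later.
// Assumes producer is an IProducer<string, string> and orderEvent is an OrderEvent.
// Note that Key and Value are separate fields: the key drives partition routing,
// the value is the payload a consumer will deserialize.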
var message = new Message<string, string>
{
    Key = orderEvent.OrderId,                       // routing input
    Value = JsonSerializer.Serialize(orderEvent)     // payload
};

var deliveryResult = await producer.ProduceAsync("order-events", message);
// deliveryResult.Partition tells you which partition the partitioner chose

Notice that Key and Value are distinct fields on the message — a distinction that doesn't exist in the same form in queue-based .NET messaging APIs, where a message body is just a body. In Kafka, the key is not incidental metadata; it's the input to the routing decision that determines which partition, and therefore which ordered log, the record joins.
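The key/value split can be seen without a broker at all. In this sketch, anonymous objects stand in for the OrderEvent record (an assumption to keep it in one file), and each event becomes a (Key, Value) pair: the bare OrderId string as the key and a System.Text.Json payload as the value, mirroring the Message<string, string> above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Two events for the same order. Anonymous objects stand in for OrderEvent.
var events = new[]
{
    new { OrderId = "ORD-4471", EventType = "Created", Amount = 59.90m },
    new { OrderId = "ORD-4471", EventType = "Paid",    Amount = 59.90m },
};

// Key = routing input (the bare OrderId), Value = serialized payload.
var messages = new List<(string Key, string Value)>();
foreach (var e in events)
{
    messages.Add((e.OrderId, JsonSerializer.Serialize(e)));
}

foreach (var (key, value) in messages)
    Console.WriteLine($"key={key}  value={value}");
```

The OrderId appears twice on purpose: once inside the payload, where consumers read it, and once as the key, which is the only part the partitioner looks at.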

What This Lesson Covers — and What It Deliberately Doesn't

Given how much surface area "topics, partitions, and keys" touches, it's worth being explicit about scope. This lesson will walk through the physical anatomy of a topic — partitions, broker leadership, and replication — in "Anatomy of a Topic: Partitions, Brokers, and Replicas"; the mechanics of how the default partitioner turns a key into a partition number (and how to override that) in "How the Partitioner Assigns Keys to Partitions"; and full producer/consumer code against our order-events example in "Producing and Consuming with Confluent.Kafka: A Worked Example." It closes with the operational mistakes that partition and key decisions commonly cause in "Common Pitfalls: Partition Count, Skew, and Remapping."

Three closely related topics are intentionally out of scope here, each because they deserve focused treatment rather than a rushed mention: the precise guarantees (and limits) of per-partition ordering live in the dedicated Partition Ordering lesson; the strategic question of which field makes a good key lives in the Key Selection Strategy lesson; and how long records actually persist before being deleted or compacted lives in the Log Retention lesson. Keeping those separate lets this lesson stay focused on one thing: building the structural mental model — topic, partition, broker, key — that all three of those later lessons assume you already have.

💡 Real-World Example: Picture a PaymentProcessed event and a ShipmentDispatched event for two different orders, produced back to back. Because they carry different order IDs as keys, the partitioner may route them to different partitions entirely — and a consumer has no guarantee it will see them in production order, because they aren't competing for position in the same log. Reasoning about "did A happen before B" in Kafka always starts with "were A and B on the same partition?" — which is exactly why key choice, covered in depth later, is not a cosmetic decision.

⚠️ Common Mistake: Treating a Kafka topic as if it behaves like an Azure Service Bus topic with subscriptions, where the messaging middleware quietly manages delivery and cleanup for you. ❌ Wrong thinking: "I don't need to think about partitions — Kafka will just deliver each message to my consumer and clean up after itself, same as Service Bus." ✅ Correct thinking: "My topic is a set of independently ordered logs; my consumer is assigned specific partitions to read, my messages persist until a retention policy removes them, and which partition each message lands on is a routing decision I make (or delegate) through the key." The rest of this lesson is about making that correct model automatic.

Anatomy of a Topic: Partitions, Brokers, and Replicas

A Kafka topic looks deceptively simple from the outside — you give it a name, you send messages to it, you read messages from it. But a topic is not a single file or a single queue sitting on one machine. It is a logical name that maps onto one or more partitions, and those partitions are the actual physical units that get stored, replicated, and read. Understanding this split — topic as label, partition as reality — is the prerequisite for everything else in this lesson, because every guarantee Kafka makes, and every guarantee it deliberately does not make, is a statement about partitions rather than about topics.

A Partition Is an Ordered, Immutable Log

Each partition is best pictured as an append-only file. New records are written to the end, existing records are never modified, and every record is tagged with a sequential integer called an offset that marks its position within that specific partition.

Topic: order-events (3 partitions)

Partition 0:
  offset 0 → OrderCreated(id=1042)
  offset 1 → OrderShipped(id=1042)
  offset 2 → OrderCreated(id=1050)

Partition 1:
  offset 0 → OrderCreated(id=1043)
  offset 1 → OrderCancelled(id=1043)

Partition 2:
  offset 0 → OrderCreated(id=1044)
  offset 1 → OrderShipped(id=1044)
  offset 2 → OrderCreated(id=1046)
  offset 3 → OrderCreated(id=1051)

Notice something important in that diagram: offset 1 exists in every partition, but it refers to three completely different records. Offsets are per-partition counters, not a single sequence shared across the topic. There is no such thing as "offset 1 of order-events" in isolation — you must always say "offset 1 of partition 1," because that's the only coordinate system Kafka actually maintains. A .NET developer coming from a single-queue mental model (say, an MSMQ queue with one linear backlog) has to unlearn the idea that a topic has one running position counter; it has as many counters as it has partitions, all advancing independently.
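A record's coordinate is always the pair (partition, offset). The toy model below, a hypothetical dictionary from partition number to that partition's records, shows that appending to one partition advances only that partition's counter, and that "offset 1" names a different record in each partition.

```csharp
using System;
using System.Collections.Generic;

// Toy topic: partition number -> that partition's records, in offset order.
// Each partition keeps its own offset counter (the list index).
var topic = new Dictionary<int, List<string>>
{
    [0] = new() { "OrderCreated(id=1042)", "OrderShipped(id=1042)" },
    [1] = new() { "OrderCreated(id=1043)", "OrderCancelled(id=1043)" },
    [2] = new() { "OrderCreated(id=1044)", "OrderShipped(id=1044)" },
};

// Appending to one partition advances only that partition's counter.
int AppendTo(int partition, string record)
{
    topic[partition].Add(record);
    return topic[partition].Count - 1; // offset assigned within this partition
}

int newOffset = AppendTo(0, "OrderCreated(id=1050)");

// "Offset 1" is meaningless without a partition: same number, three records.
foreach (var (partition, records) in topic)
    Console.WriteLine($"partition {partition}, offset 1 -> {records[1]}");

Console.WriteLine($"new record stored at (partition 0, offset {newOffset})");
```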

🎯 Key Principle: A topic's ordering and storage guarantees exist at the partition level, not the topic level. If you remember nothing else from this section, remember that "topic" is an organizational label over a set of independently-ordered logs.

Leaders, Followers, and Replication

Partitions don't live on just one broker if you want fault tolerance, and in any real deployment, you do. For each partition, Kafka designates one broker as the leader, which by default handles every read and every write for that partition. The remaining copies live on other brokers as follower replicas, which continuously fetch and copy the leader's data but, in a default configuration, do not serve client traffic. The number of copies (leader plus followers) is controlled by the topic's replication factor.

Partition 0 (replication factor = 3)

  Broker 1: [Leader]    ← handles all reads/writes for Partition 0
  Broker 2: [Follower]  ← replicates Partition 0's log
  Broker 3: [Follower]  ← replicates Partition 0's log

If the broker hosting the leader fails, one of the in-sync followers is promoted to leader, and clients transparently redirect to it. A replication factor of 3 is a common production baseline because a partition can lose one broker and still have two live copies: one to serve as leader and one to keep replicating as a backup. The exact mechanics of in-sync replica sets and acknowledgment settings belong to a broader durability discussion, but the structural point for this lesson is simpler: replication factor, set at the topic level, determines how many broker-local copies of each partition's log exist. A topic with 6 partitions and a replication factor of 3 doesn't store 3 copies of the topic; it stores 3 copies of each of the 6 partitions, for 18 physical partition-copies spread across the cluster.
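The replication arithmetic can be checked with a small placement sketch. PlaceReplicas is a hypothetical helper that puts each partition's copies on distinct brokers in simple round-robin order. Kafka's own replica assignment is more involved (it can be rack-aware, for example), but the two constraints shown here hold for it too: copies of one partition live on different brokers, and the total is partitions × replication factor.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical placement helper: puts each partition's replicas on distinct
// brokers in simple round-robin order. Element 0 of each array plays the leader.
List<int[]> PlaceReplicas(int partitions, short replicationFactor, int brokerCount)
{
    if (replicationFactor > brokerCount)
        throw new ArgumentException(
            $"Replication factor {replicationFactor} exceeds {brokerCount} brokers; " +
            "every copy of a partition must live on a different broker.");

    var placement = new List<int[]>();
    for (int p = 0; p < partitions; p++)
    {
        placement.Add(Enumerable.Range(0, replicationFactor)
            .Select(r => (p + r) % brokerCount + 1) // broker ids 1..brokerCount
            .ToArray());
    }
    return placement;
}

var layout = PlaceReplicas(partitions: 6, replicationFactor: 3, brokerCount: 3);
int totalCopies = layout.Sum(replicas => replicas.Length);

for (int p = 0; p < layout.Count; p++)
    Console.WriteLine($"Partition {p}: leader on broker {layout[p][0]}, " +
                      $"followers on brokers {string.Join(", ", layout[p].Skip(1))}");

Console.WriteLine($"Physical partition-copies: {totalCopies}"); // 18
```

Calling PlaceReplicas(6, 3, 2) throws, which is the sketch's version of the broker-side rejection you get when a replication factor exceeds the number of brokers.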

⚠️ Common Mistake: Assuming replication factor and partition count are the same knob. They answer different questions: partition count answers "how many parallel logs make up this topic," while replication factor answers "how many broker-local copies does each of those logs get." Setting replication factor to 1 in anything beyond local experimentation means a single broker failure makes that partition's data unavailable — there's no follower to promote.

Partition Count Sets the Ceiling on Parallelism

Because each partition can be actively consumed by only one consumer within a given consumer group at a time (the mechanics of that assignment are covered in "Producing and Consuming with Confluent.Kafka: A Worked Example"), a topic's partition count puts a hard ceiling on how many consumer instances in a single group can do useful work simultaneously. A topic with 3 partitions can keep at most 3 consumers in one group busy; add a fourth consumer to that group and it sits idle, because there's no fourth partition to hand it. This is why partition count is a capacity-planning decision, not just a storage detail. Sizing it well, and the operational consequences of getting it wrong, are the focus of "Common Pitfalls: Partition Count, Skew, and Remapping"; the point to internalize now, while you're looking at the topic's raw anatomy, is that partition count is your parallelism budget.
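Here is the parallelism ceiling as a tiny simulation. Assign is a simplified, hypothetical round-robin assignor; Kafka's real assignment strategies (range, round-robin, cooperative-sticky) differ in how they distribute partitions, but every one of them gives each partition to at most one member of the group, so with 3 partitions and 4 consumers one consumer ends up with nothing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified, hypothetical assignor: deal partitions out to consumers in turn.
// Invariant shared with real assignors: each partition goes to at most one member.
Dictionary<string, List<int>> Assign(int partitionCount, IReadOnlyList<string> consumers)
{
    var result = consumers.ToDictionary(c => c, _ => new List<int>());
    for (int p = 0; p < partitionCount; p++)
        result[consumers[p % consumers.Count]].Add(p);
    return result;
}

var group = new[] { "consumer-a", "consumer-b", "consumer-c", "consumer-d" };
var assignment = Assign(3, group);

foreach (var (consumer, partitions) in assignment)
    Console.WriteLine($"{consumer}: [{string.Join(", ", partitions)}]");

// consumer-d gets an empty list: there is no fourth partition to hand it.
int busy = assignment.Count(kv => kv.Value.Count > 0);
Console.WriteLine($"Busy consumers: {busy} of {group.Length}");
```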

💡 Mental Model: Think of a topic's partitions as a fixed number of checkout lanes in a store. Adding more cashiers than lanes doesn't speed anything up; the extra cashiers just stand around. The number of lanes is decided when the store is built. In Kafka's case, it can be increased later (though never decreased), but doing so has a real cost for keyed messages, as a later section of this lesson shows.

Creating a Topic from .NET Code

All of this structure (partitions, replicas, leaders) has to be specified somewhere. In a Confluent.Kafka application, you can do that through the IAdminClient interface instead of the command-line tools that ship with Kafka. Its CreateTopicsAsync method lets you declare the partition count and replication factor programmatically, which is useful for integration tests, provisioning scripts, or any setup path where you want topic creation to be part of your deployable code rather than a manual operational step.

using Confluent.Kafka;
using Confluent.Kafka.Admin;

var adminConfig = new AdminClientConfig
{
    BootstrapServers = "localhost:9092"
};

using var admin = new AdminClientBuilder(adminConfig).Build();

try
{
    await admin.CreateTopicsAsync(new[]
    {
        new TopicSpecification
        {
            Name = "order-events",
            NumPartitions = 6,          // parallelism ceiling for one consumer group
            ReplicationFactor = 3        // leader + 2 followers per partition
        }
    });

    Console.WriteLine("Topic 'order-events' created.");
}
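catch (CreateTopicsException ex)
{
    // CreateTopicsAsync throws if the topic already exists or the requested
    // replication factor exceeds the number of available brokers. Each
    // CreateTopicReport in ex.Results says which topic failed and why.
    foreach (var report in ex.Results)
    {
        if (report.Error.Code == ErrorCode.TopicAlreadyExists)
            Console.WriteLine($"Topic '{report.Topic}' already exists; nothing to do.");
        else
            Console.WriteLine($"Creating '{report.Topic}' failed: {report.Error.Reason}");
    }
}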

A few things are worth calling out about this snippet. First, NumPartitions and ReplicationFactor are the two structural properties this section has been building toward; everything else in TopicSpecification (configuration overrides for retention, compaction, and so on) is orthogonal to the anatomy discussed here. Second, CreateTopicsAsync is not idempotent: calling it for a topic that already exists throws a CreateTopicsException rather than silently succeeding, so production provisioning code typically either checks for existing topics first or catches that exception and inspects the per-topic error, as the catch block above does. Third, requesting a replication factor higher than the number of brokers currently in the cluster fails outright: you cannot place a partition's copies on more brokers than exist, which is a concrete illustration of why replication factor is a physical constraint, not just a configuration number.

⚠️ Common Mistake: Hardcoding ReplicationFactor = 1 because it's what many local single-broker Docker setups default to, then deploying the same code unmodified against a multi-broker cluster. The topic will still get created, but it silently forfeits the fault tolerance a multi-broker cluster is supposed to provide for that data. It's worth making replication factor a configuration value read from environment-specific settings rather than a literal in code, precisely so a local dev value doesn't leak into a production provisioning script.
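One way to keep a local value out of production, sketched under assumptions: the provisioning code reads the replication factor from a hypothetical environment variable, KAFKA_ORDER_EVENTS_REPLICATION_FACTOR, and the resulting short would then be assigned to TopicSpecification.ReplicationFactor. The fallback of 3 is a deliberate choice: if the setting is missing, topic creation fails loudly on a single-broker dev cluster instead of quietly producing an unreplicated topic in production.

```csharp
using System;

// Hypothetical variable name; use whatever your deployment tooling defines.
const string ReplicationFactorVariable = "KAFKA_ORDER_EVENTS_REPLICATION_FACTOR";

// Resolve the replication factor from the environment instead of a literal,
// falling back to a production-style default when the variable is unset.
short ResolveReplicationFactor(short fallback = 3)
{
    var raw = Environment.GetEnvironmentVariable(ReplicationFactorVariable);
    if (string.IsNullOrWhiteSpace(raw))
        return fallback;

    if (!short.TryParse(raw, out short value) || value < 1)
        throw new InvalidOperationException(
            $"{ReplicationFactorVariable} must be a positive integer, got '{raw}'.");

    return value;
}

// A local docker-compose file might set the variable to 1; production leaves it
// unset (or sets 3), and the provisioning code itself never changes.
Console.WriteLine($"Replication factor: {ResolveReplicationFactor()}");
```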

Reading the Anatomy Back Out

Once a topic exists, the same admin client can describe its structure. That's how you'd confirm that the partition and replica layout actually matches what you requested, and how you'd discover the current leader for a given partition if you needed to reason about failure behavior. That inspection workflow, using the admin client's GetMetadata method, is demonstrated in the worked example later in this lesson alongside producer and consumer code, since at that point you'll have a running topic with real data in it to inspect. For now, the mental model to carry forward is architectural: a topic is a named collection of independently ordered partitions; each partition has exactly one leader broker and zero or more follower replicas, depending on the replication factor; and the number of partitions is the hard limit on how much of that topic a single consumer group can process in parallel. Keys, which decide which partition a given record lands in, are the mechanism that connects this physical layout to your application's data, and that mechanism is exactly what "How the Partitioner Assigns Keys to Partitions" picks up next.

How the Partitioner Assigns Keys to Partitions

When a .NET producer calls ProduceAsync on a topic with several partitions, something has to decide which of those partitions actually receives the record. That decision is made by a component called the partitioner, and understanding exactly how it works is what lets you predict — rather than guess — where any given message will end up. This section is deliberately narrow: it covers the mechanics of routing a message to a partition, not the separate question of which field makes a good key (that's the job of the Key Selection Strategy lesson later in this roadmap).

The Default Partitioner: Hashing a Key

When you produce a message with a non-null key, the Confluent.Kafka client's default partitioner applies a hash-modulo formula: it computes a hash of the key's bytes and takes that hash modulo the topic's current partition count. The result is a partition number between 0 and partitionCount - 1. (Confluent.Kafka delegates this to librdkafka, whose default partitioner, consistent_random, uses a CRC32 hash of the key.)

key bytes → hash function → hash value
hash value % partitionCount → partition number

The critical property here is determinism: the same key, hashed against the same partition count, always produces the same partition number. If OrderId = "ORD-4471" hashes to partition 2 today, it will hash to partition 2 again tomorrow, next week, and every time after that, as long as the topic's partition count doesn't change. This is what makes keyed ordering possible at all: every producer instance, run independently, computes the identical mapping, so all records for that order converge on one partition without any coordination between producers. That holds only when every producer uses the same partitioner. The Java client's default is murmur2-based, so a mixed Java and .NET producer fleet needs a matching setting (Confluent.Kafka exposes Partitioner.Murmur2Random on ProducerConfig for this).

Here's what that looks like when producing order events in .NET:

using Confluent.Kafka;

var config = new ProducerConfig { BootstrapServers = "localhost:9092" };

using var producer = new ProducerBuilder<string, string>(config).Build();

// The key ("ORD-4471") is hashed by the default partitioner.
// Every message with this exact key lands on the same partition.
var result = await producer.ProduceAsync("order-events", new Message<string, string>
{
    Key = "ORD-4471",
    Value = "{ \"status\": \"Created\" }"
});

Console.WriteLine($"Delivered to partition {result.Partition.Value} at offset {result.Offset.Value}");

Run this three more times with the same key and different status payloads ("Paid", "Shipped", "Delivered"), and every one of them reports the same partition number back through result.Partition. That's the hash-modulo formula doing its job — the value being produced is irrelevant to routing; only the key bytes matter.
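To see hash(key) % partitionCount as plain code, here is a sketch using a bitwise CRC-32 (the common IEEE/zlib variant). librdkafka's default partitioner is CRC32-based, but treat this as an illustration of the formula rather than a way to predict the exact partition your client picks. It hashes the key's UTF-8 bytes explicitly instead of calling string.GetHashCode(), which .NET randomizes per process and would therefore break the cross-producer determinism described above.

```csharp
using System;
using System.Text;

// Bitwise CRC-32 (reflected, polynomial 0xEDB88320), written out so no NuGet
// package is needed. Illustrative: not guaranteed to match your client's choice.
uint Crc32(byte[] data)
{
    uint crc = 0xFFFFFFFF;
    foreach (byte b in data)
    {
        crc ^= b;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320 : crc >> 1;
    }
    return ~crc;
}

// hash(key) % partitionCount, using unsigned math so the result is never negative.
int PartitionFor(string key, int partitionCount) =>
    (int)(Crc32(Encoding.UTF8.GetBytes(key)) % (uint)partitionCount);

// Two "producer instances" computing the mapping independently agree,
// because the only inputs are the key bytes and the partition count.
int fromProducerA = PartitionFor("ORD-4471", 6);
int fromProducerB = PartitionFor("ORD-4471", 6);

Console.WriteLine($"ORD-4471 -> partition {fromProducerA} (A), {fromProducerB} (B)");
```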

⚠️ Common Mistake: Assuming the partition number is stable across topic resizes. The modulo operation depends on the current partition count, so if that count changes, the formula's output for existing keys changes too. That consequence — and why it matters operationally — is covered in "Common Pitfalls: Partition Count, Skew, and Remapping" later in this lesson; the mechanism to remember here is simply that the formula is hash(key) % partitionCount, not hash(key) % (some fixed number).
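The dependence on the current partition count is easy to demonstrate. This sketch uses 32-bit FNV-1a as a stand-in hash, maps 1,000 hypothetical order IDs with 6 partitions and again with 8, and counts how many keys change partition.

```csharp
using System;
using System.Linq;
using System.Text;

// Stand-in stable hash (32-bit FNV-1a over UTF-8 bytes). Real clients use their
// own hash, but the remapping effect comes from the modulo step alone.
uint StableHash(string key)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(key))
    {
        hash ^= b;
        hash *= 16777619;
    }
    return hash;
}

int PartitionFor(string key, int partitionCount) =>
    (int)(StableHash(key) % (uint)partitionCount);

var keys = Enumerable.Range(1, 1000).Select(i => $"ORD-{i}").ToList();

// Same keys, same hash; only the partition count differs: 6 before, 8 after.
int moved = keys.Count(k => PartitionFor(k, 6) != PartitionFor(k, 8));

Console.WriteLine($"{moved} of {keys.Count} keys map to a different partition after 6 -> 8");
```

With a hash that spreads keys evenly, expect roughly three quarters of keys to move in a 6 to 8 resize: a key stays put only when its hash modulo 24 (the least common multiple of 6 and 8) is below 6.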

What Happens When the Key Is Null

Not every message needs a key. When you produce with Key = null, there's no hash to compute, so the partitioner has to fall back to a different strategy. It's tempting to assume this means simple round-robin — message 1 to partition 0, message 2 to partition 1, message 3 to partition 2, and so on — and that was historically how some Kafka clients handled null keys.

Modern client behavior instead favors a sticky partitioning strategy: rather than round-robining every single message, the producer sends a batch of null-key messages to one partition until that batch fills up or a linger interval elapses, then switches to a new partition for the next batch. The reasoning is throughput, not fairness of distribution — Kafka producers batch records per partition before sending them over the network, and constantly rotating partitions on a per-message basis defeats batching by spreading each partition's queue thin. Sticky batching still spreads load across all partitions over time, just in bursts rather than strict alternation.

Strict round-robin (older mental model):
  msg1 → P0, msg2 → P1, msg3 → P2, msg4 → P0, ...
  (small batches per partition — more network requests)

Sticky batching (current default for null keys):
  msg1..msg50 → P1 (one batch fills, then flushes)
  msg51..msg80 → P2 (next batch)
  ...
  (fewer, fuller batches per partition — better throughput)

The practical takeaway: if you don't supply a key, you get good load distribution across partitions, but you get no ordering guarantee whatsoever between any two unkeyed messages, since they may not even land on the same partition, let alone in a predictable sequence. That's a direct consequence of the routing mechanism itself, not a separate rule to memorize.
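The throughput argument can be made visible by counting runs, meaning stretches of consecutive messages routed to the same partition; fewer runs means fewer, fuller per-partition batches. This is a simplified model with a hypothetical fixed batch size: the real client switches partitions when a batch is sent (driven by batch size or a linger interval) and picks the next partition itself, not necessarily in order.

```csharp
using System;
using System.Linq;

// Strict round-robin: every message goes to the next partition.
int[] RoundRobin(int messageCount, int partitionCount) =>
    Enumerable.Range(0, messageCount).Select(i => i % partitionCount).ToArray();

// Simplified sticky model: stay on one partition for a whole batch, then move on.
int[] Sticky(int messageCount, int partitionCount, int batchSize)
{
    var result = new int[messageCount];
    for (int i = 0; i < messageCount; i++)
        result[i] = (i / batchSize) % partitionCount;
    return result;
}

// A run is a stretch of consecutive messages sent to the same partition.
int Runs(int[] routing) =>
    1 + Enumerable.Range(1, routing.Length - 1).Count(i => routing[i] != routing[i - 1]);

var rr = RoundRobin(120, 3);
var sticky = Sticky(120, 3, batchSize: 40);

Console.WriteLine($"round-robin runs: {Runs(rr)}, sticky runs: {Runs(sticky)}");
```

Both strategies end up putting 40 messages on each of the 3 partitions; the difference is that round-robin produces 120 single-message runs while the sticky model produces 3 runs of 40.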

💡 Mental Model: Think of the default partitioner as a two-mode switch. Key present → deterministic hash routing. Key absent → sticky batching optimized for producer throughput. Nothing more exotic happens by default.

Bypassing the Partitioner: Explicit Partition Targeting

Sometimes you don't want Kafka to compute anything — you already know exactly which partition a record belongs on, and you want to write directly to it. Confluent.Kafka supports this through an overload of ProduceAsync that accepts a TopicPartition instead of a plain topic name string. When you use this overload, the hashing partitioner is never invoked at all; the client sends the record straight to the partition you specified.

using Confluent.Kafka;

var config = new ProducerConfig { BootstrapServers = "localhost:9092" };
using var producer = new ProducerBuilder<string, string>(config).Build();

// TopicPartition targets partition 3 directly.
// The key is still stored with the message, but it plays no role
// in routing here — the partitioner is bypassed entirely.
var topicPartition = new TopicPartition("order-events", new Partition(3));

var result = await producer.ProduceAsync(topicPartition, new Message<string, string>
{
    Key = "ORD-4471",
    Value = "{ \"status\": \"Refunded\" }"
});

Console.WriteLine($"Explicitly written to partition {result.Partition.Value}");

This is useful for scenarios like migrating data between topics while preserving each record's original partition, or building diagnostic tooling that needs to write a test record to a specific partition to verify consumer behavior on it. It's a narrow, deliberate escape hatch rather than a routine production pattern — most application code should let the partitioner do its job, since manual partition assignment means you're responsible for maintaining whatever colocation guarantees your keys were supposed to provide.

⚠️ Common Mistake: Mixing explicit TopicPartition targeting with keyed hash-based produces for the same logical key. If some code paths route "ORD-4471" through the hash formula (landing it on, say, partition 2) while another path explicitly forces it onto partition 3, you've silently broken the very colocation guarantee keying was meant to provide — messages for the same order now live on two different partitions with two independent offset sequences.

Writing a Custom Partitioner

The default hash-modulo behavior covers the overwhelming majority of use cases, but Confluent.Kafka also lets you override it entirely by supplying a PartitionerDelegate (a function from topic, partition count, and key bytes to a Partition) and registering it on the ProducerBuilder. This gives you full control over the mapping from key bytes to partition number, which is useful when you need routing logic that isn't a pure hash, such as directing certain key prefixes to a reserved subset of partitions for isolation purposes.

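using System;
using System.Text;
using Confluent.Kafka;

// A custom partitioner that routes keys starting with "VIP-" to a reserved
// partition 0 and hashes every other key across partitions 1..partitionCount-1.
// ChoosePartition matches Confluent.Kafka's PartitionerDelegate signature
// (topic, partitionCount, keyData, keyIsNull), so it can be registered with
// ProducerBuilder.SetPartitioner.
public class VipAwarePartitioner
{
    public Partition ChoosePartition(string topic, int partitionCount, ReadOnlySpan<byte> keyData, bool keyIsNull)
    {
        // With a single partition there is nothing to reserve.
        if (partitionCount <= 1)
        {
            return new Partition(0);
        }

        if (keyIsNull)
        {
            // No key: spread across the non-reserved partitions so unkeyed
            // traffic never lands on the VIP partition.
            return new Partition(Random.Shared.Next(1, partitionCount));
        }

        var key = Encoding.UTF8.GetString(keyData);

        if (key.StartsWith("VIP-", StringComparison.Ordinal))
        {
            // Reserve partition 0 for VIP traffic, regardless of hash.
            return new Partition(0);
        }

        // Stable hash (32-bit FNV-1a over the key bytes). Avoid string.GetHashCode():
        // .NET randomizes it per process, so separate producer instances would
        // route the same key to different partitions.
        uint hash = 2166136261;
        foreach (byte b in keyData)
        {
            hash ^= b;
            hash *= 16777619;
        }

        // Hash-modulo over the non-reserved partitions 1..partitionCount-1.
        return new Partition(1 + (int)(hash % (uint)(partitionCount - 1)));
    }
}

// Registration sketch (inside your producer setup):
// var vip = new VipAwarePartitioner();
// using var producer = new ProducerBuilder<string, string>(config)
//     .SetPartitioner("order-events", vip.ChoosePartition)
//     .Build();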

Registering it means passing its partition-choosing method, as a PartitionerDelegate, to the producer builder's SetPartitioner(topic, partitioner) for the relevant topic (or to SetDefaultPartitioner for all topics). After that, every ProduceAsync call against that topic runs through your custom logic instead of the library default. Custom partitioners are a specialized tool: reach for one only when you have a concrete routing requirement the hash formula can't express, because a hand-rolled partitioner inherits the same colocation responsibilities. Get the logic wrong, for example by hashing with a per-process value such as string.GetHashCode(), and you can scatter a single order's events across multiple partitions just as easily as misused explicit partition targeting can.

🎯 Key Principle: The partitioner is purely a routing mechanism — it answers "which partition?" It has no opinion on "which field should be the key?" That second question, including trade-offs like keying by OrderId versus CustomerId, is scoped entirely to the Key Selection Strategy lesson; here, treat the key as a given input and focus on tracing how it flows through hash-modulo, sticky batching, explicit targeting, or custom logic to produce a partition number.

Producing and Consuming with Confluent.Kafka: A Worked Example

Everything covered so far about topics, partitions, and the hashing partitioner is invisible until you actually run a producer and a consumer and watch the numbers come back. This section builds that end-to-end picture in C# using the Confluent.Kafka client — the standard .NET client for Apache Kafka — with a single running scenario: order events keyed by OrderId. By the end, you'll see exactly where the partition and offset show up in the API, and you'll confirm with real output that every event for a given order lands on the same partition.

Building the Producer

The entry point on the producer side is ProducerBuilder<TKey, TValue>. You configure it with your broker addresses, specify the key and value types as generic parameters, and Confluent.Kafka handles serialization using built-in serializers (here, Serializers.Utf8 for strings) or your own custom ones.

using Confluent.Kafka;

var producerConfig = new ProducerConfig
{
    BootstrapServers = "localhost:9092"
};

using var producer = new ProducerBuilder<string, string>(producerConfig)
    .SetKeySerializer(Serializers.Utf8)
    .SetValueSerializer(Serializers.Utf8)
    .Build();

// An order event: key = OrderId, value = a JSON payload (simplified as a string here)
var orderId = "ORD-48213";
var message = new Message<string, string>
{
    Key = orderId,
    Value = "{\"orderId\":\"ORD-48213\",\"status\":\"Created\"}"
};

DeliveryResult<string, string> result =
    await producer.ProduceAsync("order-events", message);

Console.WriteLine(
    $"Delivered to partition {result.Partition.Value} " +
    $"at offset {result.Offset.Value}");

The call to ProduceAsync sends the keyed message to the order-events topic and awaits the broker's acknowledgment. The returned DeliveryResult<string, string> is where the abstract routing mechanism becomes observable: result.Partition tells you which partition the default partitioner sent this key to (recall from "How the Partitioner Assigns Keys to Partitions" that this is hash(key) % partitionCount), and result.Offset tells you the exact position this record now occupies within that partition's log. If you call ProduceAsync again with the same orderId, you will get the same partition number back every time, as long as the topic's partition count doesn't change. That determinism is the mechanism the earlier section explained, now visible as a concrete integer in your console output.
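Stripped of the network, that determinism is just a pure function of the key and the partition count. A minimal sketch, assuming a stand-in FNV-1a hash rather than the client's actual default hash (so these partition numbers will not match a real DeliveryResult); PartitionFor is a hypothetical helper name:

```csharp
using System;
using System.Text;

// Stand-in 32-bit FNV-1a hash over the key's UTF-8 bytes (an assumption).
static uint Fnv1a(string key)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(key))
    {
        hash ^= b;
        hash = unchecked(hash * 16777619u);
    }
    return hash;
}

// hash(key) % partitionCount: same key and same count always give the same partition.
static int PartitionFor(string key, int partitionCount) =>
    (int)(Fnv1a(key) % (uint)partitionCount);

const int PartitionCount = 4;
int first = PartitionFor("ORD-48213", PartitionCount);
int second = PartitionFor("ORD-48213", PartitionCount);

Console.WriteLine($"ORD-48213 -> partition {first}, then {second} on the next send");
```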

⚠️ Common Mistake: Using the fire-and-forget Produce method (which takes an optional delivery-report callback instead of returning a Task) and then letting the process exit without waiting for delivery. Messages still sitting in the client's in-memory buffer are lost if the application terminates before they are sent, so call producer.Flush(...) before shutdown whenever you use Produce. For request/response-style code paths, ProduceAsync is the safer default because awaiting it surfaces the DeliveryResult (or a ProduceException on failure) directly in your control flow.

Building the Consumer

On the consuming side, ConsumerBuilder<TKey, TValue> mirrors the producer's shape. The critical extra piece of configuration is GroupId, which places this consumer into a consumer group — a named set of consumers that split up the partitions of a topic between them.

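using System;
using System.Threading;
using Confluent.Kafka;

var consumerConfig = new ConsumerConfig
{
    BootstrapServers = "localhost:9092",
    GroupId = "order-processing-service",
    AutoOffsetReset = AutoOffsetReset.Earliest
};

using var consumer = new ConsumerBuilder<string, string>(consumerConfig)
    .SetKeyDeserializer(Deserializers.Utf8)
    .SetValueDeserializer(Deserializers.Utf8)
    .Build();

consumer.Subscribe("order-events");

// Stop cleanly on Ctrl+C instead of looping forever.
using var cts = new CancellationTokenSource();
Console.CancelKeyPress += (_, e) => { e.Cancel = true; cts.Cancel(); };

try
{
    while (!cts.IsCancellationRequested)
    {
        // Blocks until a record arrives or the token is cancelled.
        ConsumeResult<string, string> record = consumer.Consume(cts.Token);

        Console.WriteLine(
            $"key={record.Message.Key} " +
            $"partition={record.Partition.Value} " +
            $"offset={record.Offset.Value} " +
            $"value={record.Message.Value}");
    }
}
catch (OperationCanceledException)
{
    // Expected when shutting down.
}
finally
{
    // Leave the group cleanly so its partitions are reassigned promptly.
    consumer.Close();
}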

Each call to consumer.Consume blocks until a record is available and returns a ConsumeResult<string, string>. The fields you care about for the topic/partition/key model are record.Partition, record.Offset, and record.Message.Key. Notice the symmetry with the producer side: the partition and offset the consumer reports for a given record are exactly the values the producer's DeliveryResult reported when that record was written. Offsets are not reassigned or renumbered on the way through; they are a fixed coordinate in that partition's log, as introduced in "Anatomy of a Topic."
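To make "offsets are per-partition coordinates" concrete without a broker, here is a toy in-memory model. The list-per-partition layout and the Append/Read helpers are illustrative assumptions, not how Kafka stores records; they only mimic how offsets are assigned and looked up.

```csharp
using System;
using System.Collections.Generic;

// Toy topic with 3 partitions: one append-only list per partition.
var partitions = new List<string>[3];
for (int p = 0; p < partitions.Length; p++)
    partitions[p] = new List<string>();

// "Produce": append to a partition's log and return the record's coordinates.
(int Partition, long Offset) Append(int partition, string value)
{
    partitions[partition].Add(value);
    return (partition, partitions[partition].Count - 1);
}

// "Consume": read back by coordinates; nothing is renumbered in between.
string Read(int partition, long offset) => partitions[partition][(int)offset];

var a = Append(0, "ORD-1 Created");
var b = Append(1, "ORD-2 Created");
var c = Append(0, "ORD-1 Shipped");

// Offset 0 exists on both partition 0 and partition 1; they are unrelated records.
Console.WriteLine($"{a} {b} {c}");
Console.WriteLine(Read(c.Partition, c.Offset));
```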

Consumer Groups and Partition Assignment

A single consumer instance can read every partition of a topic by itself, but production systems typically run several consumer instances sharing the same GroupId to parallelize processing. Kafka's rule for this is strict: within a consumer group, each partition is assigned to exactly one consumer at a time. If order-events has four partitions and your group has two consumers, a typical split is two partitions per consumer; if you scale to four consumers, each gets one partition and further consumers beyond that sit idle for this topic, since a partition cannot be split between two group members.

order-events (4 partitions)
Consumer group: order-processing-service

Partition 0 ─┐
             ├──► Consumer A
Partition 1 ─┘
Partition 2 ─┐
             ├──► Consumer B
Partition 3 ─┘

Which consumer gets which partitions is decided by a pluggable partition assignor, configured via PartitionAssignmentStrategy on the consumer config. The Range assignor (the historical default) assigns contiguous partition ranges per topic to each consumer, which can produce mild imbalance when a group subscribes to multiple topics. The CooperativeSticky assignor improves on this by minimizing partition movement during rebalances — when a consumer joins or leaves the group, it tries to leave as many existing assignments untouched as possible rather than reshuffling everything, which reduces the pause in processing that a full rebalance causes. Naming these here is enough to recognize the setting when you see it in configuration; the deeper mechanics of rebalancing protocols sit outside this lesson's scope.
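The core arithmetic of range-style assignment can be sketched for a single topic. This is a simplification under stated assumptions: RangeAssign is a hypothetical helper, consumers are identified by friendly names rather than the member IDs the real protocol sorts on, and multi-topic subscriptions are ignored. It also reproduces the idle-consumer ceiling described earlier in this section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified single-topic range assignment: sort consumers, give each a
// contiguous block, and give the first (partitions % consumers) one extra.
static Dictionary<string, List<int>> RangeAssign(int partitionCount, IEnumerable<string> consumers)
{
    var sorted = consumers.OrderBy(name => name, StringComparer.Ordinal).ToList();
    int perConsumer = partitionCount / sorted.Count;
    int extra = partitionCount % sorted.Count;

    var result = new Dictionary<string, List<int>>();
    int next = 0;
    for (int i = 0; i < sorted.Count; i++)
    {
        int count = perConsumer + (i < extra ? 1 : 0);
        result[sorted[i]] = Enumerable.Range(next, count).ToList();
        next += count;
    }
    return result;
}

var twoConsumers = RangeAssign(4, new[] { "consumer-A", "consumer-B" });
var sixConsumers = RangeAssign(4, new[] { "c1", "c2", "c3", "c4", "c5", "c6" });

foreach (var (consumer, parts) in twoConsumers)
    Console.WriteLine($"{consumer}: [{string.Join(",", parts)}]");

Console.WriteLine($"Idle consumers with 6 members: {sixConsumers.Count(kv => kv.Value.Count == 0)}");
```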

💡 Mental Model: think of a consumer group as a team splitting up a stack of numbered folders (partitions). Every folder is being read by exactly one team member at any moment — no folder is shared, and no folder is skipped, but which member holds which folder can change as people join or leave the team.

Full Worked Example: Order Events Colocated by OrderId

Now put both sides together on a topic with multiple partitions and confirm the claim that matters most for ordering: all events for the same order land on the same partition. Suppose order-events has 4 partitions, and you produce three events for ORD-48213 (Created, PaymentReceived, Shipped) and one event for a different order, ORD-90110:

string[] orderIds = { "ORD-48213", "ORD-48213", "ORD-48213", "ORD-90110" };
string[] statuses = { "Created", "PaymentReceived", "Shipped", "Created" };

for (int i = 0; i < orderIds.Length; i++)
{
    var msg = new Message<string, string>
    {
        Key = orderIds[i],
        Value = $"{{\"orderId\":\"{orderIds[i]}\",\"status\":\"{statuses[i]}\"}}"
    };

    var result = await producer.ProduceAsync("order-events", msg);
    Console.WriteLine($"{orderIds[i]} ({statuses[i]}) -> partition {result.Partition.Value}, offset {result.Offset.Value}");
}

Because the default partitioner computes hash(key) % partitionCount and the key is identical for the first three sends, you will observe the same partition number for all three ORD-48213 events, while ORD-90110 may land on the same partition or a different one depending on how its hash reduces modulo 4. For example, the output might read: ORD-48213 (Created) -> partition 2, ORD-48213 (PaymentReceived) -> partition 2, ORD-48213 (Shipped) -> partition 2, ORD-90110 (Created) -> partition 0. The consumer side confirms this from the other direction: a single consumer reading partition 2 will see the three ORD-48213 records in the exact order they were produced, because Kafka preserves write order within one partition, and because each ProduceAsync call here is awaited before the next send, write order matches the loop's order. This per-partition guarantee comes up again in "Common Pitfalls: Partition Count, Skew, and Remapping" and is covered fully in the dedicated Partition Ordering lesson later in this series. What this worked example demonstrates concretely is the mechanism that guarantee depends on: keying by OrderId is what physically colocates an order's full history on one ordered log, rather than scattering it across partitions where no ordering relationship exists between them.
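The same experiment can be replayed offline to see why colocation falls out of the formula. This sketch assumes 4 partitions and an FNV-1a stand-in hash, so the specific partition numbers will differ from the illustrative output above and from a real cluster; PartitionFor is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Stand-in 32-bit FNV-1a hash (an assumption, not the client's real default hash).
static uint Fnv1a(string key)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(key))
    {
        hash ^= b;
        hash = unchecked(hash * 16777619u);
    }
    return hash;
}

static int PartitionFor(string key, int partitionCount) =>
    (int)(Fnv1a(key) % (uint)partitionCount);

const int PartitionCount = 4;
string[] orderIds = { "ORD-48213", "ORD-48213", "ORD-48213", "ORD-90110" };
string[] statuses = { "Created", "PaymentReceived", "Shipped", "Created" };

// Each partition is modeled as an append-only list, so list order is write order.
var logs = Enumerable.Range(0, PartitionCount).Select(_ => new List<string>()).ToArray();
for (int i = 0; i < orderIds.Length; i++)
    logs[PartitionFor(orderIds[i], PartitionCount)].Add($"{orderIds[i]}:{statuses[i]}");

int orderPartition = PartitionFor("ORD-48213", PartitionCount);
var history = logs[orderPartition]
    .Where(e => e.StartsWith("ORD-48213", StringComparison.Ordinal))
    .ToList();

Console.WriteLine($"ORD-48213 history on partition {orderPartition}: {string.Join(" -> ", history)}");
```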

⚠️ Common Mistake: assuming that because all three events are in the topic, a consumer reading the whole topic sees them in produced order. If a consumer group has multiple members, ORD-48213's events (all on partition 2) will always be read in order by whichever single consumer owns partition 2, but that says nothing about the relative timing of records on other partitions — the topic as a whole has no global order, only partition-local order.

Inspecting Partitions and Leaders with IAdminClient

Sometimes you need to inspect a topic's structure at runtime rather than assume it — for instance, to confirm the partition count before deciding how many consumer instances to run, or to check which broker currently leads a given partition during a troubleshooting session. The IAdminClient interface, built via AdminClientBuilder, exposes GetMetadata for exactly this.

using System;
using System.Linq;
using Confluent.Kafka;

var adminConfig = new AdminClientConfig { BootstrapServers = "localhost:9092" };
using var admin = new AdminClientBuilder(adminConfig).Build();

// Fetch metadata for a single topic (the GetMetadata(TimeSpan) overload returns all topics)
Metadata metadata = admin.GetMetadata("order-events", TimeSpan.FromSeconds(10));

var topicMetadata = metadata.Topics.Single(t => t.Topic == "order-events");

// Check the per-topic error first (for example, if the topic doesn't exist)
if (topicMetadata.Error.IsError)
{
    Console.WriteLine($"Metadata error for {topicMetadata.Topic}: {topicMetadata.Error.Reason}");
    return;
}

Console.WriteLine($"Topic: {topicMetadata.Topic}");
foreach (var partition in topicMetadata.Partitions)
{
    Console.WriteLine(
        $"  Partition {partition.PartitionId}: " +
        $"leader broker id = {partition.Leader}, " +
        $"replicas = [{string.Join(",", partition.Replicas)}]");
}

This prints one line per partition showing which broker currently holds the leader role (the broker Confluent.Kafka's producer and consumer clients actually talk to for that partition) and which brokers hold replicas, tying directly back to the leader/replica structure from "Anatomy of a Topic." This is a genuinely useful debugging habit: if a consumer group seems to be reading unevenly, checking GetMetadata first tells you the actual partition count and replica layout instead of relying on documentation or memory, which can drift out of sync with what's actually deployed.

💡 Pro Tip: GetMetadata is a synchronous, relatively cheap call against the broker's cached cluster state, so it's safe to use in a startup health check or a diagnostic endpoint — but it's not meant to be polled in a tight loop as a substitute for proper monitoring tooling.

Taken together, these three pieces — a producer reporting the partition and offset it wrote to, a consumer reporting the partition and offset it read from, and an admin client reporting the topic's actual partition layout — give you a complete, verifiable loop. You are no longer taking the partitioner's behavior on faith; you can watch a key resolve to a partition number in your own terminal output and confirm the topic's shape independently. The next section turns to what goes wrong when partition counts and key distributions aren't managed carefully, building directly on the colocation behavior you just observed here.

Common Pitfalls: Partition Count, Skew, and Remapping

Every mistake in this section traces back to one habit: treating partition count and key choice as implementation details you set once and forget. In practice, both decisions ripple through the lifetime of a topic — they determine how much parallelism your consumers can ever have, what happens when traffic grows, and whether "same key, same partition" still holds true after you've made a change that felt harmless. This section walks through the operational mistakes .NET teams run into most often once a topic moves from a demo into production traffic.

The Partition Count Trade-off

Partition count is the ceiling on how many consumers in a single consumer group can process a topic in parallel, because Kafka assigns each partition to exactly one consumer within a group at a time (the mechanics of that assignment were covered in "Producing and Consuming with Confluent.Kafka: A Worked Example"). That ceiling cuts both ways, and both directions carry a real cost.

Too few partitions caps parallelism directly. If an orders topic has 3 partitions and you scale your consumer group to 6 instances hoping to double throughput, three of those instances sit idle — Kafka has no partition left to assign them. You've paid for compute that does nothing.
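The ceiling is simple arithmetic, which makes it easy to check before scaling a deployment. A minimal sketch; GroupUtilization is a hypothetical helper name, and it ignores partitions from any other topics the group might also subscribe to.

```csharp
using System;

// Within one consumer group, each partition goes to at most one consumer,
// so consumers beyond the partition count have nothing to do.
static (int Working, int Idle) GroupUtilization(int partitionCount, int consumerCount)
{
    int working = Math.Min(partitionCount, consumerCount);
    return (working, consumerCount - working);
}

var (active, idle) = GroupUtilization(partitionCount: 3, consumerCount: 6);
Console.WriteLine($"{active} consumers working, {idle} idle");
```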

Too many partitions looks free until you account for what each partition costs the broker cluster, not the producer or consumer. Every partition is a set of log segment files that a broker keeps open file handles for, and every partition has metadata the cluster's controller has to track. Every partition also has a leader replica, and when a broker fails or restarts, each partition it led needs a new leader elected from its remaining replicas; a cluster with tens of thousands of partitions makes those elections slower and more disruptive because there's simply more state to reconcile. There isn't a single number that's correct for all clusters: the right partition count depends on target throughput per partition, consumer count, and how much broker overhead your cluster is provisioned to absorb, which is why sizing is a capacity-planning exercise rather than a fixed rule.

Too few partitions:      Right-sized:         Too many partitions:

[P0] -> C1               [P0] -> C1           [P0..P29] spread across
[P1] -> C2               [P1] -> C2           6 consumers, but the brokers
[P2] -> C3               [P2] -> C3           now track 5x the file handles
        C4 idle          [P3] -> C4           and leader metadata of the
        C5 idle          [P4] -> C5           right-sized topic
        C6 idle          [P5] -> C6

🎯 Key Principle: Partition count sets the maximum parallelism your consumer group can ever reach, but every partition you add is standing broker overhead whether or not it's carrying meaningful traffic. Size for expected peak consumer parallelism, not for an arbitrary round number.

⚠️ Common Mistake: Teams often pick partition counts like 1, 6, or 12 because they match the current consumer instance count, then have no room to scale consumers later without a repartitioning operation. Since you can add partitions later but the operation has consequences (below), it's worth padding the initial count modestly above your near-term consumer target rather than matching it exactly.

What Breaks When You Add Partitions

Here is the pitfall that catches teams who understand the partitioner mechanism but haven't connected it to what happens under a live topic. Recall from "How the Partitioner Assigns Keys to Partitions" that the default partitioner routes a keyed message using hash(key) % partitionCount. That formula is deterministic only as long as partitionCount stays fixed. The moment you run an admin operation to add partitions to an existing topic — for example, going from 6 partitions to 12 — the modulus in that formula changes for every key in your system, immediately.

Concretely: with 6 partitions, hash("order-4471") % 6 might land on partition 2. After increasing to 12 partitions, hash("order-4471") % 12 will be either 2 or 8, because any value that leaves remainder 2 when divided by 6 leaves remainder 2 or 8 when divided by 12. For a well-distributed hash, that means roughly half of all keys start routing to a different partition. Kafka does not rehash and move existing data to match the new formula; the records already written to partition 2 stay on partition 2. What changes is where new messages with that key get routed going forward.
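Because 12 is a multiple of 6, the change has a specific shape under hash(key) % partitionCount: a key on old partition p can only stay on p or move to p + 6. A sketch assuming an FNV-1a stand-in hash and synthetic order IDs counts how many keys move; the exact count for real keys and the real client hash will differ.

```csharp
using System;
using System.Linq;
using System.Text;

// Stand-in 32-bit FNV-1a hash (an assumption, not the client's real default hash).
static uint Fnv1a(string key)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(key))
    {
        hash ^= b;
        hash = unchecked(hash * 16777619u);
    }
    return hash;
}

static int PartitionFor(string key, int partitionCount) =>
    (int)(Fnv1a(key) % (uint)partitionCount);

// Hypothetical key population: 10,000 synthetic order IDs.
var keys = Enumerable.Range(0, 10_000).Select(i => $"order-{i}").ToArray();

// Where each key routes before (6 partitions) and after (12 partitions) the change.
var moves = keys
    .Select(k => (Key: k, Before: PartitionFor(k, 6), After: PartitionFor(k, 12)))
    .ToArray();
int moved = moves.Count(m => m.Before != m.After);

Console.WriteLine($"{moved} of {keys.Length} keys now route to a different partition");
Console.WriteLine($"Every key stays on p or moves to p + 6: {moves.All(m => m.After == m.Before || m.After == m.Before + 6)}");
```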

This matters enormously if your application depends on per-key colocation — the guarantee that all events for a given key land on the same partition, which is often relied on for maintaining per-key event order or for co-locating related state in a stream-processing job. After a partition count change, that guarantee breaks silently: old events for order-4471 sit on partition 2, new events for the same order start appearing on some other partition, and any consumer logic that assumed "everything about this order is on one partition" is now wrong with no error or exception raised anywhere.

using Confluent.Kafka;
using Confluent.Kafka.Admin; // PartitionsSpecification lives in this namespace

// Adding partitions to an existing topic is a one-way operation.
// After this call, hash(key) % partitionCount changes for every key,
// and Kafka does NOT move existing records to match the new mapping.
using var adminClient = new AdminClientBuilder(new AdminClientConfig
{
    BootstrapServers = "localhost:9092"
}).Build();

await adminClient.CreatePartitionsAsync(new[]
{
    new PartitionsSpecification
    {
        Topic = "orders",
        IncreaseTo = 12 // was 6; every key's target partition can now change
    }
});

⚠️ Common Mistake: Assuming that increasing partition count is a purely additive, low-risk scaling operation. It is additive for throughput headroom, but it is a breaking change for any consumer or downstream job that relies on same-key-same-partition behavior across the transition. Kafka provides no built-in warning or migration path for this — the responsibility falls entirely on the team making the change to know which consumers depend on colocation before running it.

💡 Mental Model: Think of partition count as something closer to a database's sharding key count than a queue's worker pool size. Resharding a database moves data; increasing Kafka partitions does not — it just changes the routing table for records that haven't been written yet, leaving old records where they are.

Skewed Keys and Hot Partitions

Even a topic with a generous, well-planned partition count can bottleneck if the keys feeding it are unevenly distributed. Key skew happens when a small number of key values account for a disproportionate share of traffic, so the partitions those keys hash to become hot partitions — carrying far more volume than their siblings while the rest of the partitions sit comparatively idle.

Imagine an order-events topic with 12 partitions, keyed by CustomerId. If one enterprise customer generates a large share of total order volume — a very plausible scenario in B2B systems — every event for that customer routes to the same single partition by design, since that's exactly what the partitioner is supposed to do. The other 11 partitions handle the remaining customers comfortably, but the one partition carrying the large customer's traffic becomes a throughput ceiling: its consumer processes as fast as it can, while sibling consumers finish their partition's work and wait. The topic's aggregate partition count looks sufficient on paper, but the actual achievable throughput is bounded by the busiest single partition, not the average.
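Here is a sketch of that effect with synthetic traffic: 12 partitions keyed by a hypothetical CustomerId, one large customer, many small ones, and an FNV-1a stand-in hash. The volumes are invented purely for illustration.

```csharp
using System;
using System.Linq;
using System.Text;

// Stand-in 32-bit FNV-1a hash (an assumption, not the client's real default hash).
static uint Fnv1a(string key)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(key))
    {
        hash ^= b;
        hash = unchecked(hash * 16777619u);
    }
    return hash;
}

static int PartitionFor(string key, int partitionCount) =>
    (int)(Fnv1a(key) % (uint)partitionCount);

const int Partitions = 12;
var counts = new int[Partitions];

// Invented traffic: one large customer sends 4,000 events, and
// 400 smaller customers send 15 events each (6,000 events in total).
counts[PartitionFor("CUST-ENTERPRISE", Partitions)] += 4_000;
for (int c = 0; c < 400; c++)
    counts[PartitionFor($"CUST-{c:D4}", Partitions)] += 15;

int hottest = Array.IndexOf(counts, counts.Max());
Console.WriteLine($"Hottest partition {hottest}: {counts[hottest]} events " +
    $"vs an average of {counts.Average():F0} per partition");
```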

This is a distinct failure from having too few partitions overall — adding more partitions doesn't fix it if the skewed key still hashes to only one of them. Mitigating skew (salting hot keys, choosing a higher-cardinality key field, or splitting a hot key's traffic across multiple partitions deliberately) is exactly the kind of trade-off covered in the "Key Selection Strategy" lesson; the point to internalize here is recognizing the symptom — one partition's consumer lagging while others are caught up — as a sign of key skew rather than a sign of insufficient partition count.

💡 Real-World Example: A monitoring dashboard showing consumer lag per partition is the fastest way to catch this. If lag graphs show one or two partitions climbing steadily while the rest hover near zero, that's the signature of a hot key, not a global capacity problem — throwing more partitions or more consumers at the topic won't move that line down.
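The dashboard heuristic can be expressed as a small check. This is a sketch: HotPartitions is a hypothetical helper, the per-partition lag numbers are made up, and the "more than 5x the median" threshold is an arbitrary assumption rather than a standard rule.

```csharp
using System;
using System.Linq;

// Flags partitions whose lag is far above the median lag across partitions.
static int[] HotPartitions(long[] lagByPartition, double factor = 5.0)
{
    var sorted = lagByPartition.OrderBy(l => l).ToArray();
    double median = sorted.Length % 2 == 1
        ? sorted[sorted.Length / 2]
        : (sorted[sorted.Length / 2 - 1] + sorted[sorted.Length / 2]) / 2.0;

    // Max(median, 1) keeps a near-zero median from flagging tiny lags.
    double threshold = Math.Max(median, 1) * factor;
    return Enumerable.Range(0, lagByPartition.Length)
        .Where(p => lagByPartition[p] > threshold)
        .ToArray();
}

// Hypothetical lag snapshot for a 12-partition topic.
long[] lag = { 3, 0, 5, 48_000, 2, 1, 4, 0, 6, 2, 3, 1 };
Console.WriteLine($"Hot partitions: [{string.Join(",", HotPartitions(lag))}]");
```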

Ordering Is Per Partition, Not Per Topic

A closely related misreading of Kafka's guarantees shows up constantly in .NET teams migrating from systems like Azure Service Bus, where it's common to think of a single queue as one ordered sequence. Kafka's ordering guarantee only applies within a partition: records written to the same partition are read back in the same order they were written, but there is no guarantee about how records on different partitions of a multi-partition topic interleave.

❌ Wrong thinking: "I published Order Created, then Order Shipped,
   to the 'orders' topic, so consumers will always see Created before
   Shipped."

✅ Correct thinking: "Created and Shipped will arrive in that order
   ONLY if both events share a key that hashes to the same partition.
   If they land on different partitions, no ordering is guaranteed
   between them."

This is precisely why key choice and ordering are inseparable topics: choosing OrderId as the key for both Order Created and Order Shipped events is what forces them onto the same partition and therefore into a guaranteed sequence. The full mechanics of that guarantee — including what happens across producer retries and consumer rebalances — are the focus of the dedicated "Partition Ordering" lesson; the point to fix in your instincts here is narrower but non-negotiable: never reason about ordering at the topic level, only ever at the partition level.
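A broker-free sketch of the difference: partitions are assigned by hand here (an assumption that keeps the example deterministic), and the consumer drains partition 1 before partition 0, which is one legal interleaving among many.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative produce sequence on a 2-partition topic.
var produced = new (string Key, string Event, int Partition)[]
{
    ("ORD-1", "Created", 0),
    ("ORD-2", "Created", 1),
    ("ORD-1", "Shipped", 0),
    ("ORD-2", "Shipped", 1),
};

// One legal consumption order: partition 1 is drained first, but each
// partition is still read in its own write order.
var consumed = produced.Where(r => r.Partition == 1)
    .Concat(produced.Where(r => r.Partition == 0))
    .ToList();

foreach (var record in consumed)
    Console.WriteLine($"p{record.Partition} {record.Key} {record.Event}");

// Per-key order survives because each key lives on exactly one partition.
bool perKeyOrderKept = produced.GroupBy(r => r.Key).All(g =>
    g.Select(r => r.Event).SequenceEqual(
        consumed.Where(c => c.Key == g.Key).Select(c => c.Event)));

Console.WriteLine($"Topic-wide order kept: {consumed.SequenceEqual(produced)}; per-key order kept: {perKeyOrderKept}");
```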

Null and Accidentally-Constant Keys

The last pitfall is the quietest one, because it doesn't throw an exception or show up in a code review — it shows up weeks later as a lopsided lag dashboard. Recall from "How the Partitioner Assigns Keys to Partitions" that a null key triggers Kafka's sticky/round-robin batching behavior across all partitions, which is generally healthy for distribution. The danger is a key that is not null but constant — every message carries a key, so nothing about the producer call looks wrong, but that key never varies.

This happens more often than it sounds through small, easy-to-miss coding mistakes:

using System.Text.Json;
using Confluent.Kafka;

// Assumes an existing IProducer<string, string> named producer and an
// orderEvent object with a string OrderId property.

// Bug: keying every message by a hardcoded string instead of the
// per-order identifier. Every message hashes to the SAME partition,
// no matter how many partitions the topic has.
await producer.ProduceAsync("orders", new Message<string, string>
{
    Key = "order-event",           // constant literal, not OrderId
    Value = JsonSerializer.Serialize(orderEvent)
});

// Fix: key by the field that actually varies per message.
await producer.ProduceAsync("orders", new Message<string, string>
{
    Key = orderEvent.OrderId,       // varies per order, spreads across partitions
    Value = JsonSerializer.Serialize(orderEvent)
});

In the buggy version, hash("order-event") % partitionCount evaluates to the same partition index for every message the producer sends, regardless of how many partitions the topic has: a 12-partition topic degrades to the effective throughput of one partition, and in a 12-member consumer group, 11 consumers sit permanently idle. The fixed version keys by orderEvent.OrderId, a field that varies across messages, so the hash spreads traffic across the full partition range as intended. This category of bug is particularly easy to introduce when a key is built from a template string, a fixed event-type label, or a placeholder default from early development that never got replaced.
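The collapse is easy to reproduce offline. A sketch assuming a 12-partition topic, 1,000 synthetic order IDs, and an FNV-1a stand-in hash in place of the client's real one:

```csharp
using System;
using System.Linq;
using System.Text;

// Stand-in 32-bit FNV-1a hash (an assumption, not the client's real default hash).
static uint Fnv1a(string key)
{
    uint hash = 2166136261;
    foreach (byte b in Encoding.UTF8.GetBytes(key))
    {
        hash ^= b;
        hash = unchecked(hash * 16777619u);
    }
    return hash;
}

static int PartitionFor(string key, int partitionCount) =>
    (int)(Fnv1a(key) % (uint)partitionCount);

const int Partitions = 12;
var orderIds = Enumerable.Range(0, 1_000).Select(i => $"ORD-{i:D5}").ToArray();

// Buggy keying: every message carries the same literal key.
int constantKeyPartitions = orderIds
    .Select(_ => PartitionFor("order-event", Partitions))
    .Distinct()
    .Count();

// Fixed keying: the key varies with each order.
int orderIdPartitions = orderIds
    .Select(id => PartitionFor(id, Partitions))
    .Distinct()
    .Count();

Console.WriteLine($"Constant key: {constantKeyPartitions} of {Partitions} partitions used");
Console.WriteLine($"OrderId key: {orderIdPartitions} of {Partitions} partitions used");
```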

⚠️ Common Mistake: Copy-pasting a producer example from documentation or a tutorial that uses a literal string as the key for illustration, and shipping it without swapping in an actual per-entity field. Because Kafka accepts a constant key as a perfectly valid value, nothing fails at compile time, deploy time, or even under moderate test load; the collapse only becomes visible once real, higher-volume traffic piles up on the one partition that key always hashes to.

Summary and Quick Reference

By now you've moved from thinking of Kafka as "another message queue" to seeing it as a set of partitioned, append-only logs distributed across brokers, with keys acting as the routing input that determines which log a record lands in. Before moving to the deeper guarantees built on top of this model, it's worth locking the vocabulary down cold — these five terms get used loosely in conversation, and imprecise use of them is where a lot of production confusion starts.

The Five Terms, Precisely

Each of these terms describes a different layer of the same system, and mixing them up is the single most common source of confusion when a .NET team first debates ordering or scaling. A topic is the logical name your producers and consumers agree on — it has no inherent order or single physical location of its own. The order and location live one level down, in the partition: an ordered, immutable append-only log that is the actual unit Kafka stores, parallelizes, and routes to. Within a partition, a record's position is its offset, a per-partition integer that has no meaning outside that partition — offset 500 in partition 0 and offset 500 in partition 1 are unrelated records. Physically, each partition lives on a broker as a leader with zero or more replica followers copying it for fault tolerance. And the key is the piece of data attached to a record that the producer's partitioner uses to decide which partition a record goes to.

| Term | What it is | What it is NOT |
|---|---|---|
| 📚 Topic | A named, logical stream that groups related partitions | ❌ Not itself ordered, not a single log |
| 🔒 Partition | An ordered, immutable append-only log; the real unit of storage and parallelism | ❌ Not ordered relative to other partitions |
| 🎯 Offset | A record's position within one specific partition | ❌ Not a global sequence number across the topic |
| 🔧 Broker / Replica | A server hosting a partition's leader, or a follower copy kept for fault tolerance | ❌ Not a unit of routing or ordering |
| 🧠 Key | The routing input a producer supplies per record | ❌ Not required (null keys are allowed), and not an ordering guarantee on its own |

This table is a recap of terms introduced earlier in the lesson, not new material — if any row feels unfamiliar, it's worth revisiting "Anatomy of a Topic: Partitions, Brokers, and Replicas" before continuing, since everything downstream depends on these definitions being precise.

🎯 Key Principle: Ordering, parallelism, and fault tolerance are each owned by a different one of these five concepts — order lives in the partition, parallelism is capped by partition count, and fault tolerance comes from replicas, not from the topic name itself.

The Partitioner Mechanism, Recapped

With the default partitioner, routing has two paths, depending on whether a key is present. For a keyed record, the partitioner computes a hash of the key and reduces it modulo the partition count, hash(key) % numPartitions, which is deterministic for a fixed partition count: the same key always lands on the same partition as long as that count doesn't change. For a record with no key, the client doesn't hash anything; instead it uses a sticky (batch-oriented) partitioner that fills one partition's batch before moving to the next, favoring efficient batching over strict rotation.

Here's a minimal producer snippet that exercises both paths side by side, echoing the worked example from earlier in the lesson but stripped down to just the routing behavior:

using Confluent.Kafka;

var config = new ProducerConfig { BootstrapServers = "localhost:9092" };

using var producer = new ProducerBuilder<string, string>(config).Build();

// Keyed message: hash(key) % partitionCount picks the partition deterministically
var keyedResult = await producer.ProduceAsync("order-events",
    new Message<string, string> { Key = "order-482", Value = "OrderCreated" });
Console.WriteLine($"Keyed message -> partition {keyedResult.Partition.Value}");

// Null-key message: the sticky partitioner assigns a partition per batch, not per hash
var unkeyedResult = await producer.ProduceAsync("order-events",
    new Message<string, string> { Value = "HeartbeatPing" });
Console.WriteLine($"Unkeyed message -> partition {unkeyedResult.Partition.Value}");

Running this twice with the same key will consistently report the same partition number for the keyed message (given a stable partition count), while the unkeyed message's partition depends on the batching state at send time and isn't meant to be predicted from the call site. This is exactly the mechanism covered in depth in "How the Partitioner Assigns Keys to Partitions" — nothing new is being introduced here beyond tying it back into the vocabulary table above.
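To picture "fills one partition's batch before moving to the next," here is a deliberately simplified model. StickyAssign and its fixed batch size are hypothetical, and real clients decide when to switch and which partition to pick next using their own rules, so treat the exact sequence as illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified sticky model for null-key messages: stay on one partition until
// a batch of batchSize messages fills, then switch (rotation is an assumption).
static List<int> StickyAssign(int messageCount, int partitionCount, int batchSize)
{
    var assigned = new List<int>(messageCount);
    int current = 0, inBatch = 0;
    for (int i = 0; i < messageCount; i++)
    {
        if (inBatch == batchSize)
        {
            current = (current + 1) % partitionCount;
            inBatch = 0;
        }
        assigned.Add(current);
        inBatch++;
    }
    return assigned;
}

var partitionsUsed = StickyAssign(messageCount: 10, partitionCount: 3, batchSize: 4);
Console.WriteLine(string.Join(",", partitionsUsed));
```

Compare this with per-message round-robin, which would alternate 0,1,2,0,1,2 and produce many small batches instead of a few full ones.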

💡 Mental Model: Think of the key as an address label and the partitioner as a deterministic sorting machine — same label, same bin, every time, as long as the number of bins doesn't change.

The Cost Trade-off and Remapping Risk, in One Line Each

Two operational facts from "Common Pitfalls: Partition Count, Skew, and Remapping" are worth keeping in short-term memory as you design a topic, without re-litigating the reasoning here: more partitions buy you more consumer parallelism but cost broker memory, file handles, and leader-election overhead, so partition count is a trade-off to size deliberately rather than maximize; and because the partitioner's modulo depends on the current partition count, adding partitions later changes the result of hash(key) % numPartitions for many keys, silently breaking any prior guarantee that all of a given key's records were colocated on one partition.

⚠️ Common Mistake: Treating partition count as a value you can freely tune up later the way you'd scale out a stateless web service. For Kafka, changing it alters the routing formula for every existing key (while older records stay where they were written), not just future capacity.

Where the Deeper Guarantees Live

Everything in this lesson has been mechanism and vocabulary: what a partition is, how a key gets mapped to one, and what breaks when partition count changes. Three follow-on lessons build directly on this foundation and answer questions this lesson deliberately left open:

  • Partition Ordering takes the fact that "order lives in the partition" and turns it into a precise guarantee — what Kafka promises about the order consumers see within a partition, and exactly where that promise stops applying across partitions.
  • Key Selection Strategy picks up the routing mechanism recapped above and asks the harder design question: given a domain like order events, what field should actually be the key — order ID, customer ID, something else — and how do you avoid the hot-partition skew that was only named, not solved, in this lesson.
  • Log Retention vs Queue returns to the opening contrast between Kafka's append-only log and a traditional queue like MSMQ or Azure Service Bus, and covers how long records actually persist after being written, since nothing here has addressed retention or deletion policy.

💡 Pro Tip: When you're designing a new topic, work through the vocabulary table above in order — name the topic, decide the partition count with the cost trade-off in mind, pick a key with the next lesson's strategy in mind, and only then write the producer code — rather than defaulting to a partition count and key choice and discovering the consequences after data is already flowing.

Practical Next Steps

With this vocabulary and the partitioner mechanism now consolidated, three concrete actions make good use of it. First, before writing your next producer, explicitly decide whether each message needs a key at all: a null key silently opts into the sticky batching path and gives up any colocation guarantee, which is fine for independent events but wrong for anything that needs per-entity ordering. Second, when you inspect DeliveryResult.Partition in your own code (as shown in the worked example earlier in the lesson), use it as a sanity check that your key choice is actually producing the colocation you expect, rather than taking it on faith. Third, treat partition count as a decision made once with the trade-off above in mind, and if you suspect you'll need to change it, read "Partition Ordering" and "Key Selection Strategy" first, since both directly affect how costly that change will be.
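The DeliveryResult sanity check from the second step can be automated with a small helper that flags keys seen on more than one partition. A sketch: KeysSplitAcrossPartitions is a hypothetical name, and the observations below are invented, standing in for pairs you might log from result.Partition.Value after each ProduceAsync call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Reports every key that was observed on more than one partition.
static List<string> KeysSplitAcrossPartitions(IEnumerable<(string Key, int Partition)> pairs) =>
    pairs.GroupBy(o => o.Key)
        .Where(g => g.Select(o => o.Partition).Distinct().Count() > 1)
        .Select(g => g.Key)
        .ToList();

// Hypothetical observations collected from delivery results.
var observed = new List<(string Key, int Partition)>
{
    ("ORD-48213", 2), ("ORD-48213", 2), ("ORD-90110", 0),
    // Would indicate broken colocation, e.g. after a partition increase.
    ("ORD-55501", 1), ("ORD-55501", 3),
};

Console.WriteLine($"Keys split across partitions: [{string.Join(",", KeysSplitAcrossPartitions(observed))}]");
```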