Modern Allocation Primitives

Span<T> and Memory<T> for stack-like and heap-safe buffer handling

Modern Allocation Primitives in .NET

Modern .NET memory allocation has evolved far beyond the simple new operator. This lesson covers stack allocation techniques, span-based primitives, and pooling strategies: essential knowledge for building high-performance .NET applications that minimize garbage collection pressure.

Welcome to Advanced Memory Allocation 🚀

Welcome to one of the most powerful aspects of modern .NET development! As .NET has matured through versions 5, 6, 7, and 8, Microsoft has introduced sophisticated allocation primitives that allow developers to write code approaching the performance of native languages while maintaining C#'s safety guarantees.

Modern allocation primitives are the low-level building blocks that enable you to control exactly where and how memory is allocated. Instead of always allocating on the managed heap (triggering garbage collection), you can now leverage stack allocation, memory pooling, and zero-copy techniques to dramatically improve performance in hot code paths.

💡 Why does this matter? Every heap allocation creates work for the garbage collector. In high-throughput systems (web servers, game engines, financial trading platforms), millions of allocations per second can cause GC pauses that degrade user experience. Modern primitives help you avoid these allocations entirely.

Core Concepts: The Allocation Hierarchy 📊

The Three Memory Regions

Understanding where memory lives is fundamental to choosing the right allocation primitive:

Region         | Characteristics                                        | When to Use                   | Performance
Stack          | Automatic cleanup, extremely fast, limited size (~1MB) | Small, short-lived data       | ⚡ Fastest
Heap (Managed) | GC-managed, unlimited size, allocation overhead        | Objects, long-lived data      | 🐢 Slower (GC cost)
Unmanaged      | Manual management, no GC, interop-friendly             | Native interop, large buffers | ⚡ Fast (but risky)
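To make the three regions concrete, here is a minimal sketch that touches each one (the class and method names are illustrative; the unmanaged branch uses Marshal.AllocHGlobal so no unsafe context is required):

```csharp
using System;
using System.Runtime.InteropServices;

public static class RegionsDemo {
    public static int TouchAllThree() {
        // Stack: freed automatically when this method returns; the GC never sees it
        Span<int> stackBuf = stackalloc int[4];
        stackBuf[0] = 1;

        // Managed heap: the GC tracks this array and reclaims it eventually
        int[] heapBuf = new int[4];
        heapBuf[0] = 2;

        // Unmanaged: you own the lifetime; forgetting FreeHGlobal leaks memory
        IntPtr nativeBuf = Marshal.AllocHGlobal(4 * sizeof(int));
        try {
            Marshal.WriteInt32(nativeBuf, 3);
            return stackBuf[0] + heapBuf[0] + Marshal.ReadInt32(nativeBuf);
        } finally {
            Marshal.FreeHGlobal(nativeBuf);
        }
    }
}
```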
MEMORY ALLOCATION DECISION TREE

                ┌────────────────────┐
                │ Need to allocate?  │
                └─────────┬──────────┘
                          │
         ┌────────────────┴─────────────────┐
         │                                  │
    ┌────┴─────┐                      ┌─────┴─────┐
    │ Size     │                      │ Size      │
    │ ≤ 512B?  │                      │ > 512B?   │
    └────┬─────┘                      └─────┬─────┘
         │                                  │
         ↓                                  ↓
    ┌────────────┐                    ┌───────────┐
    │ Use        │                    │ Lifetime? │
    │ stackalloc │                    └─────┬─────┘
    └────────────┘                          │
                              ┌─────────────┴──────────┐
                              │                        │
                         ┌────┴─────┐           ┌──────┴──────┐
                         │ Short    │           │ Long/       │
                         │ (<1 req) │           │ Unknown     │
                         └────┬─────┘           └──────┬──────┘
                              │                        │
                              ↓                        ↓
                       ┌───────────┐            ┌──────────┐
                       │ Use       │            │ Use heap │
                       │ ArrayPool │            │ (new)    │
                       └───────────┘            └──────────┘

Span<T> and Memory<T>: The Foundation 💎

Span<T> is a ref struct that provides type-safe, memory-safe access to contiguous memory regions, whether that memory lives on the stack, on the heap, or in native memory. Think of it as a "view" over memory that doesn't itself allocate.

Key properties of Span<T>:

  • Ref struct: Can only live on the stack (cannot be boxed, stored in fields of classes, or used in async methods)
  • Zero-allocation: Creating a Span<T> over existing memory doesn't allocate
  • Bounds-checked: Safe indexing prevents buffer overruns
  • Performance: The JIT compiler optimizes Span<T> operations to near-native performance
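A quick sketch of the "view" idea (the method names are illustrative): slicing an array through a Span<T> copies nothing, and writes through the slice show up in the underlying array:

```csharp
using System;

public static class SpanDemo {
    // Returns a view over the middle third of the array; no copy, no allocation
    public static Span<int> MiddleThird(int[] data) {
        int third = data.Length / 3;
        return data.AsSpan(third, third);
    }

    // Works over any contiguous memory: arrays, stackalloc buffers, string chars
    public static int Sum(ReadOnlySpan<int> span) {
        int total = 0;
        foreach (int value in span) total += value;
        return total;
    }
}
```

For `new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 }`, MiddleThird views { 4, 5, 6 }, and assigning through that view mutates the original array.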

Memory<T> is Span<T>'s heap-friendly cousin:

  • Not a ref struct (can be stored in fields, used in async methods)
  • Slightly more overhead than Span<T>
  • Can be sliced and passed around without lifetime restrictions
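To see the difference in practice, here is a small sketch (the class name is illustrative): Memory<T> can live in a field and survive an await, producing a Span<T> only at the moment data is touched:

```csharp
using System;
using System.Threading.Tasks;

public class ChunkHolder {
    // Legal: Memory<T> is an ordinary struct. A Span<T> field would not compile.
    private readonly Memory<byte> _buffer;

    public ChunkHolder(byte[] backing) => _buffer = backing;

    public async Task<byte> FirstByteAfterDelayAsync() {
        await Task.Delay(1);       // _buffer survives the await
        return _buffer.Span[0];    // materialize a Span<T> only when reading
    }
}
```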
SPAN VS MEMORY COMPARISON

┌─────────────────────┐         ┌─────────────────────┐
│   Span<T> ⚡        │         │   Memory<T> 📦      │
├─────────────────────┤         ├─────────────────────┤
│ ✓ Stack-only        │         │ ✓ Heap-friendly     │
│ ✓ Zero overhead     │         │ ✓ Async-compatible  │
│ ✓ Fastest           │         │ ✓ Storable in class │
│ ✗ No async/await    │         │ ✗ Slight overhead   │
│ ✗ No class fields   │         │ .Span → get Span<T> │
└─────────────────────┘         └─────────────────────┘

Stackalloc: Lightning-Fast Stack Allocation ⚡

stackalloc allocates memory directly on the stack, bypassing the heap and garbage collector entirely. It's one of the most powerful tools for zero-allocation code.

Syntax evolution:

// Old way (unsafe context required)
unsafe {
    int* numbers = stackalloc int[10];
}

// Modern way (safe with Span)
Span<int> numbers = stackalloc int[10];

Critical constraints:

  • Memory automatically deallocated when method returns
  • Limited by stack size (~1MB on Windows, ~8MB on Linux)
  • Best for small, temporary buffers (≤ 512 bytes recommended)
  • Cannot return stackalloc'd memory from a method
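As a concrete illustration of the small-buffer pattern (the helper name is illustrative; TryFormat is the standard allocation-free formatting API):

```csharp
using System;

public static class HexFormat {
    // Formats a 32-bit value as uppercase hex using only stack scratch space.
    // The only heap allocation is the final string itself.
    public static string ToHex(int value) {
        Span<char> scratch = stackalloc char[8];        // 8 hex digits cover any int
        value.TryFormat(scratch, out int written, "X");
        return new string(scratch.Slice(0, written));
    }
}
```

Note that the method returns a new string copied out of the stack buffer; returning the stackalloc'd span itself would be illegal, since the memory dies when the method returns.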

💡 Pro tip: Use stackalloc for temporary buffers in hot loops. A 128-byte buffer allocated a million times per second would create 128 MB/sec of garbage on the heap; with stackalloc, zero garbage!

⚠️ Warning: Stack overflow crashes are unrecoverable! Always validate buffer sizes before using stackalloc, especially with user input.

ArrayPool<T>: Reusable Buffer Management 🔄

When you need larger buffers (> 512 bytes), or buffers whose lifetime extends beyond a single method, ArrayPool<T> is your friend. It maintains pools of reusable arrays to eliminate allocation churn.

How it works:

  1. Rent an array from the pool (may get larger than requested)
  2. Use the array
  3. Return it to the pool (now available for reuse)

Two pool types:

  • ArrayPool<T>.Shared: Thread-safe global pool (most common)
  • ArrayPool<T>.Create(): Custom pool with specific size limits
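A custom pool is worth considering when you want to cap how much memory the pool retains; the field name and limits below are illustrative:

```csharp
using System.Buffers;

public static class Pools {
    // Custom pool: caches arrays up to 16 KB, keeping at most 8 per size bucket.
    // Requests larger than maxArrayLength still succeed, but those arrays are
    // not cached when returned.
    public static readonly ArrayPool<byte> SmallBuffers =
        ArrayPool<byte>.Create(maxArrayLength: 16 * 1024, maxArraysPerBucket: 8);
}
```

Rent and Return work exactly as with the shared pool.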
ARRAYPOOL LIFECYCLE

┌─────────────────────────────────────────────┐
│  ArrayPool<byte>.Shared                     │
│                                             │
│  ┌────┐ ┌────┐ ┌────┐ ┌────┐               │
│  │1024│ │1024│ │2048│ │2048│ (Available)   │
│  └────┘ └────┘ └────┘ └────┘               │
└──────────────┬──────────────────────────────┘
               │ Rent(1000)
               ↓
        ┌──────────┐
        │ Your Code│ (Using one 1024-byte array)
        └──────┬───┘
               │ Return()
               ↓
┌─────────────────────────────────────────────┐
│  ┌────┐ ┌────┐ ┌────┐ ┌────┐               │
│  │1024│ │1024│ │2048│ │2048│ (All back)    │
│  └────┘ └────┘ └────┘ └────┘               │
└─────────────────────────────────────────────┘

Best practices:

  • Always return arrays to the pool (use try-finally)
  • Don't hold references after returning
  • Clear sensitive data before returning
  • The pool may give you a larger array than requested
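Putting those best practices together, a typical rent/use/return cycle looks like this (a sketch; the fill-and-sum workload is illustrative):

```csharp
using System;
using System.Buffers;

public static class PoolUsage {
    public static int FillAndSum(int count) {
        // Rent may hand back a larger array than requested, so slice to what you need
        byte[] buffer = ArrayPool<byte>.Shared.Rent(count);
        try {
            Span<byte> used = buffer.AsSpan(0, count);
            used.Fill(1);
            int total = 0;
            foreach (byte b in used) total += b;
            return total;
        } finally {
            // clearArray: true wipes the contents so the next renter can't read our data
            ArrayPool<byte>.Shared.Return(buffer, clearArray: true);
        }
    }
}
```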

MemoryPool<T>: Flexible Memory Management 🎯

MemoryPool<T> is similar to ArrayPool<T> but works with Memory<T> and provides more flexibility:

  • Returns IMemoryOwner<T>, which implements IDisposable
  • Better for async scenarios (Memory<T> is async-compatible)
  • Integrates with pipelines and stream-based APIs
  • Supports custom allocators

using IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(1024);
Memory<byte> memory = owner.Memory;
// Use memory...
// Automatically returned when disposed

Examples: Real-World Applications 🔧

Example 1: High-Performance String Parsing

Problem: Parse thousands of CSV lines per second without creating garbage.

Traditional approach (allocates heavily):

public void ParseCSV(string line) {
    string[] parts = line.Split(','); // Allocates string array + strings
    int id = int.Parse(parts[0]);     // Allocates substring
    string name = parts[1];
    decimal price = decimal.Parse(parts[2]);
    // Process...
}

Modern zero-allocation approach:

public void ParseCSV(ReadOnlySpan<char> line) {
    // Split without allocation
    int firstComma = line.IndexOf(',');
    int secondComma = line.Slice(firstComma + 1).IndexOf(',') + firstComma + 1;
    
    // Parse directly from spans (no substring allocation)
    ReadOnlySpan<char> idSpan = line.Slice(0, firstComma);
    ReadOnlySpan<char> nameSpan = line.Slice(firstComma + 1, secondComma - firstComma - 1);
    ReadOnlySpan<char> priceSpan = line.Slice(secondComma + 1);
    
    int id = int.Parse(idSpan);
    // Use nameSpan directly or convert only if needed
    decimal price = decimal.Parse(priceSpan);
    // Process...
}

Performance impact: The modern approach allocates zero bytes per line. Processing 1 million lines saves ~200MB of allocations and eliminates GC pauses.

Example 2: Temporary Buffer with Smart Sizing

Problem: Need a buffer for processing, but size varies. Want to avoid allocation for small cases.

public void ProcessData(ReadOnlySpan<byte> input) {
    const int StackAllocThreshold = 256;
    
    // Smart allocation: stack for small inputs, pooled array for large ones
    byte[]? rented = null;
    try {
        Span<byte> buffer = input.Length <= StackAllocThreshold
            ? stackalloc byte[StackAllocThreshold]
            : (rented = ArrayPool<byte>.Shared.Rent(input.Length));
        
        // Process data in buffer...
        Transform(input, buffer);
        
    } finally {
        if (rented != null) {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}

Why this pattern works:

  • Small inputs (≤256 bytes): Zero-allocation stack path
  • Large inputs: Pooled allocation reuses memory
  • Automatic fallback ensures correctness

🧠 Memory device: Think "Stack for snacks, Pool for meals" - quick snacks on the stack, full meals need the pool!

Example 3: UTF-8 String Encoding Without Allocation

Problem: Convert strings to UTF-8 bytes for network transmission efficiently.

public void SendMessage(string message, Socket socket) {
    // Calculate the worst-case byte count needed
    int maxByteCount = Encoding.UTF8.GetMaxByteCount(message.Length);
    
    // Use the stack for small messages, the pool for large ones
    byte[]? rented = null;
    try {
        Span<byte> buffer = maxByteCount <= 512
            ? stackalloc byte[512]
            : (rented = ArrayPool<byte>.Shared.Rent(maxByteCount));
        
        // Encode directly into the span (no intermediate byte[] allocation)
        int bytesWritten = Encoding.UTF8.GetBytes(message.AsSpan(), buffer);
        
        // Send only the bytes actually used
        socket.Send(buffer.Slice(0, bytesWritten));
        
    } finally {
        // Return the exact array that was rented. Returning buffer.ToArray() would
        // allocate a copy, hand the pool a foreign array, and leak the rented one.
        if (rented != null) {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}

Performance optimization: Traditional Encoding.UTF8.GetBytes(string) allocates a new byte array every time. This approach reuses memory, critical for high-throughput servers.

Example 4: Memory<T> in Async Pipelines

Problem: Process streaming data asynchronously with minimal allocations.

public async Task ProcessStreamAsync(Stream input) {
    // Rent reusable buffer for entire pipeline
    using IMemoryOwner<byte> owner = MemoryPool<byte>.Shared.Rent(4096);
    Memory<byte> buffer = owner.Memory;
    
    int bytesRead;
    while ((bytesRead = await input.ReadAsync(buffer)) > 0) {
        // Process chunk
        Memory<byte> chunk = buffer.Slice(0, bytesRead);
        await ProcessChunkAsync(chunk);
        
        // Buffer automatically reused for next iteration
    }
    // Buffer returned to pool on dispose
}

private async Task ProcessChunkAsync(Memory<byte> data) {
    // Memory<T> can be used across await boundaries
    await Task.Delay(10); // Simulated async work
    
    // Access the data after await (safe with Memory<T>)
    Span<byte> span = data.Span;
    // Process span...
}

Why Memory<T> here: Span<T> cannot be used across await points because it's stack-only. Memory<T> solves this while maintaining efficiency.

Common Mistakes ⚠️

Mistake 1: Stack Overflow from Large stackalloc

❌ Wrong:

public void Process(int size) {
    Span<byte> buffer = stackalloc byte[size]; // size could be huge!
    // Stack overflow crash if size > ~1MB
}

✅ Right:

public void Process(int size) {
    const int MaxStackSize = 512;
    byte[]? rented = null;
    
    Span<byte> buffer = size <= MaxStackSize
        ? stackalloc byte[MaxStackSize]
        : (rented = ArrayPool<byte>.Shared.Rent(size));
    
    try {
        // Safe processing
    } finally {
        if (rented != null) ArrayPool<byte>.Shared.Return(rented);
    }
}

Mistake 2: Forgetting to Return Arrays to Pool

❌ Wrong:

public void Process() {
    byte[] buffer = ArrayPool<byte>.Shared.Rent(1024);
    // Use buffer...
    // Forgot to return! Memory leak in pool
}

✅ Right:

public void Process() {
    byte[] buffer = ArrayPool<byte>.Shared.Rent(1024);
    try {
        // Use buffer...
    } finally {
        ArrayPool<byte>.Shared.Return(buffer, clearArray: true);
    }
}

💡 Pro tip: Set clearArray: true when returning buffers that held sensitive data (passwords, keys).

Mistake 3: Using Span<T> in Async Methods

❌ Wrong:

public async Task ProcessAsync(Span<byte> data) { // Won't compile!
    await Task.Delay(100);
    // Span<T> can't live across an await
}

✅ Right:

public async Task ProcessAsync(Memory<byte> data) {
    await Task.Delay(100);
    Span<byte> span = data.Span; // Get span after await
    // Process span...
}

Mistake 4: Assuming Pool Arrays Are Zeroed

❌ Wrong:

byte[] buffer = ArrayPool<byte>.Shared.Rent(100);
// Assumption: all bytes are 0
if (buffer[50] == 0) { // Might be leftover data!
    // Dangerous!
}

✅ Right:

byte[] buffer = ArrayPool<byte>.Shared.Rent(100);
buffer.AsSpan(0, 100).Clear(); // Explicitly zero if needed
// Now safe to assume zeros

πŸ” Did you know? ArrayPool doesn't zero arrays by default for performance. This means rented arrays may contain data from previous uses!

Mistake 5: Storing Span<T> in Class Fields

❌ Wrong:

public class DataProcessor {
    private Span<byte> _buffer; // Won't compile! Span can't be a field
}

✅ Right:

public class DataProcessor {
    private Memory<byte> _buffer; // Memory<T> can be stored
    
    public void Process() {
        Span<byte> span = _buffer.Span; // Get Span when needed
    }
}

Key Takeaways 🎯

📋 Quick Reference Card: Modern Allocation Primitives

Primitive     | Best For                           | Key Constraint             | Performance
stackalloc    | Small (≤512B), short-lived buffers | Stack-only, size limits    | ⚡ Fastest
Span<T>       | Zero-copy views, sync code         | No async, no class fields  | ⚡ Zero overhead
Memory<T>     | Async operations, storable         | Slight overhead vs Span<T> | 🚀 Fast
ArrayPool<T>  | Medium/large reusable buffers      | Must return, not zeroed    | 🚀 Good (reuse)
MemoryPool<T> | Async pipelines, IDisposable       | Slightly heavier           | 🚀 Good

🧠 Decision Flowchart:

Size ≤ 512B? → Yes → stackalloc + Span<T>
             → No  → Need async? → Yes → MemoryPool<T> + Memory<T>
                                 → No  → ArrayPool<T> + Span<T>

⚠️ Golden Rules:

  • Never stackalloc with untrusted sizes
  • Always return pooled arrays (use try-finally)
  • Use Span<T> for sync code, Memory<T> for async
  • Clear sensitive data before returning to pools
  • Measure before optimizing - these primitives add complexity

📚 Further Study

  1. Official Microsoft Documentation on Span: https://learn.microsoft.com/en-us/dotnet/api/system.span-1

    • Comprehensive reference including performance characteristics and usage patterns
  2. Memory Management in .NET (Microsoft Docs): https://learn.microsoft.com/en-us/dotnet/standard/automatic-memory-management

    • Deep dive into how the GC works and why these primitives help
  3. High-Performance .NET by Example: https://github.com/adamsitnik/awesome-dot-net-performance

    • Community-curated list of performance resources and real-world examples

Congratulations! 🎉 You now understand the modern allocation primitives that power high-performance .NET applications. These tools (stackalloc, Span<T>, Memory<T>, ArrayPool<T>, and MemoryPool<T>) form the foundation of zero-allocation programming. Practice using them in your hot code paths, measure the impact with BenchmarkDotNet, and watch your application's memory pressure drop dramatically. Remember: the best allocation is the one you never make!