Diagnostic Techniques for Memory Retention Issues
Tools and approaches to identify retention sources
Master memory retention bug diagnosis in .NET. This lesson covers profiling tools, heap analysis, and event tracing: essential techniques for identifying and resolving memory leaks in managed applications. Understanding these diagnostic approaches will transform you from guessing at memory problems to systematically identifying their root causes.
Welcome to Memory Diagnostics
💻 Memory retention bugs are among the most challenging issues in .NET development. Unlike crashes that provide immediate feedback, memory leaks gradually degrade application performance, often manifesting only in production environments after days or weeks of runtime. The key to solving these issues lies not in luck, but in systematic diagnostic techniques.
In this lesson, you'll learn the professional toolkit for diagnosing retention bugs:
- Profiling tools that reveal object lifetimes and retention paths
- Heap snapshot analysis to compare memory states
- ETW (Event Tracing for Windows) for low-overhead monitoring
- Performance counters for real-time memory metrics
- Diagnostic patterns that distinguish symptoms from root causes
🎯 By the end of this lesson, you'll be equipped to tackle memory issues that stump less experienced developers.
Core Concepts: The Diagnostic Workflow
The Memory Diagnostic Process
Effective memory diagnostics follow a systematic workflow rather than random tool usage. Think of it like medical diagnosis: you start with symptoms, use tests to gather data, form hypotheses, and validate them:
```
┌─────────────────────────────────────────┐
│       MEMORY DIAGNOSTIC WORKFLOW        │
└─────────────────────────────────────────┘
🔍 Observe Symptoms
   (high memory, slow GC)
        │
        ▼
📏 Measure Baselines
   (heap size, GC frequency)
        │
        ▼
📸 Capture Snapshots
   (before/after scenarios)
        │
        ▼
🧩 Analyze Differences
   (new objects, retained paths)
        │
        ▼
💡 Form Hypothesis
   (event handler leak?)
        │
        ▼
🔧 Validate & Fix
   (reproduce, verify fix)
        │
        ▼
📈 Monitor Production
   (confirm resolution)
```
Essential Diagnostic Tools
.NET provides multiple tools, each with specific strengths:
| Tool | Best For | Overhead | Environment |
|---|---|---|---|
| Visual Studio Profiler | Development analysis, detailed object graphs | High | Dev only |
| dotMemory | Deep heap analysis, retention paths | Medium-High | Dev/Staging |
| PerfView | ETW traces, GC events, production diagnosis | Low | All environments |
| Performance Counters | Real-time monitoring, alerting | Very Low | All environments |
| Debug Diagnostic Tool | Memory dump analysis, crash investigation | None (offline) | Production dumps |
💡 Pro Tip: Start with low-overhead tools in production (Performance Counters, PerfView), then use high-detail tools in development to investigate specific issues.
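On modern .NET (Core 3.0 and later), the cross-platform dotnet-counters CLI fills the same low-overhead monitoring niche; a typical session (the process ID 1234 is a placeholder for your app's PID):

```
# Install once as a global tool, then attach to a running process
dotnet tool install --global dotnet-counters
dotnet-counters monitor --process-id 1234 System.Runtime
```

The System.Runtime provider includes GC heap size, per-generation collection counts, and % time in GC, so it maps closely onto the counters discussed next.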
Performance Counters: Your First Alert System
Performance counters provide real-time metrics without requiring debugger attachment. Key counters for memory diagnostics:
📊 Critical .NET Memory Counters
| Counter | What to Watch |
|---|---|
| # Bytes in all Heaps | Total managed heap size (watch for steady growth) |
| Gen 2 Collections | Full GC frequency (expensive, should be rare) |
| % Time in GC | GC overhead (>10% indicates problems) |
| Large Object Heap size | LOH growth (objects >85 KB, not compacted by default) |
| Allocated Bytes/sec | Allocation rate (high = pressure on GC) |
🔧 Try this: Open Performance Monitor (perfmon.exe), add the .NET CLR Memory counters for your process, and watch them during normal operation to establish baselines.
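If you prefer to record baselines from inside the application, here is a minimal sketch using only built-in GC APIs (the MemoryBaseline helper and its label parameter are illustrative, not a standard API):

```csharp
using System;

static class MemoryBaseline
{
    // Log a coarse memory baseline; call at startup and after typical operations.
    public static void Log(string label)
    {
        var info = GC.GetGCMemoryInfo();
        Console.WriteLine(
            $"[{label}] managed heap: {GC.GetTotalMemory(forceFullCollection: false) / (1024 * 1024)} MB, " +
            $"heap size after last GC: {info.HeapSizeBytes / (1024 * 1024)} MB, " +
            $"gen0/1/2 collections: {GC.CollectionCount(0)}/{GC.CollectionCount(1)}/{GC.CollectionCount(2)}");
    }
}
```

GC.GetGCMemoryInfo requires .NET Core 3.0 or later; on .NET Framework, fall back to GC.GetTotalMemory and the perfmon counters above.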
Heap Snapshots: Comparing Memory States
The most powerful technique for finding retention bugs is heap snapshot comparison. This works by:
- Taking snapshot #1 before the suspected leaking operation
- Performing the operation (e.g., opening/closing a form)
- Taking snapshot #2 after the operation
- Comparing snapshots to see what objects remained
⚠️ Critical insight: The leaked objects themselves are often not the problem; it's what's holding onto them that matters. This is where retention paths become essential.
```
┌─────────────────────────────────────────┐
│         RETENTION PATH EXAMPLE          │
└─────────────────────────────────────────┘
Static Field (GC Root)
        │
        ▼
EventManager instance
        │  _subscribers List<>
        ▼
EventHandler delegate
        │  .Target
        ▼
UserControl instance   ← LEAKED!
        │
        ▼
Bitmap (10 MB)         ← WASTED MEMORY
```
In this example, the UserControl should have been collected, but an event subscription keeps it alive. The profiler shows you this entire chain, revealing that unsubscribing from the event would fix the leak.
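A crude way to confirm such a leak without a full profiler is to wrap the suspect operation in a measurement harness; a minimal sketch (the suspectOperation delegate stands in for whatever you are testing, e.g. opening and closing the form):

```csharp
using System;

static class LeakProbe
{
    // Runs the suspect operation N times, forces full GCs before and after,
    // and reports the heap delta. Roughly linear growth with N suggests retention.
    public static void Measure(Action suspectOperation, int iterations = 5)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        long before = GC.GetTotalMemory(forceFullCollection: true);

        for (int i = 0; i < iterations; i++)
            suspectOperation();

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        long after = GC.GetTotalMemory(forceFullCollection: true);

        Console.WriteLine($"Heap delta after {iterations} runs: {(after - before) / 1024} KB");
    }
}
```

This only tells you that something is retained; the snapshot comparison and retention paths above tell you what and why.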
ETW and PerfView: Production-Safe Diagnostics
Event Tracing for Windows (ETW) provides kernel-level instrumentation with minimal overhead. PerfView is Microsoft's free tool for capturing and analyzing ETW traces.
Why PerfView is powerful for production:
- ~2-5% overhead vs. 30-100% for traditional profilers
- No process attachment required until analysis
- Captures GC events showing what triggers collections
- Shows allocation stacks revealing where objects are created
- Thread contention data for performance issues
🧠 Memory device: PerfView = Production-safe, Powerful, Precise
Key PerfView commands for memory analysis:
```
# GC events plus allocation sampling (moderate overhead, enables heap views)
PerfView /AcceptEULA /GCOnly collect

# GC collection events only (lowest overhead, safe for long production captures)
PerfView /AcceptEULA /GCCollectOnly /MaxCollectSec:300 collect

# Open an existing trace for analysis (then use the GCStats view)
PerfView myapp.etl.zip
```
Understanding GC Events in Traces
ETW traces reveal why the GC runs, not just when. The GC Reason field is crucial:
| GC Reason | Meaning | Action |
|---|---|---|
| AllocSmall | Gen 0 budget exhausted (normal) | None needed if infrequent |
| AllocLarge | LOH allocation needed | Reduce >85KB allocations |
| Induced | Code called GC.Collect() | Remove unnecessary calls |
| OutOfMemory | Memory pressure critical | Urgent: investigate retention |
| HighMemory | System memory low | Reduce working set size |
📈 Pattern to watch: Frequent Gen 2 collections with the "OutOfMemory" reason indicate a retention bug, not just a high allocation rate.
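You can also observe GC reasons from inside the process with an in-process EventListener on the runtime's event source; a minimal sketch (keyword 0x1 enables the GC events; the GCStart_V2 event name and its Reason payload field follow the published runtime event schema, but verify against your runtime version):

```csharp
using System;
using System.Diagnostics.Tracing;

// Prints the reason for every GC the runtime starts.
sealed class GcReasonListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Microsoft-Windows-DotNETRuntime")
            EnableEvents(source, EventLevel.Informational, (EventKeywords)0x1); // GC keyword
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        if (e.EventName == "GCStart_V2" && e.PayloadNames is not null)
        {
            int i = e.PayloadNames.IndexOf("Reason");
            if (i >= 0)
                Console.WriteLine($"GC started, reason: {e.Payload![i]}");
        }
    }
}
```

Create one instance at startup (var listener = new GcReasonListener();) and keep it referenced for the lifetime of the app.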
Detailed Examples
Example 1: Diagnosing an Event Handler Leak
Scenario: A WPF application's memory grows by ~50MB every time a settings dialog opens and closes.
Diagnostic steps:
| Step | Action | Tool | Finding |
|---|---|---|---|
| 1 | Establish baseline memory | Performance Counters | App starts at 120MB |
| 2 | Open/close dialog 5 times | Manual testing | Memory grows to 370MB |
| 3 | Force full GC | GC.Collect(2, GCCollectionMode.Forced) | Memory stays at 350MB |
| 4 | Take heap snapshot | dotMemory | 5 SettingsDialog instances retained |
| 5 | Analyze retention paths | dotMemory Key Retention | ConfigManager event holds references |
| 6 | Inspect event subscription code | Code review | Missing -= on dialog close |
The problematic code:
```csharp
public class SettingsDialog : Window
{
    public SettingsDialog()
    {
        InitializeComponent();

        // Subscribe to static event
        ConfigManager.Instance.ConfigChanged += OnConfigChanged;
        // ❌ PROBLEM: Never unsubscribes!
        // The static ConfigManager holds a reference to this dialog
    }

    private void OnConfigChanged(object sender, EventArgs e)
    {
        // Update UI with new config
    }
}
```
The fix:
```csharp
public class SettingsDialog : Window
{
    public SettingsDialog()
    {
        InitializeComponent();
        ConfigManager.Instance.ConfigChanged += OnConfigChanged;
    }

    protected override void OnClosed(EventArgs e)
    {
        // ✅ Unsubscribe to break the reference
        ConfigManager.Instance.ConfigChanged -= OnConfigChanged;
        base.OnClosed(e);
    }

    private void OnConfigChanged(object sender, EventArgs e)
    {
        // Update UI with new config
    }
}
```
💡 Diagnostic lesson: When memory doesn't decrease after GC, you have a retention bug, not an allocation problem. Heap snapshots reveal the retained objects, and retention paths show why they're retained.
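In WPF specifically, the built-in weak event pattern sidesteps manual unsubscription entirely; a sketch assuming ConfigChanged is declared as event EventHandler&lt;EventArgs&gt; on ConfigManager:

```csharp
using System;
using System.Windows;

public partial class SettingsDialog : Window
{
    public SettingsDialog()
    {
        InitializeComponent();
        // WeakEventManager holds only a weak reference to this dialog,
        // so the static ConfigManager can no longer keep it alive.
        WeakEventManager<ConfigManager, EventArgs>.AddHandler(
            ConfigManager.Instance, nameof(ConfigManager.ConfigChanged), OnConfigChanged);
    }

    private void OnConfigChanged(object sender, EventArgs e)
    {
        // Update UI with new config
    }
}
```

The trade-off: weak events add indirection and a small runtime cost, so explicit unsubscription in OnClosed remains the simpler fix when you control the dialog's lifecycle.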
Example 2: Using PerfView for Production Investigation
Scenario: A web API's memory grows steadily in production, but the issue doesn't reproduce in development.
Production-safe diagnostic approach:
```
# Step 1: Capture a baseline ETW trace (15 minutes, low overhead)
PerfView /AcceptEULA /MaxCollectSec:900 /Zip:true collect

# Step 2: Let the app run through typical load
#         (/MaxCollectSec:900 stops collection automatically after 15 minutes)

# Step 3: Download the .etl.zip file from production

# Step 4: Analyze locally in PerfView
```
In PerfView, key views to examine:
GCStats: Shows GC frequency, heap growth, generation sizes
- Look for: Gen 2 heap size steadily increasing
- Look for: Increasing time in GC
GC Heap Net Mem: Shows which types are accumulating
- Sort by "Diff" column after comparing two snapshots
- Focus on types with large positive diff
GC Heap Alloc Stacks: Shows where objects are allocated
- Double-click on accumulating type
- Walk up call stack to find allocation source
Example findings from PerfView:
```
┌─────────────────────────────────────────────┐
│    TOP ACCUMULATING TYPES (over 15 min)     │
├─────────────────────────────────────────────┤
│ Type                    Count     Size (MB) │
├─────────────────────────────────────────────┤
│ System.Byte[]           142,853     2,847   │
│ MyApp.RequestContext     35,671       571   │
│ System.String           298,442       387   │
│ MyApp.SessionData        12,234       293   │
└─────────────────────────────────────────────┘
```
Notice RequestContext and SessionData are both custom types accumulating significantly. The allocation stacks reveal:
```
MyApp.SessionData allocations:
  └─ SessionManager.CreateSession
      └─ SessionCache.Add(sessionId, sessionData)
          └─ Dictionary<string, SessionData>.Add
```
Root cause discovered: Sessions were added to the cache but never removed. The cache had no expiration policy!
The fix:
```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public class SessionCache
{
    private readonly ConcurrentDictionary<string, SessionEntry> _cache = new();

    private sealed class SessionEntry
    {
        public SessionData Data { get; init; }
        public DateTime ExpiresAt { get; init; }
    }

    // ✅ Add an expiration policy
    public void Add(string sessionId, SessionData data)
    {
        var entry = new SessionEntry
        {
            Data = data,
            ExpiresAt = DateTime.UtcNow.AddMinutes(20)
        };
        _cache[sessionId] = entry;

        // Clean up expired entries periodically
        CleanupExpired();
    }

    private void CleanupExpired()
    {
        var now = DateTime.UtcNow;
        var expiredKeys = _cache
            .Where(kvp => kvp.Value.ExpiresAt < now)
            .Select(kvp => kvp.Key)
            .ToList();

        foreach (var key in expiredKeys)
        {
            _cache.TryRemove(key, out _);
        }
    }
}
```
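Scanning the whole dictionary on every Add is O(n); if your stack already includes Microsoft.Extensions.Caching.Memory (an assumption, not part of the original scenario), a cache with built-in expiration is leaner:

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

public class SessionCache
{
    private readonly MemoryCache _cache = new(new MemoryCacheOptions());

    public void Add(string sessionId, SessionData data)
    {
        // Entry is evicted automatically 20 minutes after being written.
        _cache.Set(sessionId, data, TimeSpan.FromMinutes(20));
    }

    public SessionData? TryGet(string sessionId)
        => _cache.TryGetValue(sessionId, out SessionData? data) ? data : null;
}
```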
💡 Diagnostic lesson: Production bugs often stem from load patterns that don't occur in development. PerfView's low overhead makes it safe to run in production, revealing issues that only manifest under real traffic.
Example 3: Analyzing Heap Snapshots with Visual Studio
Scenario: A Windows Service's memory grows during batch processing.
Using Visual Studio Profiler:
- Attach debugger to running service: Debug → Attach to Process
- Take snapshot #1: Debug → Performance Profiler → .NET Object Allocation
- Let batch process run (process 1000 records)
- Take snapshot #2
- Take snapshot #3 after another 1000 records
- Compare snapshots #2 and #3
Visual Studio shows:
```
┌──────────────────────────────────────────────┐
│        SNAPSHOT COMPARISON  #2 → #3          │
├──────────────────────────────────────────────┤
│ Type               Objects      Size Diff    │
├──────────────────────────────────────────────┤
│ RecordProcessor     +1,000       +48 MB      │
│ XmlDocument         +1,000      +156 MB      │
│ MemoryStream        +2,000       +89 MB      │
│ StringBuilder       +5,000       +12 MB      │
└──────────────────────────────────────────────┘
```
The problem: 1,000 RecordProcessor instances are being retained, each holding a large XmlDocument.
Inspecting retention paths:
```
GC Root: ProcessorRegistry (static field)
  └─ Dictionary<Guid, RecordProcessor> _activeProcessors
      └─ [1000 entries, never removed]
          └─ RecordProcessor instance
              └─ XmlDocument _recordData (156 KB each)
```
Root cause: Processors were registered but never unregistered after completion.
The fix:
```csharp
public class BatchService
{
    public async Task ProcessBatchAsync(IEnumerable<Record> records)
    {
        foreach (var record in records)
        {
            var processor = new RecordProcessor(record);
            var processorId = Guid.NewGuid();

            // ❌ OLD: Register and forget
            // ProcessorRegistry.Register(processorId, processor);

            // ✅ NEW: Register, process, unregister
            ProcessorRegistry.Register(processorId, processor);
            try
            {
                await processor.ProcessAsync();
            }
            finally
            {
                ProcessorRegistry.Unregister(processorId);
            }
        }
    }
}
```
Even better, use a pattern that doesn't require manual cleanup (note that this version also runs the processors concurrently via Task.WhenAll; keep the loop if records must be processed one at a time):
```csharp
public class BatchService
{
    public async Task ProcessBatchAsync(IEnumerable<Record> records)
    {
        // ✅ BEST: Use a temporary collection, no global registry
        var processors = records
            .Select(r => new RecordProcessor(r))
            .ToList();

        await Task.WhenAll(processors.Select(p => p.ProcessAsync()));
        // All processors are eligible for GC when the method completes
    }
}
```
💡 Diagnostic lesson: Snapshot comparison reveals what's accumulating. Retention path analysis reveals why. Together, they point directly to the leak source.
Example 4: Identifying LOH Fragmentation
Scenario: Application experiences occasional OutOfMemory exceptions despite total memory usage being reasonable.
The counter-intuitive symptom: Performance Monitor shows:
- Total heap size: 850 MB (plenty of RAM available)
- Application crashes with OutOfMemoryException
- Gen 2 collections are frequent
Diagnostic approach:
```csharp
// Check LOH fragmentation programmatically (GCMemoryInfo APIs, .NET 5+)
var gcMemoryInfo = GC.GetGCMemoryInfo();
var lohInfo = gcMemoryInfo.GenerationInfo[3]; // index 3 = LOH
var lohSize = lohInfo.SizeAfterBytes;
var lohFragmented = lohInfo.FragmentationAfterBytes;

Console.WriteLine($"LOH Size: {lohSize / 1024 / 1024} MB");
Console.WriteLine($"LOH Fragmented: {lohFragmented / 1024 / 1024} MB");
Console.WriteLine($"Fragmentation %: {(double)lohFragmented / lohSize * 100:F1}%");
```
Output reveals the problem:
```
LOH Size: 645 MB
LOH Fragmented: 412 MB
Fragmentation %: 63.9%
```
What's happening: The Large Object Heap (objects >85 KB) is not compacted by default. When your code allocates and releases large objects of varying sizes, you create "holes" in the LOH. Eventually, even though there's free memory, there's no contiguous block large enough for a new allocation.
```
LOH FRAGMENTATION VISUALIZATION
┌──────────────────────────────────────────┐
│ Initial state (3 large objects)          │
├──────────────────────────────────────────┤
│ [ Object A ][ Object B ][ Obj C ]        │
│    200KB       150KB      100KB          │
└──────────────────────────────────────────┘
┌──────────────────────────────────────────┐
│ Object B released (hole created)         │
├──────────────────────────────────────────┤
│ [ Object A ][  FREE   ][ Obj C ]         │
│    200KB       150KB      100KB          │
└──────────────────────────────────────────┘
┌──────────────────────────────────────────┐
│ Try to allocate 180KB object → FAILS!    │
├──────────────────────────────────────────┤
│ [ Object A ][  FREE   ][ Obj C ]         │
│    200KB       150KB      100KB          │
│                └─ Too small!             │
│ OutOfMemoryException despite free space  │
└──────────────────────────────────────────┘
```
Using PerfView to diagnose:
- Open the .etl trace in PerfView
- Navigate to GC Heap Net Mem (Coarse) Stacks
- Filter for sizes >85,000 bytes
- Identify which allocations are large
Common LOH allocation sources:
| Type | Typical Size | Solution |
|---|---|---|
| byte[] buffers | 1 MB+ | Rent from ArrayPool&lt;byte&gt; |
| String concatenation | 85 KB+ | StringBuilder or string.Create |
| Bitmap images | Varies | Dispose promptly, reuse when possible |
| List&lt;T&gt; capacity | When T is large | Pre-size, or use LinkedList&lt;T&gt; |
The fix for our scenario (large byte[] buffers):
```csharp
// ❌ OLD: Creates a new 1 MB array each time (goes to LOH)
public byte[] ProcessData(Stream input)
{
    var buffer = new byte[1024 * 1024]; // 1 MB buffer
    input.Read(buffer, 0, buffer.Length);
    return TransformData(buffer);
}

// ✅ NEW: Rent from the pool, return when done
public byte[] ProcessData(Stream input)
{
    // Note: Rent may return an array larger than requested; use only the first 1 MB
    var buffer = ArrayPool<byte>.Shared.Rent(1024 * 1024);
    try
    {
        input.Read(buffer, 0, 1024 * 1024);
        return TransformData(buffer);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}
```
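If fragmentation has already built up, the runtime can also compact the LOH on demand (available since .NET Framework 4.5.1 and in all modern .NET); treat this as a stopgap while you eliminate the large allocations themselves:

```csharp
using System;
using System.Runtime;

// Request that the next blocking Gen 2 collection also compact the LOH.
// The setting resets to Default automatically once that collection runs.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
```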
💡 Diagnostic lesson: Not all memory problems are leaks. LOH fragmentation causes OOM even with available memory. Look for large allocations and eliminate them through pooling or size reduction.
Common Mistakes in Memory Diagnostics
⚠️ Mistake #1: Only checking total memory usage
Why it's wrong: Total memory tells you if there's a problem, not what the problem is.
What to do instead: Track rate of change and GC behavior. A steady 500MB might be fine, but growing 10MB/hour indicates a leak.
⚠️ Mistake #2: Calling GC.Collect() to "fix" memory issues
Why it's wrong: Forcing GC masks symptoms without addressing root causes. If memory drops after a forced GC, good: you had an allocation rate problem. If it doesn't drop, you have a retention bug that needs fixing.
What to do instead: Use GC.Collect() only as a diagnostic tool to distinguish allocation from retention:
```csharp
// Diagnostic pattern: distinguish allocation pressure from retention
var beforeGC = GC.GetTotalMemory(false);
GC.Collect(2, GCCollectionMode.Forced);
GC.WaitForPendingFinalizers();
var afterGC = GC.GetTotalMemory(true);

if ((beforeGC - afterGC) < beforeGC * 0.1)
{
    Console.WriteLine("⚠️ Possible retention bug: full GC freed less than 10% of the heap");
}
```
⚠️ Mistake #3: Ignoring finalization queue depth
Why it's wrong: Objects with finalizers (written with destructor syntax in C#) go through a two-stage collection process, demonstrated in the sketch at the end of this mistake. A backlog in the finalization queue appears as a memory leak.
What to do instead: Check the finalization queue:
```csharp
var gen0 = GC.CollectionCount(0);
var gen1 = GC.CollectionCount(1);
var gen2 = GC.CollectionCount(2);
// If Gen 2 collections are frequent but memory isn't freed,
// check for objects waiting for finalization
```
Use PerfView's GC/Finalizer Queue Length graph to spot finalization backlogs.
Common causes:
- Finalizers that block or take too long
- Too many finalizable objects being created
- Deadlock in finalizer code
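To see the two-stage collection in action, here is a minimal sketch (NoisyHandle is a made-up type for illustration):

```csharp
using System;

class NoisyHandle
{
    private readonly byte[] _buffer = new byte[1024 * 1024];
    ~NoisyHandle() => Console.WriteLine("Finalizer ran");
}

class Program
{
    static void Main()
    {
        // Scope the allocation in a local function so no live reference remains.
        static void Allocate() => _ = new NoisyHandle();
        Allocate();

        GC.Collect();                  // 1st GC: object survives, queued for finalization
        GC.WaitForPendingFinalizers(); // finalizer thread runs ~NoisyHandle
        GC.Collect();                  // 2nd GC: the buffer's memory is actually reclaimed
    }
}
```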
⚠️ Mistake #4: Taking snapshots too frequently
Why it's wrong: Heap snapshots pause the application and consume significant memory themselves. Taking snapshots every minute creates performance problems.
What to do instead: Use strategic snapshot timing:
- Baseline (app idle)
- After suspected leak operation
- After second occurrence (to confirm pattern)
- Final snapshot for comparison
⚠️ Mistake #5: Focusing on small objects instead of retention paths
Why it's wrong: Finding that you have "10,000 string objects" doesn't tell you why they're retained or which ones are problematic.
What to do instead:
- Sort by total size, not count
- Focus on your types, not framework types
- Examine retention paths for your types
- Find the GC root (static field, active thread) holding the chain
⚠️ Mistake #6: Not establishing baselines
Why it's wrong: Without knowing normal behavior, you can't identify abnormal behavior. Is 400MB memory usage good or bad? Depends on your baseline.
What to do instead: Record baseline metrics:
- Memory at application start
- Memory after typical operations
- Typical GC collection frequency
- Expected working set size
Use these to identify deviations that indicate problems.
Key Takeaways
🎯 Core Diagnostic Principles:
Symptoms vs. root causes: High memory is a symptom. The root cause is what's retaining objects unnecessarily.
Systematic workflow: Observe → Measure → Compare → Analyze → Hypothesize → Validate → Fix
Tool selection matters: Use low-overhead tools (PerfView, perfmon) in production, high-detail tools (dotMemory, VS Profiler) in development.
Retention paths reveal everything: Finding what objects are retained is less important than finding what's holding them.
GC behavior tells a story: Frequent Gen 2 collections, high GC time %, and increasing heap size are red flags.
LOH is special: Large objects (>85KB) never get compacted, leading to fragmentation. Pool large buffers.
Event handlers are common culprits: Subscriptions to static events or long-lived objects create retention chains.
📋 Diagnostic Checklist:
- Establish baseline metrics (memory, GC frequency)
- Monitor performance counters continuously
- Take strategic heap snapshots (before/after)
- Compare snapshots to find accumulating types
- Analyze retention paths to GC roots
- Verify hypothesis by reproducing the pattern
- Implement fix and confirm resolution
- Monitor production to validate
💡 Remember: The best diagnostic tool is systematic thinking. Tools provide data; your analysis provides solutions.
📌 Quick Reference Card: Memory Diagnostic Commands
| Task | Command |
|---|---|
| Check total memory | GC.GetTotalMemory(false) |
| Check GC collections | GC.CollectionCount(0/1/2) |
| Get GC info | GC.GetGCMemoryInfo() |
| Force GC (diagnostic) | GC.Collect(2, GCCollectionMode.Forced) |
| PerfView capture | PerfView /AcceptEULA /GCOnly collect |
| Rent from pool | ArrayPool&lt;T&gt;.Shared.Rent(size) |
| Return to pool | ArrayPool&lt;T&gt;.Shared.Return(array) |
📚 Further Study
Microsoft Docs - Memory Performance Best Practices: https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/memory-management-and-gc
PerfView Tutorial and Documentation: https://github.com/microsoft/perfview/blob/main/documentation/Tutorial.md
JetBrains dotMemory Profiling Guide: https://www.jetbrains.com/help/dotmemory/Profiling_Guidelines.html
🚀 Next Steps: Now that you've mastered diagnostic techniques, practice applying them to real scenarios. Set up monitoring in your applications, establish baselines, and proactively look for memory patterns before they become production issues. The skills you've learned here transform reactive firefighting into proactive engineering excellence.