Diagnostic Techniques for Memory Retention Issues
Tools and approaches to identify retention sources
Master memory retention bug diagnosis in .NET. This lesson covers profiling tools, heap analysis, and event tracing: essential techniques for identifying and resolving memory leaks in managed applications. Understanding these diagnostic approaches will transform you from guessing at memory problems to systematically identifying their root causes.
Welcome to Memory Diagnostics
💻 Memory retention bugs are among the most challenging issues in .NET development. Unlike crashes that provide immediate feedback, memory leaks gradually degrade application performance, often manifesting only in production environments after days or weeks of runtime. The key to solving these issues lies not in luck, but in systematic diagnostic techniques.
In this lesson, you'll learn the professional toolkit for diagnosing retention bugs:
- Profiling tools that reveal object lifetimes and retention paths
- Heap snapshot analysis to compare memory states
- ETW (Event Tracing for Windows) for low-overhead monitoring
- Performance counters for real-time memory metrics
- Diagnostic patterns that distinguish symptoms from root causes
🎯 By the end of this lesson, you'll be equipped to tackle memory issues that stump less experienced developers.
Core Concepts: The Diagnostic Workflow
The Memory Diagnostic Process
Effective memory diagnostics follow a systematic workflow rather than random tool usage. Think of it like medical diagnosis: you start with symptoms, use tests to gather data, form hypotheses, and validate them:
```
┌─────────────────────────────────────────┐
│       MEMORY DIAGNOSTIC WORKFLOW        │
└─────────────────────────────────────────┘
🔍 Observe Symptoms
   (high memory, slow GC)
        │
        ▼
📏 Measure Baselines
   (heap size, GC frequency)
        │
        ▼
📸 Capture Snapshots
   (before/after scenarios)
        │
        ▼
🧩 Analyze Differences
   (new objects, retained paths)
        │
        ▼
💡 Form Hypothesis
   (event handler leak?)
        │
        ▼
🔧 Validate & Fix
   (reproduce, verify fix)
        │
        ▼
📈 Monitor Production
   (confirm resolution)
```
Essential Diagnostic Tools
.NET provides multiple tools, each with specific strengths:
| Tool | Best For | Overhead | Environment |
|---|---|---|---|
| Visual Studio Profiler | Development analysis, detailed object graphs | High | Dev only |
| dotMemory | Deep heap analysis, retention paths | Medium-High | Dev/Staging |
| PerfView | ETW traces, GC events, production diagnosis | Low | All environments |
| Performance Counters | Real-time monitoring, alerting | Very Low | All environments |
| Debug Diagnostic Tool | Memory dump analysis, crash investigation | None (offline) | Production dumps |
💡 Pro Tip: Start with low-overhead tools in production (Performance Counters, PerfView), then use high-detail tools in development to investigate specific issues.
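On modern .NET (Core 3.0 and later), the cross-platform dotnet-counters CLI fills the same low-overhead monitoring niche; a typical session (the process ID 1234 is a placeholder for your app's PID):

```
# Install once as a global tool, then attach to a running process
dotnet tool install --global dotnet-counters
dotnet-counters monitor --process-id 1234 System.Runtime
```

The System.Runtime provider includes GC heap size, per-generation collection counts, and % time in GC, so it maps closely onto the counters discussed next.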
Performance Counters: Your First Alert System
Performance counters provide real-time metrics without requiring debugger attachment. Key counters for memory diagnostics:
📊 Critical .NET Memory Counters
| Counter | What to Watch |
|---|---|
| # Bytes in all Heaps | Total managed heap size (watch for steady growth) |
| Gen 2 Collections | Full GC frequency (expensive, should be rare) |
| % Time in GC | GC overhead (>10% indicates problems) |
| Large Object Heap size | LOH growth (objects >85 KB, not compacted by default) |
| Allocated Bytes/sec | Allocation rate (high = pressure on GC) |
🔧 Try this: Open Performance Monitor (perfmon.exe), add the .NET CLR Memory counters for your process, and watch them during normal operation to establish baselines.
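If you prefer to record baselines from inside the application, here is a minimal sketch using only built-in GC APIs (the MemoryBaseline helper and its label parameter are illustrative, not a standard API):

```csharp
using System;

static class MemoryBaseline
{
    // Log a coarse memory baseline; call at startup and after typical operations.
    public static void Log(string label)
    {
        var info = GC.GetGCMemoryInfo();
        Console.WriteLine(
            $"[{label}] managed heap: {GC.GetTotalMemory(forceFullCollection: false) / (1024 * 1024)} MB, " +
            $"heap size after last GC: {info.HeapSizeBytes / (1024 * 1024)} MB, " +
            $"gen0/1/2 collections: {GC.CollectionCount(0)}/{GC.CollectionCount(1)}/{GC.CollectionCount(2)}");
    }
}
```

GC.GetGCMemoryInfo requires .NET Core 3.0 or later; on .NET Framework, fall back to GC.GetTotalMemory and the perfmon counters above.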
Heap Snapshots: Comparing Memory States
The most powerful technique for finding retention bugs is heap snapshot comparison. This works by:
- Taking snapshot #1 before the suspected leaking operation
- Performing the operation (e.g., opening/closing a form)
- Taking snapshot #2 after the operation
- Comparing snapshots to see what objects remained
⚠️ Critical insight: The leaked objects themselves are often not the problem; it's what's holding onto them that matters. This is where retention paths become essential.
```
┌─────────────────────────────────────────┐
│         RETENTION PATH EXAMPLE          │
└─────────────────────────────────────────┘
Static Field (GC Root)
        │
        ▼
EventManager instance
        │  _subscribers List<>
        ▼
EventHandler delegate
        │  .Target
        ▼
UserControl instance   ← LEAKED!
        │
        ▼
Bitmap (10 MB)         ← WASTED MEMORY
```
In this example, the UserControl should have been collected, but an event subscription keeps it alive. The profiler shows you this entire chain, revealing that unsubscribing from the event would fix the leak.
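A crude way to confirm such a leak without a full profiler is to wrap the suspect operation in a measurement harness; a minimal sketch (the suspectOperation delegate stands in for whatever you are testing, e.g. opening and closing the form):

```csharp
using System;

static class LeakProbe
{
    // Runs the suspect operation N times, forces full GCs before and after,
    // and reports the heap delta. Roughly linear growth with N suggests retention.
    public static void Measure(Action suspectOperation, int iterations = 5)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        long before = GC.GetTotalMemory(forceFullCollection: true);

        for (int i = 0; i < iterations; i++)
            suspectOperation();

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        long after = GC.GetTotalMemory(forceFullCollection: true);

        Console.WriteLine($"Heap delta after {iterations} runs: {(after - before) / 1024} KB");
    }
}
```

This only tells you that something is retained; the snapshot comparison and retention paths above tell you what and why.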
ETW and PerfView: Production-Safe Diagnostics
Event Tracing for Windows (ETW) provides kernel-level instrumentation with minimal overhead. PerfView is Microsoft's free tool for capturing and analyzing ETW traces.
Why PerfView is powerful for production:
- ~2-5% overhead vs. 30-100% for traditional profilers
- No process attachment required until analysis
- Captures GC events showing what triggers collections
- Shows allocation stacks revealing where objects are created
- Thread contention data for performance issues
🧠 Memory device: PerfView = Production-safe, Powerful, Precise
Key PerfView commands for memory analysis:
```
# GC events plus allocation sampling (moderate overhead, enables heap views)
PerfView /AcceptEULA /GCOnly collect

# GC collection events only (lowest overhead, safe for long production captures)
PerfView /AcceptEULA /GCCollectOnly /MaxCollectSec:300 collect

# Open an existing trace for analysis (then use the GCStats view)
PerfView myapp.etl.zip
```
Understanding GC Events in Traces
ETW traces reveal why the GC runs, not just when. The GC Reason field is crucial:
| GC Reason | Meaning | Action |
|---|---|---|
| AllocSmall | Gen 0 budget exhausted (normal) | None needed if infrequent |
| AllocLarge | LOH allocation needed | Reduce >85KB allocations |
| Induced | Code called GC.Collect() | Remove unnecessary calls |
| OutOfMemory | Memory pressure critical | Urgent: investigate retention |
| HighMemory | System memory low | Reduce working set size |
📈 Pattern to watch: Frequent Gen 2 collections with the "OutOfMemory" reason indicate a retention bug, not just a high allocation rate.
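You can also observe GC reasons from inside the process with an in-process EventListener on the runtime's event source; a minimal sketch (keyword 0x1 enables the GC events; the GCStart_V2 event name and its Reason payload field follow the published runtime event schema, but verify against your runtime version):

```csharp
using System;
using System.Diagnostics.Tracing;

// Prints the reason for every GC the runtime starts.
sealed class GcReasonListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource source)
    {
        if (source.Name == "Microsoft-Windows-DotNETRuntime")
            EnableEvents(source, EventLevel.Informational, (EventKeywords)0x1); // GC keyword
    }

    protected override void OnEventWritten(EventWrittenEventArgs e)
    {
        if (e.EventName == "GCStart_V2" && e.PayloadNames is not null)
        {
            int i = e.PayloadNames.IndexOf("Reason");
            if (i >= 0)
                Console.WriteLine($"GC started, reason: {e.Payload![i]}");
        }
    }
}
```

Create one instance at startup (var listener = new GcReasonListener();) and keep it referenced for the lifetime of the app.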
Detailed Examples
Example 1: Diagnosing an Event Handler Leak
Scenario: A WPF application's memory grows by ~50MB every time a settings dialog opens and closes.
Diagnostic steps:
| Step | Action | Tool | Finding |
|---|---|---|---|
| 1 | Establish baseline memory | Performance Counters | App starts at 120MB |
| 2 | Open/close dialog 5 times | Manual testing | Memory grows to 370MB |
| 3 | Force full GC | GC.Collect(2, GCCollectionMode.Forced) | Memory stays at 350MB |
| 4 | Take heap snapshot | dotMemory | 5 SettingsDialog instances retained |
| 5 | Analyze retention paths | dotMemory Key Retention | ConfigManager event holds references |
| 6 | Inspect event subscription code | Code review | Missing -= on dialog close |
The problematic code:
```csharp
public class SettingsDialog : Window
{
    public SettingsDialog()
    {
        InitializeComponent();

        // Subscribe to static event
        ConfigManager.Instance.ConfigChanged += OnConfigChanged;
        // ❌ PROBLEM: Never unsubscribes!
        // The static ConfigManager holds a reference to this dialog
    }

    private void OnConfigChanged(object sender, EventArgs e)
    {
        // Update UI with new config
    }
}
```
The fix:
```csharp
public class SettingsDialog : Window
{
    public SettingsDialog()
    {
        InitializeComponent();
        ConfigManager.Instance.ConfigChanged += OnConfigChanged;
    }

    protected override void OnClosed(EventArgs e)
    {
        // ✅ Unsubscribe to break the reference
        ConfigManager.Instance.ConfigChanged -= OnConfigChanged;
        base.OnClosed(e);
    }

    private void OnConfigChanged(object sender, EventArgs e)
    {
        // Update UI with new config
    }
}
```
💡 Diagnostic lesson: When memory doesn't decrease after GC, you have a retention bug, not an allocation problem. Heap snapshots reveal the retained objects, and retention paths show why they're retained.
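In WPF specifically, the built-in weak event pattern sidesteps manual unsubscription entirely; a sketch assuming ConfigChanged is declared as event EventHandler&lt;EventArgs&gt; on ConfigManager:

```csharp
using System;
using System.Windows;

public partial class SettingsDialog : Window
{
    public SettingsDialog()
    {
        InitializeComponent();
        // WeakEventManager holds only a weak reference to this dialog,
        // so the static ConfigManager can no longer keep it alive.
        WeakEventManager<ConfigManager, EventArgs>.AddHandler(
            ConfigManager.Instance, nameof(ConfigManager.ConfigChanged), OnConfigChanged);
    }

    private void OnConfigChanged(object sender, EventArgs e)
    {
        // Update UI with new config
    }
}
```

The trade-off: weak events add indirection and a small runtime cost, so explicit unsubscription in OnClosed remains the simpler fix when you control the dialog's lifecycle.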
Example 2: Using PerfView for Production Investigation
Scenario: A web API's memory grows steadily in production, but the issue doesn't reproduce in development.
Production-safe diagnostic approach:
```
# Step 1: Capture a baseline ETW trace (15 minutes, low overhead)
PerfView /AcceptEULA /MaxCollectSec:900 /Zip:true collect

# Step 2: Let the app run through typical load
#         (/MaxCollectSec:900 stops collection automatically after 15 minutes)

# Step 3: Download the .etl.zip file from production

# Step 4: Analyze locally in PerfView
```
In PerfView, key views to examine:
GCStats: Shows GC frequency, heap growth, generation sizes
- Look for: Gen 2 heap size steadily increasing
- Look for: Increasing time in GC
GC Heap Net Mem: Shows which types are accumulating
- Sort by "Diff" column after comparing two snapshots
- Focus on types with large positive diff
GC Heap Alloc Stacks: Shows where objects are allocated
- Double-click on accumulating type
- Walk up call stack to find allocation source
Example findings from PerfView:
```
┌─────────────────────────────────────────────┐
│    TOP ACCUMULATING TYPES (over 15 min)     │
├─────────────────────────────────────────────┤
│ Type                    Count     Size (MB) │
├─────────────────────────────────────────────┤
│ System.Byte[]           142,853     2,847   │
│ MyApp.RequestContext     35,671       571   │
│ System.String           298,442       387   │
│ MyApp.SessionData        12,234       293   │
└─────────────────────────────────────────────┘
```
Notice RequestContext and SessionData are both custom types accumulating significantly. The allocation stacks reveal:
```
MyApp.SessionData allocations:
  └─ SessionManager.CreateSession
      └─ SessionCache.Add(sessionId, sessionData)
          └─ Dictionary<string, SessionData>.Add
```
Root cause discovered: Sessions were added to the cache but never removed. The cache had no expiration policy!
The fix:
```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public class SessionCache
{
    private readonly ConcurrentDictionary<string, SessionEntry> _cache = new();

    private sealed class SessionEntry
    {
        public SessionData Data { get; init; }
        public DateTime ExpiresAt { get; init; }
    }

    // ✅ Add an expiration policy
    public void Add(string sessionId, SessionData data)
    {
        var entry = new SessionEntry
        {
            Data = data,
            ExpiresAt = DateTime.UtcNow.AddMinutes(20)
        };
        _cache[sessionId] = entry;

        // Clean up expired entries periodically
        CleanupExpired();
    }

    private void CleanupExpired()
    {
        var now = DateTime.UtcNow;
        var expiredKeys = _cache
            .Where(kvp => kvp.Value.ExpiresAt < now)
            .Select(kvp => kvp.Key)
            .ToList();

        foreach (var key in expiredKeys)
        {
            _cache.TryRemove(key, out _);
        }
    }
}
```
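Scanning the whole dictionary on every Add is O(n); if your stack already includes Microsoft.Extensions.Caching.Memory (an assumption, not part of the original scenario), a cache with built-in expiration is leaner:

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

public class SessionCache
{
    private readonly MemoryCache _cache = new(new MemoryCacheOptions());

    public void Add(string sessionId, SessionData data)
    {
        // Entry is evicted automatically 20 minutes after being written.
        _cache.Set(sessionId, data, TimeSpan.FromMinutes(20));
    }

    public SessionData? TryGet(string sessionId)
        => _cache.TryGetValue(sessionId, out SessionData? data) ? data : null;
}
```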
💡 Diagnostic lesson: Production bugs often stem from load patterns that don't occur in development. PerfView's low overhead makes it safe to run in production, revealing issues that only manifest under real traffic.
Example 3: Analyzing Heap Snapshots with Visual Studio
Scenario: A Windows Service's memory grows during batch processing.
Using Visual Studio Profiler:
- Attach debugger to running service: Debug → Attach to Process
- Take snapshot #1: Debug → Performance Profiler → .NET Object Allocation
- Let batch process run (process 1000 records)
- Take snapshot #2
- Take snapshot #3 after another 1000 records
- Compare snapshots #2 and #3
Visual Studio shows:
```
┌──────────────────────────────────────────────┐
│        SNAPSHOT COMPARISON  #2 → #3          │
├──────────────────────────────────────────────┤
│ Type               Objects      Size Diff    │
├──────────────────────────────────────────────┤
│ RecordProcessor     +1,000       +48 MB      │
│ XmlDocument         +1,000      +156 MB      │
│ MemoryStream        +2,000       +89 MB      │
│ StringBuilder       +5,000       +12 MB      │
└──────────────────────────────────────────────┘
```
The problem: 1,000 RecordProcessor instances are being retained, each holding a large XmlDocument.
Inspecting retention paths:
```
GC Root: ProcessorRegistry (static field)
  └─ Dictionary<Guid, RecordProcessor> _activeProcessors
      └─ [1000 entries, never removed]
          └─ RecordProcessor instance
              └─ XmlDocument _recordData (156 KB each)
```
Root cause: Processors were registered but never unregistered after completion.
The fix:
```csharp
public class BatchService
{
    public async Task ProcessBatchAsync(IEnumerable<Record> records)
    {
        foreach (var record in records)
        {
            var processor = new RecordProcessor(record);
            var processorId = Guid.NewGuid();

            // ❌ OLD: Register and forget
            // ProcessorRegistry.Register(processorId, processor);

            // ✅ NEW: Register, process, unregister
            ProcessorRegistry.Register(processorId, processor);
            try
            {
                await processor.ProcessAsync();
            }
            finally
            {
                ProcessorRegistry.Unregister(processorId);
            }
        }
    }
}
```
Even better, use a pattern that doesn't require manual cleanup (note that this version also runs the processors concurrently via Task.WhenAll; keep the loop if records must be processed one at a time):
```csharp
public class BatchService
{
    public async Task ProcessBatchAsync(IEnumerable<Record> records)
    {
        // ✅ BEST: Use a temporary collection, no global registry
        var processors = records
            .Select(r => new RecordProcessor(r))
            .ToList();

        await Task.WhenAll(processors.Select(p => p.ProcessAsync()));
        // All processors are eligible for GC when the method completes
    }
}
```
💡 Diagnostic lesson: Snapshot comparison reveals what's accumulating. Retention path analysis reveals why. Together, they point directly to the leak source.
Example 4: Identifying LOH Fragmentation
Scenario: Application experiences occasional OutOfMemory exceptions despite total memory usage being reasonable.
The counter-intuitive symptom: Performance Monitor shows:
- Total heap size: 850 MB (plenty of RAM available)
- Application crashes with OutOfMemoryException
- Gen 2 collections are frequent
Diagnostic approach:
```csharp
// Check LOH fragmentation programmatically (GCMemoryInfo APIs, .NET 5+)
var gcMemoryInfo = GC.GetGCMemoryInfo();
var lohInfo = gcMemoryInfo.GenerationInfo[3]; // index 3 = LOH
var lohSize = lohInfo.SizeAfterBytes;
var lohFragmented = lohInfo.FragmentationAfterBytes;

Console.WriteLine($"LOH Size: {lohSize / 1024 / 1024} MB");
Console.WriteLine($"LOH Fragmented: {lohFragmented / 1024 / 1024} MB");
Console.WriteLine($"Fragmentation %: {(double)lohFragmented / lohSize * 100:F1}%");
```
Output reveals the problem:
```
LOH Size: 645 MB
LOH Fragmented: 412 MB
Fragmentation %: 63.9%
```
What's happening: The Large Object Heap (objects >85 KB) is not compacted by default. When your code allocates and releases large objects of varying sizes, you create "holes" in the LOH. Eventually, even though there's free memory, there's no contiguous block large enough for a new allocation.
```
LOH FRAGMENTATION VISUALIZATION
┌──────────────────────────────────────────┐
│ Initial state (3 large objects)          │
├──────────────────────────────────────────┤
│ [ Object A ][ Object B ][ Obj C ]        │
│    200KB       150KB      100KB          │
└──────────────────────────────────────────┘
┌──────────────────────────────────────────┐
│ Object B released (hole created)         │
├──────────────────────────────────────────┤
│ [ Object A ][  FREE   ][ Obj C ]         │
│    200KB       150KB      100KB          │
└──────────────────────────────────────────┘
┌──────────────────────────────────────────┐
│ Try to allocate 180KB object → FAILS!    │
├──────────────────────────────────────────┤
│ [ Object A ][  FREE   ][ Obj C ]         │
│    200KB       150KB      100KB          │
│                └─ Too small!             │
│ OutOfMemoryException despite free space  │
└──────────────────────────────────────────┘
```
Using PerfView to diagnose:
- Open the .etl trace in PerfView
- Navigate to GC Heap Net Mem (Coarse) Stacks
- Filter for sizes >85,000 bytes
- Identify which allocations are large
Common LOH allocation sources:
| Type | Typical Size | Solution |
|---|---|---|
| byte[] buffers | 1 MB+ | Rent from ArrayPool&lt;byte&gt; |
| String concatenation | 85 KB+ | StringBuilder or string.Create |
| Bitmap images | Varies | Dispose promptly, reuse when possible |
| List&lt;T&gt; capacity | When T is large | Pre-size, or use LinkedList&lt;T&gt; |
The fix for our scenario (large byte[] buffers):
```csharp
// ❌ OLD: Creates a new 1 MB array each time (goes to LOH)
public byte[] ProcessData(Stream input)
{
    var buffer = new byte[1024 * 1024]; // 1 MB buffer
    input.Read(buffer, 0, buffer.Length);
    return TransformData(buffer);
}

// ✅ NEW: Rent from the pool, return when done
public byte[] ProcessData(Stream input)
{
    // Note: Rent may return an array larger than requested; use only the first 1 MB
    var buffer = ArrayPool<byte>.Shared.Rent(1024 * 1024);
    try
    {
        input.Read(buffer, 0, 1024 * 1024);
        return TransformData(buffer);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);
    }
}
```
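If fragmentation has already built up, the runtime can also compact the LOH on demand (available since .NET Framework 4.5.1 and in all modern .NET); treat this as a stopgap while you eliminate the large allocations themselves:

```csharp
using System;
using System.Runtime;

// Request that the next blocking Gen 2 collection also compact the LOH.
// The setting resets to Default automatically once that collection runs.
GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
GC.Collect();
```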
💡 Diagnostic lesson: Not all memory problems are leaks. LOH fragmentation causes OOM even with available memory. Look for large allocations and eliminate them through pooling or size reduction.
Common Mistakes in Memory Diagnostics
⚠️ Mistake #1: Only checking total memory usage
Why it's wrong: Total memory tells you if there's a problem, not what the problem is.
What to do instead: Track rate of change and GC behavior. A steady 500MB might be fine, but growing 10MB/hour indicates a leak.
⚠️ Mistake #2: Calling GC.Collect() to "fix" memory issues
Why it's wrong: Forcing GC masks symptoms without addressing root causes. If memory drops after a forced GC, good: you had an allocation rate problem. If it doesn't drop, you have a retention bug that needs fixing.
What to do instead: Use GC.Collect() only as a diagnostic tool to distinguish allocation from retention:
```csharp
// Diagnostic pattern: distinguish allocation pressure from retention
var beforeGC = GC.GetTotalMemory(false);
GC.Collect(2, GCCollectionMode.Forced);
GC.WaitForPendingFinalizers();
var afterGC = GC.GetTotalMemory(true);

if ((beforeGC - afterGC) < beforeGC * 0.1)
{
    Console.WriteLine("⚠️ Possible retention bug: full GC freed less than 10% of the heap");
}
```
⚠️ Mistake #3: Ignoring finalization queue depth
Why it's wrong: Objects with finalizers (written with destructor syntax in C#) go through a two-stage collection process, demonstrated in the sketch at the end of this mistake. A backlog in the finalization queue appears as a memory leak.
What to do instead: Check the finalization queue:
```csharp
var gen0 = GC.CollectionCount(0);
var gen1 = GC.CollectionCount(1);
var gen2 = GC.CollectionCount(2);
// If Gen 2 collections are frequent but memory isn't freed,
// check for objects waiting for finalization
```
Use PerfView's GC/Finalizer Queue Length graph to spot finalization backlogs.
Common causes:
- Finalizers that block or take too long
- Too many finalizable objects being created
- Deadlock in finalizer code
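To see the two-stage collection in action, here is a minimal sketch (NoisyHandle is a made-up type for illustration):

```csharp
using System;

class NoisyHandle
{
    private readonly byte[] _buffer = new byte[1024 * 1024];
    ~NoisyHandle() => Console.WriteLine("Finalizer ran");
}

class Program
{
    static void Main()
    {
        // Scope the allocation in a local function so no live reference remains.
        static void Allocate() => _ = new NoisyHandle();
        Allocate();

        GC.Collect();                  // 1st GC: object survives, queued for finalization
        GC.WaitForPendingFinalizers(); // finalizer thread runs ~NoisyHandle
        GC.Collect();                  // 2nd GC: the buffer's memory is actually reclaimed
    }
}
```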
⚠️ Mistake #4: Taking snapshots too frequently
Why it's wrong: Heap snapshots pause the application and consume significant memory themselves. Taking snapshots every minute creates performance problems.
What to do instead: Use strategic snapshot timing:
- Baseline (app idle)
- After suspected leak operation
- After second occurrence (to confirm pattern)
- Final snapshot for comparison
⚠️ Mistake #5: Focusing on small objects instead of retention paths
Why it's wrong: Finding that you have "10,000 string objects" doesn't tell you why they're retained or which ones are problematic.
What to do instead:
- Sort by total size, not count
- Focus on your types, not framework types
- Examine retention paths for your types
- Find the GC root (static field, active thread) holding the chain
⚠️ Mistake #6: Not establishing baselines
Why it's wrong: Without knowing normal behavior, you can't identify abnormal behavior. Is 400MB memory usage good or bad? Depends on your baseline.
What to do instead: Record baseline metrics:
- Memory at application start
- Memory after typical operations
- Typical GC collection frequency
- Expected working set size
Use these to identify deviations that indicate problems.
Key Takeaways
🎯 Core Diagnostic Principles:
Symptoms vs. root causes: High memory is a symptom. The root cause is what's retaining objects unnecessarily.
Systematic workflow: Observe → Measure → Compare → Analyze → Hypothesize → Validate → Fix
Tool selection matters: Use low-overhead tools (PerfView, perfmon) in production, high-detail tools (dotMemory, VS Profiler) in development.
Retention paths reveal everything: Finding what objects are retained is less important than finding what's holding them.
GC behavior tells a story: Frequent Gen 2 collections, high GC time %, and increasing heap size are red flags.
LOH is special: Large objects (>85KB) never get compacted, leading to fragmentation. Pool large buffers.
Event handlers are common culprits: Subscriptions to static events or long-lived objects create retention chains.
📋 Diagnostic Checklist:
- Establish baseline metrics (memory, GC frequency)
- Monitor performance counters continuously
- Take strategic heap snapshots (before/after)
- Compare snapshots to find accumulating types
- Analyze retention paths to GC roots
- Verify hypothesis by reproducing the pattern
- Implement fix and confirm resolution
- Monitor production to validate
💡 Remember: The best diagnostic tool is systematic thinking. Tools provide data; your analysis provides solutions.
📌 Quick Reference Card: Memory Diagnostic Commands
| Task | Command |
|---|---|
| Check total memory | GC.GetTotalMemory(false) |
| Check GC collections | GC.CollectionCount(0/1/2) |
| Get GC info | GC.GetGCMemoryInfo() |
| Force GC (diagnostic) | GC.Collect(2, GCCollectionMode.Forced) |
| PerfView capture | PerfView /AcceptEULA /GCOnly collect |
| Rent from pool | ArrayPool&lt;T&gt;.Shared.Rent(size) |
| Return to pool | ArrayPool&lt;T&gt;.Shared.Return(array) |
📚 Further Study
Microsoft Docs - Memory Performance Best Practices: https://learn.microsoft.com/en-us/dotnet/standard/garbage-collection/memory-management-and-gc
PerfView Tutorial and Documentation: https://github.com/microsoft/perfview/blob/main/documentation/Tutorial.md
JetBrains dotMemory Profiling Guide: https://www.jetbrains.com/help/dotmemory/Profiling_Guidelines.html
🚀 Next Steps: Now that you've mastered diagnostic techniques, practice applying them to real scenarios. Set up monitoring in your applications, establish baselines, and proactively look for memory patterns before they become production issues. The skills you've learned here transform reactive firefighting into proactive engineering excellence.