Normalization and Cache Mechanics
Understanding Relay's store structure and update propagation
Master Relay's normalization system with hands-on examples. This lesson covers how Relay normalizes GraphQL responses, manages cache consistency through unique identifiers, and optimizes data fetching with automatic request deduplication: essential concepts for building performant React applications with GraphQL.
Welcome to Cache Truth 💾
Welcome to one of the most transformative concepts in modern client-side data management! If you've ever struggled with keeping data synchronized across different components, wondered why some GraphQL clients feel "magical," or fought with stale data bugs, you're about to discover the elegant solution Relay provides.
Relay's normalization system transforms how your application thinks about data. Instead of storing responses as nested trees scattered throughout your app, Relay maintains a single, normalized cache where each record lives exactly once. This architectural decision eliminates data duplication, prevents inconsistencies, and makes updates propagate automatically across your entire UI.
Think of it like the difference between having multiple copies of a document scattered across your desk versus having one master document that everyone references. When you update the master, everyone instantly sees the changes, with no manual synchronization required.
Core Concepts: The Foundation of Relay's Cache 🏛️
What Is Normalization?
Normalization is the process of flattening nested GraphQL responses into a flat map of records, each identified by a unique key. Instead of storing data as it arrives from the server (nested objects within objects), Relay breaks it down into individual entities.
Consider this GraphQL response:
{
user(id: "123") {
id
name
posts {
id
title
author {
id
name
}
}
}
}
Without normalization, this nested structure would be stored exactly as-is. But what happens when another query fetches the same user? You'd have duplicate copies of user data, and updating one wouldn't affect the other.
Relay normalizes this into:
| Record ID | Type | Fields |
|---|---|---|
| User:123 | User | {id: "123", name: "Alice", posts: [Post:456, Post:789]} |
| Post:456 | Post | {id: "456", title: "GraphQL Basics", author: User:123} |
| Post:789 | Post | {id: "789", title: "Relay Guide", author: User:123} |
Notice how nested objects become references to other records. Each entity exists exactly once, identified by its type and ID.
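The flattening step can be modeled in a few lines of plain JavaScript. This is a simplified sketch, not Relay's actual implementation; it assumes every object in the response carries `__typename` and `id`, which real GraphQL responses can include.

```javascript
// Simplified model of normalization: flatten a nested GraphQL response
// into a map of "Type:ID" records. Nested objects become {__ref} links.
function normalize(node, records = {}) {
  const key = `${node.__typename}:${node.id}`;
  const record = records[key] ?? (records[key] = {}); // merge into any existing record
  for (const [field, value] of Object.entries(node)) {
    if (Array.isArray(value)) {
      record[field] = value.map((child) => normalize(child, records));
    } else if (value !== null && typeof value === 'object') {
      record[field] = normalize(value, records); // nested object -> reference
    } else {
      record[field] = value;
    }
  }
  return {__ref: key};
}

const records = {};
normalize(
  {
    __typename: 'User', id: '123', name: 'Alice',
    posts: [
      {
        __typename: 'Post', id: '456', title: 'GraphQL Basics',
        author: {__typename: 'User', id: '123', name: 'Alice'},
      },
    ],
  },
  records
);
// records holds exactly two entries, 'User:123' and 'Post:456':
// the user appeared twice in the response but exists once in the map.
```

Note how the author object inside the post collapses into the same `User:123` record that the top-level user produced.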
The Global Object Identification Specification
Global IDs are the foundation of Relay's normalization. Every object that can be refetched or updated needs a globally unique identifier. Relay follows a specification:
- Every type must have an id field (or implement the Node interface)
- IDs must be globally unique across your entire schema
- IDs must be opaque (clients shouldn't parse them)
Typically, global IDs are base64-encoded strings such as "VXNlcjoxMjM=", which encodes both the type (User) and the database ID (123).
💡 Pro Tip: Use a consistent ID generation strategy across your backend. Many teams use base64(TypeName:DatabaseID) for predictability during debugging.
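The base64(TypeName:DatabaseID) convention can be sketched with Node's Buffer API. These helpers are illustrative and belong on the server; clients should continue to treat the IDs as opaque.

```javascript
// Encode/decode global IDs as base64("TypeName:DatabaseID") using Node's Buffer.
function toGlobalId(typeName, dbId) {
  return Buffer.from(`${typeName}:${dbId}`, 'utf8').toString('base64');
}

function fromGlobalId(globalId) {
  const decoded = Buffer.from(globalId, 'base64').toString('utf8');
  const i = decoded.indexOf(':'); // split on the FIRST colon only,
  return {type: decoded.slice(0, i), id: decoded.slice(i + 1)}; // so IDs may contain ':'
}
```

For example, `toGlobalId('User', '123')` yields `"VXNlcjoxMjM="`, matching the string shown above, and `fromGlobalId` recovers `{type: 'User', id: '123'}`.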
RELAY CACHE STRUCTURE

┌───────────────────────────────────────────┐
│          RELAY STORE (Flat Map)           │
├───────────────────────────────────────────┤
│                                           │
│ "User:123" → {id, name, posts: [...]}     │
│      │                                    │
│      └── References Post:456, Post:789    │
│                                           │
│ "Post:456" → {id, title, author: ...}     │
│      │                                    │
│      └── References User:123              │
│                                           │
│ "Post:789" → {id, title, author: ...}     │
│      │                                    │
│      └── References User:123              │
│                                           │
└───────────────────────────────────────────┘

✅ Single source of truth
✅ No data duplication
✅ Automatic consistency
Cache Keys and Record Identification
Relay generates cache keys using a data ID function. relay-runtime's built-in default essentially returns the object's id value directly, relying on Node IDs being globally unique; the Type:ID keys used throughout this lesson follow a common custom convention:
function dataIdFromObject(object, typeName) {
if (object.id != null) {
return `${typeName}:${object.id}`;
}
return null; // No stable ID available
}
You can customize this behavior in your Relay environment:
const environment = new Environment({
store: new Store(new RecordSource(), {
gcReleaseBufferSize: 10,
}),
network,
getDataID(fieldValue, typeName) {
// Custom logic for generating IDs
if (typeName === 'User' && fieldValue.username) {
return `User:${fieldValue.username}`;
}
return fieldValue.id ?? null; // fall back to the default: use the id field itself
},
});
⚠️ Common Mistake: Not all types need IDs! Connection edges, paginated results, and ephemeral objects can use client-generated IDs or remain unnormalized.
How Normalization Works: Step-by-Step
Let's walk through how Relay processes a GraphQL response:
Step 1: Response Arrives
{
"data": {
"viewer": {
"id": "User:1",
"name": "Bob",
"friends": [
{"id": "User:2", "name": "Carol"},
{"id": "User:3", "name": "Dave"}
]
}
}
}
Step 2: Relay Extracts Records
| Record | Data |
|---|---|
| User:1 | {id: "User:1", name: "Bob", friends: [User:2, User:3]} |
| User:2 | {id: "User:2", name: "Carol"} |
| User:3 | {id: "User:3", name: "Dave"} |
Step 3: Relay Updates the Store
Each record is merged into the existing cache. If User:2 was already in the cache with additional fields, Relay performs a shallow merge, preserving data from both old and new versions.
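The merge in Step 3 amounts to a per-field shallow merge. A minimal model:

```javascript
// Step 3's shallow merge: fields in the new payload overwrite, while
// fields the new payload didn't fetch are preserved from the old record.
function mergeRecord(store, key, fields) {
  store[key] = {...store[key], ...fields};
  return store[key];
}

const store = {
  'User:2': {id: 'User:2', name: 'Carol', email: 'carol@example.com'},
};
mergeRecord(store, 'User:2', {id: 'User:2', name: 'Caroline'});
// name is updated; email survives because the merge is per-field.
```

This is why a query that fetches only `name` can't accidentally wipe out the `email` another query previously cached.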
Step 4: Components Re-render
Any component reading data from these records automatically receives updates. If a component displays User:2's name, it re-renders when the name changesβregardless of which query triggered the update.
NORMALIZATION PIPELINE

📡 GraphQL Response
         │
         ▼
┌────────────────────┐
│  Response Parser   │ ← Traverses nested data
└─────────┬──────────┘
          ▼
┌────────────────────┐
│  Extract Records   │ ← Identifies objects with IDs
└─────────┬──────────┘
          ▼
┌────────────────────┐
│   Generate Keys    │ ← Creates "Type:ID" keys
└─────────┬──────────┘
          ▼
┌────────────────────┐
│  Merge into Store  │ ← Updates existing records
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Notify Subscribers │ ← Components re-render
└────────────────────┘
Cache Consistency and Updates
Cache consistency means that if the same entity appears in multiple places in your UI, they all display the same data. Relay achieves this through normalizationβsince each entity exists only once, updating it automatically affects all references.
Three types of cache updates:
- Automatic Updates: When a mutation returns an object with an ID, Relay automatically merges it
- Declarative Updates: Using updater functions to manually modify the cache
- Optimistic Updates: Applying changes immediately before the server responds
const [commit] = useMutation(graphql`
mutation UpdateUserMutation($input: UpdateUserInput!) {
updateUser(input: $input) {
user {
id
name # Relay auto-updates User:{id} record
}
}
}
`);
Because the mutation returns user.id, Relay knows exactly which cache record to update. No manual cache manipulation needed!
💡 Did You Know? Relay's consistency model is inspired by database normalization (Third Normal Form). Just as databases avoid redundancy, Relay eliminates duplicate data in memory.
Advanced Cache Mechanics 🔧
Request Deduplication and Batching ⚡
Relay automatically deduplicates identical in-flight requests. If three components mount simultaneously and request the same query, Relay sends one network request and shares the result.
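The deduplication behavior can be modeled with an in-flight promise map. This is a simplified sketch of the idea, not Relay's internals:

```javascript
// Deduplication via an in-flight map: identical (query, variables) pairs
// share one promise; the entry is cleared once the request settles.
const inFlight = new Map();

function fetchQueryDeduped(query, variables, fetchFn) {
  const key = query + JSON.stringify(variables);
  if (!inFlight.has(key)) {
    const promise = fetchFn(query, variables).finally(() => {
      inFlight.delete(key); // later requests hit the network again
    });
    inFlight.set(key, promise);
  }
  return inFlight.get(key);
}

// Three "components" mounting at once produce a single network call:
let calls = 0;
const fakeFetch = () => {
  calls += 1;
  return Promise.resolve({name: 'Alice'});
};
const a = fetchQueryDeduped('UserQuery', {id: '123'}, fakeFetch);
const b = fetchQueryDeduped('UserQuery', {id: '123'}, fakeFetch);
const c = fetchQueryDeduped('UserQuery', {id: '123'}, fakeFetch);
// calls is 1, and a, b, c are literally the same promise.
```

Keying on the query text plus serialized variables is what makes "identical" precise: the same query with different variables is still a separate request.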
REQUEST DEDUPLICATION

Component A ──┐
              ├── Query UserQuery(id: "123")
Component B ──┤             │
              │             ▼
Component C ──┘     ┌───────────────┐
                    │ Relay Network │ ← Only ONE request sent
                    └───────┬───────┘
                            ▼
                     GraphQL Server
                            │
                     ┌──────┴──────┐
Component A ◄────────┤             │
Component B ◄────────┤  Response   │ ← Shared with all
Component C ◄────────┤             │
                     └─────────────┘
Batching combines multiple queries into a single HTTP request (requires server support):
const network = Network.create((operation, variables) => {
return fetch('/graphql', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({
query: operation.text,
variables,
}),
}).then(response => response.json());
});
relay-runtime itself doesn't ship a batching helper; batching is implemented in your network layer. Community middleware such as react-relay-network-modern's batchMiddleware handles this, typically by collecting operations for a few milliseconds and sending them together.
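The collect-and-flush pattern behind batching can be sketched as a custom fetch function. This is a simplified model: it assumes your server accepts an array of {query, variables} payloads and returns results in request order, which not all servers do.

```javascript
// Sketch of a batching network function: queue operations for a short
// window, then send them in one HTTP request.
function createBatchFetch(url, batchTimeout = 10) {
  let queue = [];
  let timer = null;

  function flush() {
    const batch = queue;
    queue = [];
    timer = null;
    fetch(url, {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify(batch.map((item) => item.payload)), // array of operations
    })
      .then((res) => res.json())
      // Assumes the server returns one result per operation, in order.
      .then((results) => batch.forEach((item, i) => item.resolve(results[i])))
      .catch((err) => batch.forEach((item) => item.reject(err)));
  }

  return (operation, variables) =>
    new Promise((resolve, reject) => {
      queue.push({payload: {query: operation.text, variables}, resolve, reject});
      if (timer == null) timer = setTimeout(flush, batchTimeout);
    });
}

// Usage: const network = Network.create(createBatchFetch('/graphql'));
```

The trade-off is latency: every operation waits up to `batchTimeout` milliseconds before hitting the network, in exchange for fewer HTTP round-trips.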
Garbage Collection and Cache Retention 🗑️
Relay's garbage collector removes records no longer referenced by any active query. This prevents memory leaks in long-running applications.
Retention strategies:
| Strategy | Description | Use Case |
|---|---|---|
| Query-based | Keep data while query is active | Most components |
| Manual retention | Explicitly retain records | Global app state |
| LRU eviction | Remove least recently used | Memory-constrained apps |
Configuration example:
const store = new Store(new RecordSource(), {
gcReleaseBufferSize: 10, // Keep 10 queries worth of data after release
queryCacheExpirationTime: 5 * 60 * 1000, // 5 minutes
});
Manual retention:
import {createOperationDescriptor, getRequest} from 'relay-runtime';

const environment = useRelayEnvironment();
const operation = createOperationDescriptor(getRequest(UserQuery), {id: '123'});
const disposable = environment.retain(operation);
// Later, allow GC
disposable.dispose();
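Conceptually, Relay's GC is a mark-and-sweep over records reachable from retained queries. A toy model over the `{__ref}` record map used earlier in this lesson:

```javascript
// Toy mark-and-sweep GC: mark every record reachable from retained roots
// by following {__ref: ...} links, then delete everything unmarked.
function collectGarbage(records, rootKeys) {
  const reachable = new Set();
  function visit(key) {
    if (key == null || reachable.has(key) || records[key] == null) return;
    reachable.add(key);
    for (const value of Object.values(records[key])) {
      if (value && value.__ref) visit(value.__ref);
      if (Array.isArray(value)) {
        value.forEach((v) => v && v.__ref && visit(v.__ref));
      }
    }
  }
  rootKeys.forEach(visit);
  for (const key of Object.keys(records)) {
    if (!reachable.has(key)) delete records[key]; // sweep unreferenced records
  }
}

const records = {
  'User:1': {id: '1', bestFriend: {__ref: 'User:2'}},
  'User:2': {id: '2'},
  'User:3': {id: '3'}, // nothing retained references this record
};
collectGarbage(records, ['User:1']); // retain only the data rooted at User:1
// User:1 and User:2 survive; User:3 is swept.
```

Retaining an operation, in this model, simply adds its root records to `rootKeys`; disposing removes them, making their subtree eligible for the next sweep.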
Cache Invalidation Strategies 🚨
Knowing when to invalidate cached data is crucial:
1. Time-based invalidation:
const [data, refetch] = useRefetchableFragment(UserFragment, userRef);
useEffect(() => {
const interval = setInterval(() => {
refetch({}, {fetchPolicy: 'network-only'});
}, 30000); // Refresh every 30s
return () => clearInterval(interval);
}, [refetch]);
2. Event-based invalidation:
// After a mutation, invalidate related data
commitMutation(environment, {
mutation: DeletePostMutation,
variables: {postId},
updater: (store) => {
const user = store.get('User:123');
const posts = user.getLinkedRecords('posts');
const filtered = posts.filter(p => p.getDataID() !== `Post:${postId}`);
user.setLinkedRecords(filtered, 'posts');
},
});
3. Server-driven invalidation:
type Mutation {
updatePost(input: UpdatePostInput!): UpdatePostPayload
}
type UpdatePostPayload {
post: Post
invalidate: [ID!] # List of IDs to refetch
}
Practical Examples 💼
Example 1: Basic Normalization in Action 🎯
Scenario: A social media feed showing posts from multiple users. Some users appear multiple times.
GraphQL Query:
query FeedQuery {
feed {
posts {
id
content
author {
id
name
avatar
}
comments {
id
text
author {
id
name
}
}
}
}
}
Without Normalization:
- User "Alice" appears 15 times (author of 10 posts, commenter on 5)
- Updating Alice's name requires 15 separate updates
- Inconsistent UI if updates miss some copies
- Memory waste: 15 copies of same user data
With Relay Normalization:
CACHE STRUCTURE:

┌────────────────────────────────────────────┐
│ User:1 → {id: "1", name: "Alice", ...}     │ ← Single record
├────────────────────────────────────────────┤
│ Post:101    → {author: → User:1, ...}      │ ← Reference
│ Post:102    → {author: → User:1, ...}      │ ← Reference
│ Comment:501 → {author: → User:1, ...}      │ ← Reference
│ Comment:502 → {author: → User:1, ...}      │ ← Reference
└────────────────────────────────────────────┘

✅ Alice exists ONCE
✅ All references point to the same record
✅ One update propagates everywhere
Code Implementation:
import {graphql, useFragment} from 'react-relay';
function PostItem({postRef}) {
const post = useFragment(
graphql`
fragment PostItem_post on Post {
id
content
author {
id
name
avatar
}
}
`,
postRef
);
return (
<div>
<img src={post.author.avatar} />
<strong>{post.author.name}</strong>
<p>{post.content}</p>
</div>
);
}
When Alice updates her name via a mutation, every PostItem component automatically re-renders with the new nameβno props drilling, no manual state management.
Example 2: Handling Mutations with Automatic Updates
Scenario: User edits their profile. The update should reflect everywhere their profile appears.
Mutation:
mutation UpdateProfileMutation($input: UpdateProfileInput!) {
updateProfile(input: $input) {
user {
id # Critical: Relay uses this to find cache record
name
bio
avatar
}
}
}
React Component:
import {useMutation, graphql} from 'react-relay';
function EditProfile() {
const [commit, isInFlight] = useMutation(graphql`
mutation EditProfileMutation($input: UpdateProfileInput!) {
updateProfile(input: $input) {
user {
id
name
bio
avatar
}
}
}
`);
const handleSave = (newData) => {
commit({
variables: {input: newData},
// Optimistic update: instant UI feedback
optimisticResponse: {
updateProfile: {
user: {
id: 'User:123',
name: newData.name,
bio: newData.bio,
avatar: newData.avatar,
},
},
},
});
};
  return <form onSubmit={(event) => { event.preventDefault(); handleSave(/* collected form values */); }}>...</form>;
}
What Happens:
| Step | Action | Cache State |
|---|---|---|
| 1 | User clicks Save | Original data |
| 2 | Optimistic update applies | UI shows new data instantly |
| 3 | Network request in flight | Still showing optimistic data |
| 4 | Server responds | Replaced with server data |
| 5 | All components re-render | Consistent everywhere |
Relay automatically:
- Finds the User:123 record in cache
- Merges the new fields (name, bio, avatar)
- Notifies all components reading User:123
- Triggers re-renders with updated data
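The optimistic lifecycle in the table above can be modeled in a few lines of plain JavaScript. This is a sketch of the idea, not Relay's implementation:

```javascript
// Optimistic lifecycle in miniature: snapshot the record, apply the patch
// instantly, then overwrite with the server result -- or roll back on error.
function optimisticMutation(store, key, optimisticFields, request) {
  const snapshot = store[key];                      // step 1: remember pre-mutation data
  store[key] = {...snapshot, ...optimisticFields};  // step 2: instant UI feedback
  return request().then(
    (serverFields) => {
      store[key] = {...snapshot, ...serverFields};  // step 4: server response wins
    },
    (error) => {
      store[key] = snapshot;                        // failure: roll back to snapshot
      throw error;
    }
  );
}

const store = {'User:123': {id: 'User:123', name: 'Alice'}};
optimisticMutation(
  store,
  'User:123',
  {name: 'Alicia'},                        // applied before the network round-trip
  () => Promise.resolve({name: 'Alicia'})  // stand-in for the real request
);
// store['User:123'].name already reads 'Alicia', before the promise settles.
```

Keeping the snapshot around is what makes rollback cheap: on failure the record is restored wholesale rather than patched field by field.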
💡 Pro Tip: Always return the full updated object from mutations, including the id. This enables automatic cache updates without manual updater functions.
Example 3: Complex Cache Update with Updater Functions 🛠️
Scenario: Adding a new comment to a post. The server returns the comment, but we need to append it to the post's comments connection.
Challenge: The mutation returns a Comment, but we need to update the parent Post's list.
Mutation with Updater:
import {ConnectionHandler} from 'relay-runtime';
const [commit] = useMutation(graphql`
mutation AddCommentMutation($input: AddCommentInput!) {
addComment(input: $input) {
comment {
id
text
createdAt
author {
id
name
}
}
}
}
`);
const handleSubmit = (text) => {
commit({
variables: {
input: {
postId: 'Post:456',
text,
},
},
updater: (store) => {
// Get the newly created comment from response
const payload = store.getRootField('addComment');
const newComment = payload.getLinkedRecord('comment');
// Get the post record
const post = store.get('Post:456');
// Get the comments connection
const connection = ConnectionHandler.getConnection(
post,
'PostComments_comments' // Connection key from fragment
);
// Create a new edge
const edge = ConnectionHandler.createEdge(
store,
connection,
newComment,
'CommentEdge'
);
// Insert at the beginning
ConnectionHandler.insertEdgeBefore(connection, edge);
},
});
};
Step-by-Step Breakdown:
| Step | Store Method | Purpose |
|---|---|---|
| 1 | store.getRootField('addComment') | Access mutation response |
| 2 | payload.getLinkedRecord('comment') | Extract new comment record |
| 3 | store.get('Post:456') | Find parent post in cache |
| 4 | ConnectionHandler.getConnection(...) | Get paginated comments list |
| 5 | ConnectionHandler.createEdge(...) | Wrap comment in edge node |
| 6 | ConnectionHandler.insertEdgeBefore(...) | Add to connection start |
Visual Representation:
BEFORE MUTATION:

Post:456
└── comments (connection)
    ├── Comment:1
    ├── Comment:2
    └── Comment:3

AFTER MUTATION:

Post:456
└── comments (connection)
    ├── Comment:999 ← NEW! Added by updater
    ├── Comment:1
    ├── Comment:2
    └── Comment:3
This manual update is necessary because Relay can't infer where in the list to add the new comment. The updater function provides explicit instructions.
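The edge/connection structure that makes this necessary can be modeled in plain JavaScript. This is a simplified sketch of the Relay connection shape; the cursor scheme is illustrative.

```javascript
// A connection modeled in plain JS: edges wrap nodes with cursors, plus
// pageInfo -- which is why pushing onto a bare array would break pagination.
function insertEdgeBefore(connection, node) {
  const edge = {cursor: `cursor:${node.id}`, node}; // cursor format is illustrative
  connection.edges = [edge, ...connection.edges];   // prepend, like the updater above
  return connection;
}

const comments = {
  edges: [{cursor: 'cursor:1', node: {id: '1', text: 'First!'}}],
  pageInfo: {hasNextPage: false, endCursor: 'cursor:1'},
};
insertEdgeBefore(comments, {id: '999', text: 'New comment'});
// comments.edges[0].node.id is '999'; pageInfo remains intact.
```

ConnectionHandler does the equivalent work against the normalized store, which is why the updater goes through it instead of mutating a plain array.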
Example 4: Custom Cache Keys for Non-Standard IDs
Scenario: Legacy API uses username as identifier instead of numeric id.
GraphQL Type:
type User {
username: String! # Primary key, not "id"
displayName: String
email: String
}
Problem: Relay's default ID extractor looks for an id field, finds nothing, and fails to normalize.
Solution: Custom getDataID function.
import {Environment, Network, RecordSource, Store} from 'relay-runtime';
function customGetDataID(fieldValue, typeName) {
// Handle User type specially
if (typeName === 'User' && fieldValue.username != null) {
return `User:${fieldValue.username}`;
}
// Handle LegacyPost with composite key
if (typeName === 'LegacyPost' && fieldValue.userId && fieldValue.timestamp) {
return `LegacyPost:${fieldValue.userId}:${fieldValue.timestamp}`;
}
// Fall back to default behavior for types with "id"
if (fieldValue.id != null) {
return `${typeName}:${fieldValue.id}`;
}
// No stable ID available
return null;
}
const environment = new Environment({
store: new Store(new RecordSource(), {gcReleaseBufferSize: 10}),
network: Network.create(fetchQuery),
getDataID: customGetDataID,
});
Result:
| Type | Fields | Generated Cache Key |
|---|---|---|
| User | {username: "alice"} | User:alice |
| User | {username: "bob"} | User:bob |
| LegacyPost | {userId: "alice", timestamp: 1234} | LegacyPost:alice:1234 |
| Comment | {id: "999"} | Comment:999 |
Now Relay can normalize User records even without an id field!
⚠️ Warning: Custom keys must be stable and globally unique. If a username can change, it's not suitable as a cache key.
Common Mistakes to Avoid ⚠️
Mistake 1: Forgetting to Return id in Mutations 🚫
Problem:
mutation UpdateUserMutation($input: UpdateUserInput!) {
updateUser(input: $input) {
user {
# Missing "id" field!
name
email
}
}
}
Consequence: Relay can't identify which cache record to update. The mutation succeeds on the server, but your UI doesn't update.
Fix:
mutation UpdateUserMutation($input: UpdateUserInput!) {
updateUser(input: $input) {
user {
id # ✅ Always include id!
name
email
}
}
}
Mistake 2: Modifying Cache Records Directly 🚫
Problem:
// ❌ WRONG: Direct mutation
const user = store.get('User:123');
user.name = 'New Name'; // This doesn't work!
Why It Fails: Relay's store is immutable. Direct property assignment has no effect.
Fix:
// ✅ RIGHT: Use store methods
const user = store.get('User:123');
user.setValue('New Name', 'name');
Mistake 3: Not Handling Connections Properly 🚫
Problem:
// ❌ WRONG: Treating connection as array
const post = store.get('Post:456');
const comments = post.getLinkedRecords('comments');
comments.push(newComment); // Breaks pagination!
Why It Fails: Relay connections are special structures with edges, cursors, and page info. Direct array manipulation breaks pagination.
Fix:
// ✅ RIGHT: Use ConnectionHandler
const connection = ConnectionHandler.getConnection(
post,
'PostComments_comments'
);
const edge = ConnectionHandler.createEdge(
store,
connection,
newComment,
'CommentEdge'
);
ConnectionHandler.insertEdgeAfter(connection, edge);
Mistake 4: Ignoring Garbage Collection 🚫
Problem: Keeping every query result in memory forever, causing memory leaks in long-running apps.
Consequence: Browser tab crashes after hours of use, especially on memory-constrained devices.
Fix:
// Configure appropriate GC settings
const store = new Store(new RecordSource(), {
gcReleaseBufferSize: 10, // Adjust based on app needs
queryCacheExpirationTime: 5 * 60 * 1000, // 5 minutes
});
// Manually dispose of long-running queries
const disposable = environment.retain(operation); // an OperationDescriptor
// Later:
disposable.dispose();
Mistake 5: Assuming Immediate Consistency Across Tabs 🚫
Problem: Expecting cache updates in one browser tab to instantly reflect in another tab.
Reality: Each tab has its own Relay environment and cache. They don't communicate by default.
Solutions:
- BroadcastChannel API to sync updates across tabs
- WebSocket subscriptions to receive server-pushed updates
- Periodic polling with refetch()
// Example: Cross-tab synchronization
const channel = new BroadcastChannel('relay-sync');
commitMutation(environment, {
mutation: UpdateUserMutation,
onCompleted: (response) => {
// Notify other tabs
channel.postMessage({
type: 'CACHE_UPDATE',
recordID: response.updateUser.user.id,
data: response.updateUser.user,
});
},
});
channel.onmessage = (event) => {
if (event.data.type === 'CACHE_UPDATE') {
// Update this tab's cache
environment.commitUpdate((store) => {
const record = store.get(event.data.recordID);
if (record == null) return; // record not present in this tab's cache
Object.entries(event.data.data).forEach(([key, value]) => {
record.setValue(value, key);
});
});
}
};
Key Takeaways 🎯
Quick Reference Card
| Concept | Definition |
|---|---|
| Normalization | Flattens nested responses into a map of records identified by unique keys |
| Global ID | Unique identifier (typically TypeName:DatabaseID) for cache keys |
| Cache Consistency | Single source of truth ensures all UI references display the same data |
| Automatic Updates | Mutations returning id field trigger automatic cache merges |
| Updater Functions | Manual cache manipulation for complex scenarios (lists, connections) |
| Deduplication | Identical in-flight queries share a single network request |
| Garbage Collection | Removes unreferenced records to prevent memory leaks |
| Optimistic Updates | Apply changes instantly before server confirmation for better UX |
Core Principles to Remember:
- Every object that can be refetched needs a global ID → implement the Node interface or ensure id fields exist
- The cache is immutable → use store methods (setValue, setLinkedRecord) instead of direct assignment
- Normalization eliminates duplication → each entity exists once; all references point to it
- Always return id in mutations → enables automatic cache updates without manual work
- Use ConnectionHandler for lists → respect pagination structures; don't treat them as plain arrays
- Configure garbage collection → balance memory usage with data availability
- Leverage optimistic updates → provide instant feedback while waiting for server responses
Performance Benefits:
PERFORMANCE GAINS FROM NORMALIZATION (illustrative)

                 Without              With
                 Normalization        Normalization
                 ─────────────        ─────────────
Memory Usage     ██████████           ███           (70% reduction)
Cache Lookups    ████████             ██            (75% faster)
Update Cost      ███████████          █             (91% faster)
Consistency      ❌ Manual            ✅ Automatic
When to Use Custom Strategies:
- Custom getDataID: Legacy APIs without standard id fields
- Manual updaters: Adding/removing items from lists or connections
- Optimistic updates: Actions requiring instant visual feedback
- Manual retention: Global data that should survive query unmounting