Normalization and Cache Mechanics
Understanding Relay's store structure and update propagation
Master Relay's normalization system with hands-on examples. This lesson covers how Relay normalizes GraphQL responses, manages cache consistency through unique identifiers, and optimizes data fetching with automatic request deduplication: essential concepts for building performant React applications with GraphQL.
Welcome to Cache Truth 💾
Welcome to one of the most transformative concepts in modern client-side data management! If you've ever struggled with keeping data synchronized across different components, wondered why some GraphQL clients feel "magical," or fought with stale data bugs, you're about to discover the elegant solution Relay provides.
Relay's normalization system transforms how your application thinks about data. Instead of storing responses as nested trees scattered throughout your app, Relay maintains a single, normalized cache where each record lives exactly once. This architectural decision eliminates data duplication, prevents inconsistencies, and makes updates propagate automatically across your entire UI.
Think of it like the difference between having multiple copies of a document scattered across your desk versus having one master document that everyone references. When you update the master, everyone instantly sees the changes, with no manual synchronization required.
Core Concepts: The Foundation of Relay's Cache 🏛️
What Is Normalization?
Normalization is the process of flattening nested GraphQL responses into a flat map of records, each identified by a unique key. Instead of storing data as it arrives from the server (nested objects within objects), Relay breaks it down into individual entities.
Consider this GraphQL response:
{
user(id: "123") {
id
name
posts {
id
title
author {
id
name
}
}
}
}
Without normalization, this nested structure would be stored exactly as-is. But what happens when another query fetches the same user? You'd have duplicate copies of user data, and updating one wouldn't affect the other.
Relay normalizes this into:
| Record ID | Type | Fields |
|---|---|---|
| User:123 | User | {id: "123", name: "Alice", posts: [Post:456, Post:789]} |
| Post:456 | Post | {id: "456", title: "GraphQL Basics", author: User:123} |
| Post:789 | Post | {id: "789", title: "Relay Guide", author: User:123} |
Notice how nested objects become references to other records. Each entity exists exactly once, identified by its type and ID.
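The flattening step can be modeled in a few lines of plain JavaScript. This is a simplified sketch, not Relay's actual implementation; it assumes every object in the response carries `__typename` and `id`, which real GraphQL responses can include.

```javascript
// Simplified model of normalization: flatten a nested GraphQL response
// into a map of "Type:ID" records. Nested objects become {__ref} links.
function normalize(node, records = {}) {
  const key = `${node.__typename}:${node.id}`;
  const record = records[key] ?? (records[key] = {}); // merge into any existing record
  for (const [field, value] of Object.entries(node)) {
    if (Array.isArray(value)) {
      record[field] = value.map((child) => normalize(child, records));
    } else if (value !== null && typeof value === 'object') {
      record[field] = normalize(value, records); // nested object -> reference
    } else {
      record[field] = value;
    }
  }
  return {__ref: key};
}

const records = {};
normalize(
  {
    __typename: 'User', id: '123', name: 'Alice',
    posts: [
      {
        __typename: 'Post', id: '456', title: 'GraphQL Basics',
        author: {__typename: 'User', id: '123', name: 'Alice'},
      },
    ],
  },
  records
);
// records holds exactly two entries, 'User:123' and 'Post:456':
// the user appeared twice in the response but exists once in the map.
```

Note how the author object inside the post collapses into the same `User:123` record that the top-level user produced.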
The Global Object Identification Specification
Global IDs are the foundation of Relay's normalization. Every object that can be refetched or updated needs a globally unique identifier. Relay follows a specification:
- Every type must have an id field (or implement the Node interface)
- IDs must be globally unique across your entire schema
- IDs must be opaque (clients shouldn't parse them)
Typically, global IDs are base64-encoded strings such as "VXNlcjoxMjM=", which encodes both the type (User) and the database ID (123).
💡 Pro Tip: Use a consistent ID generation strategy across your backend. Many teams use base64(TypeName:DatabaseID) for predictability during debugging.
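The base64(TypeName:DatabaseID) convention can be sketched with Node's Buffer API. These helpers are illustrative and belong on the server; clients should continue to treat the IDs as opaque.

```javascript
// Encode/decode global IDs as base64("TypeName:DatabaseID") using Node's Buffer.
function toGlobalId(typeName, dbId) {
  return Buffer.from(`${typeName}:${dbId}`, 'utf8').toString('base64');
}

function fromGlobalId(globalId) {
  const decoded = Buffer.from(globalId, 'base64').toString('utf8');
  const i = decoded.indexOf(':'); // split on the FIRST colon only,
  return {type: decoded.slice(0, i), id: decoded.slice(i + 1)}; // so IDs may contain ':'
}
```

For example, `toGlobalId('User', '123')` yields `"VXNlcjoxMjM="`, matching the string shown above, and `fromGlobalId` recovers `{type: 'User', id: '123'}`.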
RELAY CACHE STRUCTURE

┌───────────────────────────────────────────┐
│          RELAY STORE (Flat Map)           │
├───────────────────────────────────────────┤
│                                           │
│ "User:123" → {id, name, posts: [...]}     │
│      │                                    │
│      └── References Post:456, Post:789    │
│                                           │
│ "Post:456" → {id, title, author: ...}     │
│      │                                    │
│      └── References User:123              │
│                                           │
│ "Post:789" → {id, title, author: ...}     │
│      │                                    │
│      └── References User:123              │
│                                           │
└───────────────────────────────────────────┘

✅ Single source of truth
✅ No data duplication
✅ Automatic consistency
Cache Keys and Record Identification
Relay generates cache keys using a data ID function. relay-runtime's built-in default essentially returns the object's id value directly, relying on Node IDs being globally unique; the Type:ID keys used throughout this lesson follow a common custom convention:
function dataIdFromObject(object, typeName) {
if (object.id != null) {
return `${typeName}:${object.id}`;
}
return null; // No stable ID available
}
You can customize this behavior in your Relay environment:
const environment = new Environment({
store: new Store(new RecordSource(), {
gcReleaseBufferSize: 10,
}),
network,
getDataID(fieldValue, typeName) {
// Custom logic for generating IDs
if (typeName === 'User' && fieldValue.username) {
return `User:${fieldValue.username}`;
}
return fieldValue.id ?? null; // fall back to the default: use the id field itself
},
});
⚠️ Common Mistake: Not all types need IDs! Connection edges, paginated results, and ephemeral objects can use client-generated IDs or remain unnormalized.
How Normalization Works: Step-by-Step
Let's walk through how Relay processes a GraphQL response:
Step 1: Response Arrives
{
"data": {
"viewer": {
"id": "User:1",
"name": "Bob",
"friends": [
{"id": "User:2", "name": "Carol"},
{"id": "User:3", "name": "Dave"}
]
}
}
}
Step 2: Relay Extracts Records
| Record | Data |
|---|---|
| User:1 | {id: "User:1", name: "Bob", friends: [User:2, User:3]} |
| User:2 | {id: "User:2", name: "Carol"} |
| User:3 | {id: "User:3", name: "Dave"} |
Step 3: Relay Updates the Store
Each record is merged into the existing cache. If User:2 was already in the cache with additional fields, Relay performs a shallow merge, preserving data from both old and new versions.
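The merge in Step 3 amounts to a per-field shallow merge. A minimal model:

```javascript
// Step 3's shallow merge: fields in the new payload overwrite, while
// fields the new payload didn't fetch are preserved from the old record.
function mergeRecord(store, key, fields) {
  store[key] = {...store[key], ...fields};
  return store[key];
}

const store = {
  'User:2': {id: 'User:2', name: 'Carol', email: 'carol@example.com'},
};
mergeRecord(store, 'User:2', {id: 'User:2', name: 'Caroline'});
// name is updated; email survives because the merge is per-field.
```

This is why a query that fetches only `name` can't accidentally wipe out the `email` another query previously cached.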
Step 4: Components Re-render
Any component reading data from these records automatically receives updates. If a component displays User:2's name, it re-renders when the name changesβregardless of which query triggered the update.
NORMALIZATION PIPELINE

📡 GraphQL Response
         │
         ▼
┌────────────────────┐
│  Response Parser   │ ← Traverses nested data
└─────────┬──────────┘
          ▼
┌────────────────────┐
│  Extract Records   │ ← Identifies objects with IDs
└─────────┬──────────┘
          ▼
┌────────────────────┐
│   Generate Keys    │ ← Creates "Type:ID" keys
└─────────┬──────────┘
          ▼
┌────────────────────┐
│  Merge into Store  │ ← Updates existing records
└─────────┬──────────┘
          ▼
┌────────────────────┐
│ Notify Subscribers │ ← Components re-render
└────────────────────┘
Cache Consistency and Updates
Cache consistency means that if the same entity appears in multiple places in your UI, they all display the same data. Relay achieves this through normalizationβsince each entity exists only once, updating it automatically affects all references.
Three types of cache updates:
- Automatic Updates: When a mutation returns an object with an ID, Relay automatically merges it
- Declarative Updates: Using updater functions to manually modify the cache
- Optimistic Updates: Applying changes immediately before the server responds
const [commit] = useMutation(graphql`
mutation UpdateUserMutation($input: UpdateUserInput!) {
updateUser(input: $input) {
user {
id
name # Relay auto-updates User:{id} record
}
}
}
`);
Because the mutation returns user.id, Relay knows exactly which cache record to update. No manual cache manipulation needed!
💡 Did You Know? Relay's consistency model is inspired by database normalization (Third Normal Form). Just as databases avoid redundancy, Relay eliminates duplicate data in memory.
Advanced Cache Mechanics 🔧
Request Deduplication and Batching ⚡
Relay automatically deduplicates identical in-flight requests. If three components mount simultaneously and request the same query, Relay sends one network request and shares the result.
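The deduplication behavior can be modeled with an in-flight promise map. This is a simplified sketch of the idea, not Relay's internals:

```javascript
// Deduplication via an in-flight map: identical (query, variables) pairs
// share one promise; the entry is cleared once the request settles.
const inFlight = new Map();

function fetchQueryDeduped(query, variables, fetchFn) {
  const key = query + JSON.stringify(variables);
  if (!inFlight.has(key)) {
    const promise = fetchFn(query, variables).finally(() => {
      inFlight.delete(key); // later requests hit the network again
    });
    inFlight.set(key, promise);
  }
  return inFlight.get(key);
}

// Three "components" mounting at once produce a single network call:
let calls = 0;
const fakeFetch = () => {
  calls += 1;
  return Promise.resolve({name: 'Alice'});
};
const a = fetchQueryDeduped('UserQuery', {id: '123'}, fakeFetch);
const b = fetchQueryDeduped('UserQuery', {id: '123'}, fakeFetch);
const c = fetchQueryDeduped('UserQuery', {id: '123'}, fakeFetch);
// calls is 1, and a, b, c are literally the same promise.
```

Keying on the query text plus serialized variables is what makes "identical" precise: the same query with different variables is still a separate request.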
REQUEST DEDUPLICATION

Component A ──┐
              ├── Query UserQuery(id: "123")
Component B ──┤             │
              │             ▼
Component C ──┘     ┌───────────────┐
                    │ Relay Network │ ← Only ONE request sent
                    └───────┬───────┘
                            ▼
                     GraphQL Server
                            │
                     ┌──────┴──────┐
Component A ◄────────┤             │
Component B ◄────────┤  Response   │ ← Shared with all
Component C ◄────────┤             │
                     └─────────────┘
Batching combines multiple queries into a single HTTP request (requires server support):
const network = Network.create((operation, variables) => {
return fetch('/graphql', {
method: 'POST',
headers: {'Content-Type': 'application/json'},
body: JSON.stringify({
query: operation.text,
variables,
}),
}).then(response => response.json());
});
relay-runtime itself doesn't ship a batching helper; batching is implemented in your network layer. Community middleware such as react-relay-network-modern's batchMiddleware handles this, typically by collecting operations for a few milliseconds and sending them together.
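The collect-and-flush pattern behind batching can be sketched as a custom fetch function. This is a simplified model: it assumes your server accepts an array of {query, variables} payloads and returns results in request order, which not all servers do.

```javascript
// Sketch of a batching network function: queue operations for a short
// window, then send them in one HTTP request.
function createBatchFetch(url, batchTimeout = 10) {
  let queue = [];
  let timer = null;

  function flush() {
    const batch = queue;
    queue = [];
    timer = null;
    fetch(url, {
      method: 'POST',
      headers: {'Content-Type': 'application/json'},
      body: JSON.stringify(batch.map((item) => item.payload)), // array of operations
    })
      .then((res) => res.json())
      // Assumes the server returns one result per operation, in order.
      .then((results) => batch.forEach((item, i) => item.resolve(results[i])))
      .catch((err) => batch.forEach((item) => item.reject(err)));
  }

  return (operation, variables) =>
    new Promise((resolve, reject) => {
      queue.push({payload: {query: operation.text, variables}, resolve, reject});
      if (timer == null) timer = setTimeout(flush, batchTimeout);
    });
}

// Usage: const network = Network.create(createBatchFetch('/graphql'));
```

The trade-off is latency: every operation waits up to `batchTimeout` milliseconds before hitting the network, in exchange for fewer HTTP round-trips.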
Garbage Collection and Cache Retention 🗑️
Relay's garbage collector removes records no longer referenced by any active query. This prevents memory leaks in long-running applications.
Retention strategies:
| Strategy | Description | Use Case |
|---|---|---|
| Query-based | Keep data while query is active | Most components |
| Manual retention | Explicitly retain records | Global app state |
| LRU eviction | Remove least recently used | Memory-constrained apps |
Configuration example:
const store = new Store(new RecordSource(), {
gcReleaseBufferSize: 10, // Keep 10 queries worth of data after release
queryCacheExpirationTime: 5 * 60 * 1000, // 5 minutes
});
Manual retention:
import {createOperationDescriptor, getRequest} from 'relay-runtime';

const environment = useRelayEnvironment();
const operation = createOperationDescriptor(getRequest(UserQuery), {id: '123'});
const disposable = environment.retain(operation);
// Later, allow GC
disposable.dispose();
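Conceptually, Relay's GC is a mark-and-sweep over records reachable from retained queries. A toy model over the `{__ref}` record map used earlier in this lesson:

```javascript
// Toy mark-and-sweep GC: mark every record reachable from retained roots
// by following {__ref: ...} links, then delete everything unmarked.
function collectGarbage(records, rootKeys) {
  const reachable = new Set();
  function visit(key) {
    if (key == null || reachable.has(key) || records[key] == null) return;
    reachable.add(key);
    for (const value of Object.values(records[key])) {
      if (value && value.__ref) visit(value.__ref);
      if (Array.isArray(value)) {
        value.forEach((v) => v && v.__ref && visit(v.__ref));
      }
    }
  }
  rootKeys.forEach(visit);
  for (const key of Object.keys(records)) {
    if (!reachable.has(key)) delete records[key]; // sweep unreferenced records
  }
}

const records = {
  'User:1': {id: '1', bestFriend: {__ref: 'User:2'}},
  'User:2': {id: '2'},
  'User:3': {id: '3'}, // nothing retained references this record
};
collectGarbage(records, ['User:1']); // retain only the data rooted at User:1
// User:1 and User:2 survive; User:3 is swept.
```

Retaining an operation, in this model, simply adds its root records to `rootKeys`; disposing removes them, making their subtree eligible for the next sweep.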
Cache Invalidation Strategies 🚨
Knowing when to invalidate cached data is crucial:
1. Time-based invalidation:
const [data, refetch] = useRefetchableFragment(UserFragment, userRef);
useEffect(() => {
const interval = setInterval(() => {
refetch({}, {fetchPolicy: 'network-only'});
}, 30000); // Refresh every 30s
return () => clearInterval(interval);
}, [refetch]);
2. Event-based invalidation:
// After a mutation, invalidate related data
commitMutation(environment, {
mutation: DeletePostMutation,
variables: {postId},
updater: (store) => {
const user = store.get('User:123');
const posts = user.getLinkedRecords('posts');
const filtered = posts.filter(p => p.getDataID() !== `Post:${postId}`);
user.setLinkedRecords(filtered, 'posts');
},
});
3. Server-driven invalidation:
type Mutation {
updatePost(input: UpdatePostInput!): UpdatePostPayload
}
type UpdatePostPayload {
post: Post
invalidate: [ID!] # List of IDs to refetch
}
Practical Examples 💼
Example 1: Basic Normalization in Action 🎯
Scenario: A social media feed showing posts from multiple users. Some users appear multiple times.
GraphQL Query:
query FeedQuery {
feed {
posts {
id
content
author {
id
name
avatar
}
comments {
id
text
author {
id
name
}
}
}
}
}
Without Normalization:
- User "Alice" appears 15 times (author of 10 posts, commenter on 5)
- Updating Alice's name requires 15 separate updates
- Inconsistent UI if updates miss some copies
- Memory waste: 15 copies of same user data
With Relay Normalization:
CACHE STRUCTURE:

┌────────────────────────────────────────────┐
│ User:1 → {id: "1", name: "Alice", ...}     │ ← Single record
├────────────────────────────────────────────┤
│ Post:101    → {author: → User:1, ...}      │ ← Reference
│ Post:102    → {author: → User:1, ...}      │ ← Reference
│ Comment:501 → {author: → User:1, ...}      │ ← Reference
│ Comment:502 → {author: → User:1, ...}      │ ← Reference
└────────────────────────────────────────────┘

✅ Alice exists ONCE
✅ All references point to the same record
✅ One update propagates everywhere
Code Implementation:
import {graphql, useFragment} from 'react-relay';
function PostItem({postRef}) {
const post = useFragment(
graphql`
fragment PostItem_post on Post {
id
content
author {
id
name
avatar
}
}
`,
postRef
);
return (
<div>
<img src={post.author.avatar} />
<strong>{post.author.name}</strong>
<p>{post.content}</p>
</div>
);
}
When Alice updates her name via a mutation, every PostItem component automatically re-renders with the new nameβno props drilling, no manual state management.
Example 2: Handling Mutations with Automatic Updates
Scenario: User edits their profile. The update should reflect everywhere their profile appears.
Mutation:
mutation UpdateProfileMutation($input: UpdateProfileInput!) {
updateProfile(input: $input) {
user {
id # Critical: Relay uses this to find cache record
name
bio
avatar
}
}
}
React Component:
import {useMutation, graphql} from 'react-relay';
function EditProfile() {
const [commit, isInFlight] = useMutation(graphql`
mutation EditProfileMutation($input: UpdateProfileInput!) {
updateProfile(input: $input) {
user {
id
name
bio
avatar
}
}
}
`);
const handleSave = (newData) => {
commit({
variables: {input: newData},
// Optimistic update: instant UI feedback
optimisticResponse: {
updateProfile: {
user: {
id: 'User:123',
name: newData.name,
bio: newData.bio,
avatar: newData.avatar,
},
},
},
});
};
  return <form onSubmit={(event) => { event.preventDefault(); handleSave(/* collected form values */); }}>...</form>;
}
What Happens:
| Step | Action | Cache State |
|---|---|---|
| 1 | User clicks Save | Original data |
| 2 | Optimistic update applies | UI shows new data instantly |
| 3 | Network request in flight | Still showing optimistic data |
| 4 | Server responds | Replaced with server data |
| 5 | All components re-render | Consistent everywhere |
Relay automatically:
- Finds the User:123 record in cache
- Merges the new fields (name, bio, avatar)
- Notifies all components reading User:123
- Triggers re-renders with updated data
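The optimistic lifecycle in the table above can be modeled in a few lines of plain JavaScript. This is a sketch of the idea, not Relay's implementation:

```javascript
// Optimistic lifecycle in miniature: snapshot the record, apply the patch
// instantly, then overwrite with the server result -- or roll back on error.
function optimisticMutation(store, key, optimisticFields, request) {
  const snapshot = store[key];                      // step 1: remember pre-mutation data
  store[key] = {...snapshot, ...optimisticFields};  // step 2: instant UI feedback
  return request().then(
    (serverFields) => {
      store[key] = {...snapshot, ...serverFields};  // step 4: server response wins
    },
    (error) => {
      store[key] = snapshot;                        // failure: roll back to snapshot
      throw error;
    }
  );
}

const store = {'User:123': {id: 'User:123', name: 'Alice'}};
optimisticMutation(
  store,
  'User:123',
  {name: 'Alicia'},                        // applied before the network round-trip
  () => Promise.resolve({name: 'Alicia'})  // stand-in for the real request
);
// store['User:123'].name already reads 'Alicia', before the promise settles.
```

Keeping the snapshot around is what makes rollback cheap: on failure the record is restored wholesale rather than patched field by field.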
💡 Pro Tip: Always return the full updated object from mutations, including the id. This enables automatic cache updates without manual updater functions.
Example 3: Complex Cache Update with Updater Functions 🛠️
Scenario: Adding a new comment to a post. The server returns the comment, but we need to append it to the post's comments connection.
Challenge: The mutation returns a Comment, but we need to update the parent Post's list.
Mutation with Updater:
import {ConnectionHandler} from 'relay-runtime';
const [commit] = useMutation(graphql`
mutation AddCommentMutation($input: AddCommentInput!) {
addComment(input: $input) {
comment {
id
text
createdAt
author {
id
name
}
}
}
}
`);
const handleSubmit = (text) => {
commit({
variables: {
input: {
postId: 'Post:456',
text,
},
},
updater: (store) => {
// Get the newly created comment from response
const payload = store.getRootField('addComment');
const newComment = payload.getLinkedRecord('comment');
// Get the post record
const post = store.get('Post:456');
// Get the comments connection
const connection = ConnectionHandler.getConnection(
post,
'PostComments_comments' // Connection key from fragment
);
// Create a new edge
const edge = ConnectionHandler.createEdge(
store,
connection,
newComment,
'CommentEdge'
);
// Insert at the beginning
ConnectionHandler.insertEdgeBefore(connection, edge);
},
});
};
Step-by-Step Breakdown:
| Step | Store Method | Purpose |
|---|---|---|
| 1 | store.getRootField('addComment') | Access mutation response |
| 2 | payload.getLinkedRecord('comment') | Extract new comment record |
| 3 | store.get('Post:456') | Find parent post in cache |
| 4 | ConnectionHandler.getConnection(...) | Get paginated comments list |
| 5 | ConnectionHandler.createEdge(...) | Wrap comment in edge node |
| 6 | ConnectionHandler.insertEdgeBefore(...) | Add to connection start |
Visual Representation:
BEFORE MUTATION:

Post:456
└── comments (connection)
    ├── Comment:1
    ├── Comment:2
    └── Comment:3

AFTER MUTATION:

Post:456
└── comments (connection)
    ├── Comment:999 ← NEW! Added by updater
    ├── Comment:1
    ├── Comment:2
    └── Comment:3
This manual update is necessary because Relay can't infer where in the list to add the new comment. The updater function provides explicit instructions.
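The edge/connection structure that makes this necessary can be modeled in plain JavaScript. This is a simplified sketch of the Relay connection shape; the cursor scheme is illustrative.

```javascript
// A connection modeled in plain JS: edges wrap nodes with cursors, plus
// pageInfo -- which is why pushing onto a bare array would break pagination.
function insertEdgeBefore(connection, node) {
  const edge = {cursor: `cursor:${node.id}`, node}; // cursor format is illustrative
  connection.edges = [edge, ...connection.edges];   // prepend, like the updater above
  return connection;
}

const comments = {
  edges: [{cursor: 'cursor:1', node: {id: '1', text: 'First!'}}],
  pageInfo: {hasNextPage: false, endCursor: 'cursor:1'},
};
insertEdgeBefore(comments, {id: '999', text: 'New comment'});
// comments.edges[0].node.id is '999'; pageInfo remains intact.
```

ConnectionHandler does the equivalent work against the normalized store, which is why the updater goes through it instead of mutating a plain array.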
Example 4: Custom Cache Keys for Non-Standard IDs
Scenario: Legacy API uses username as identifier instead of numeric id.
GraphQL Type:
type User {
username: String! # Primary key, not "id"
displayName: String
email: String
}
Problem: Relay's default ID extractor looks for an id field, finds nothing, and fails to normalize.
Solution: Custom getDataID function.
import {Environment, Network, RecordSource, Store} from 'relay-runtime';
function customGetDataID(fieldValue, typeName) {
// Handle User type specially
if (typeName === 'User' && fieldValue.username != null) {
return `User:${fieldValue.username}`;
}
// Handle LegacyPost with composite key
if (typeName === 'LegacyPost' && fieldValue.userId && fieldValue.timestamp) {
return `LegacyPost:${fieldValue.userId}:${fieldValue.timestamp}`;
}
// Fall back to default behavior for types with "id"
if (fieldValue.id != null) {
return `${typeName}:${fieldValue.id}`;
}
// No stable ID available
return null;
}
const environment = new Environment({
store: new Store(new RecordSource(), {gcReleaseBufferSize: 10}),
network: Network.create(fetchQuery),
getDataID: customGetDataID,
});
Result:
| Type | Fields | Generated Cache Key |
|---|---|---|
| User | {username: "alice"} | User:alice |
| User | {username: "bob"} | User:bob |
| LegacyPost | {userId: "alice", timestamp: 1234} | LegacyPost:alice:1234 |
| Comment | {id: "999"} | Comment:999 |
Now Relay can normalize User records even without an id field!
⚠️ Warning: Custom keys must be stable and globally unique. If a username can change, it's not suitable as a cache key.
Common Mistakes to Avoid ⚠️
Mistake 1: Forgetting to Return id in Mutations 🚫
Problem:
mutation UpdateUserMutation($input: UpdateUserInput!) {
updateUser(input: $input) {
user {
# Missing "id" field!
name
email
}
}
}
Consequence: Relay can't identify which cache record to update. The mutation succeeds on the server, but your UI doesn't update.
Fix:
mutation UpdateUserMutation($input: UpdateUserInput!) {
updateUser(input: $input) {
user {
id # ✅ Always include id!
name
email
}
}
}
Mistake 2: Modifying Cache Records Directly 🚫
Problem:
// ❌ WRONG: Direct mutation
const user = store.get('User:123');
user.name = 'New Name'; // This doesn't work!
Why It Fails: Relay's store is immutable. Direct property assignment has no effect.
Fix:
// ✅ RIGHT: Use store methods
const user = store.get('User:123');
user.setValue('New Name', 'name');
Mistake 3: Not Handling Connections Properly 🚫
Problem:
// ❌ WRONG: Treating connection as array
const post = store.get('Post:456');
const comments = post.getLinkedRecords('comments');
comments.push(newComment); // Breaks pagination!
Why It Fails: Relay connections are special structures with edges, cursors, and page info. Direct array manipulation breaks pagination.
Fix:
// ✅ RIGHT: Use ConnectionHandler
const connection = ConnectionHandler.getConnection(
post,
'PostComments_comments'
);
const edge = ConnectionHandler.createEdge(
store,
connection,
newComment,
'CommentEdge'
);
ConnectionHandler.insertEdgeAfter(connection, edge);
Mistake 4: Ignoring Garbage Collection 🚫
Problem: Keeping every query result in memory forever, causing memory leaks in long-running apps.
Consequence: Browser tab crashes after hours of use, especially on memory-constrained devices.
Fix:
// Configure appropriate GC settings
const store = new Store(new RecordSource(), {
gcReleaseBufferSize: 10, // Adjust based on app needs
queryCacheExpirationTime: 5 * 60 * 1000, // 5 minutes
});
// Manually dispose of long-running queries
const disposable = environment.retain(operation); // an OperationDescriptor
// Later:
disposable.dispose();
Mistake 5: Assuming Immediate Consistency Across Tabs 🚫
Problem: Expecting cache updates in one browser tab to instantly reflect in another tab.
Reality: Each tab has its own Relay environment and cache. They don't communicate by default.
Solutions:
- BroadcastChannel API to sync updates across tabs
- WebSocket subscriptions to receive server-pushed updates
- Periodic polling with refetch()
// Example: Cross-tab synchronization
const channel = new BroadcastChannel('relay-sync');
commitMutation(environment, {
mutation: UpdateUserMutation,
onCompleted: (response) => {
// Notify other tabs
channel.postMessage({
type: 'CACHE_UPDATE',
recordID: response.updateUser.user.id,
data: response.updateUser.user,
});
},
});
channel.onmessage = (event) => {
if (event.data.type === 'CACHE_UPDATE') {
// Update this tab's cache
environment.commitUpdate((store) => {
const record = store.get(event.data.recordID);
if (record == null) return; // record not present in this tab's cache
Object.entries(event.data.data).forEach(([key, value]) => {
record.setValue(value, key);
});
});
}
};
Key Takeaways 🎯
Quick Reference Card
| Concept | Definition |
|---|---|
| Normalization | Flattens nested responses into a map of records identified by unique keys |
| Global ID | Unique identifier (typically TypeName:DatabaseID) for cache keys |
| Cache Consistency | Single source of truth ensures all UI references display the same data |
| Automatic Updates | Mutations returning id field trigger automatic cache merges |
| Updater Functions | Manual cache manipulation for complex scenarios (lists, connections) |
| Deduplication | Identical in-flight queries share a single network request |
| Garbage Collection | Removes unreferenced records to prevent memory leaks |
| Optimistic Updates | Apply changes instantly before server confirmation for better UX |
Core Principles to Remember:
- Every object that can be refetched needs a global ID → implement the Node interface or ensure id fields exist
- The cache is immutable → use store methods (setValue, setLinkedRecord) instead of direct assignment
- Normalization eliminates duplication → each entity exists once; all references point to it
- Always return id in mutations → enables automatic cache updates without manual work
- Use ConnectionHandler for lists → respect pagination structures; don't treat them as plain arrays
- Configure garbage collection → balance memory usage with data availability
- Leverage optimistic updates → provide instant feedback while waiting for server responses
Performance Benefits:
PERFORMANCE GAINS FROM NORMALIZATION (illustrative)

                 Without              With
                 Normalization        Normalization
                 ─────────────        ─────────────
Memory Usage     ██████████           ███           (70% reduction)
Cache Lookups    ████████             ██            (75% faster)
Update Cost      ███████████          █             (91% faster)
Consistency      ❌ Manual            ✅ Automatic
When to Use Custom Strategies:
- Custom getDataID: Legacy APIs without standard id fields
- Manual updaters: Adding/removing items from lists or connections
- Optimistic updates: Actions requiring instant visual feedback
- Manual retention: Global data that should survive query unmounting