Identity and Cache Truth

Master Relay's normalization system and understand how object identity drives cache consistency

This lesson covers global object identification, normalization strategies, and cache consistency patterns, all essential concepts for building performant GraphQL applications with Relay.

Welcome to Identity and Cache Truth

💻 When building applications with Relay, understanding how data is stored, identified, and kept consistent is crucial. Relay's approach to caching and data management sets it apart from other GraphQL clients, offering automatic normalization and a sophisticated system for ensuring your cache represents the truth about your application's data state.

Think of Relay's cache as a single source of truth for all your data—like a well-organized library where every book has a unique catalog number. When you request the same book from different sections, you always get the exact same copy, never duplicates. This lesson will teach you how Relay achieves this through its identity system and cache management strategies.

Core Concepts

🎯 Global Object Identification

Relay requires that every object in your GraphQL schema that can be refetched has a globally unique identifier. This is the cornerstone of Relay's normalization strategy.

The id Field Convention

Every type that implements the Node interface must have an id field that is:

Globally unique across your entire schema
Opaque (clients shouldn't parse or construct IDs)
Stable (the same object always has the same ID)

interface Node {
  id: ID!
}

type User implements Node {
  id: ID!
  name: String!
  email: String!
}

type Post implements Node {
  id: ID!
  title: String!
  author: User!
}

💡 Best Practice: Use base64-encoded strings that include the typename, like "User:123" encoded as "VXNlcjoxMjM=". This makes debugging easier while keeping IDs opaque to clients.
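As a rough illustration of the encoding described above, here is a minimal server-side sketch. The helper names `toGlobalId` and `fromGlobalId` are chosen for this example and are not part of Relay itself; the sketch assumes a Node.js server where `Buffer` is available. Only the server should ever decode an ID.

```javascript
// Hypothetical server-side helpers; the names are illustrative, not a Relay API.
// Uses Node's global Buffer for base64 encoding.
function toGlobalId(typeName, localId) {
  return Buffer.from(`${typeName}:${localId}`, 'utf8').toString('base64');
}

// Server-only: split a global ID back into its type name and local ID.
function fromGlobalId(globalId) {
  const decoded = Buffer.from(globalId, 'base64').toString('utf8');
  const separator = decoded.indexOf(':');
  if (separator === -1) {
    throw new Error(`Malformed global ID: ${globalId}`);
  }
  return {
    typeName: decoded.slice(0, separator),
    localId: decoded.slice(separator + 1),
  };
}

console.log(toGlobalId('User', '123')); // VXNlcjoxMjM=
console.log(fromGlobalId('VXNlcjoxMjM=')); // { typeName: 'User', localId: '123' }
```

Including the type name lets a single node(id:) resolver route each ID to the right data source without the client ever needing to know how IDs are built.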

Why Global IDs Matter

When Relay fetches a User with id: "VXNlcjoxMjM=" from one query and the same user from another query, it recognizes they're the same object and merges the data automatically:

Query 1 Result	Query 2 Result	Merged Cache Entry
{ id: "VXNlcjoxMjM=", name: "Alice" }	{ id: "VXNlcjoxMjM=", email: "alice@example.com" }	{ id: "VXNlcjoxMjM=", name: "Alice", email: "alice@example.com" }
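The merge in the table can be pictured with a few lines of plain JavaScript. This is a simplified model that treats the store as a Map keyed by global ID; `mergeIntoStore` is a hypothetical helper, not how Relay is implemented internally.

```javascript
// Simplified model of a normalized store: a Map from global ID to record.
// This is an illustration, not Relay's internal implementation.
function mergeIntoStore(store, partialRecord) {
  const existing = store.get(partialRecord.id) ?? {};
  // New fields overwrite old ones; fields absent from the new payload survive.
  store.set(partialRecord.id, { ...existing, ...partialRecord });
  return store;
}

const userStore = new Map();
mergeIntoStore(userStore, { id: 'VXNlcjoxMjM=', name: 'Alice' });
mergeIntoStore(userStore, { id: 'VXNlcjoxMjM=', email: 'alice@example.com' });

console.log(userStore.size); // 1
console.log(userStore.get('VXNlcjoxMjM='));
// { id: 'VXNlcjoxMjM=', name: 'Alice', email: 'alice@example.com' }
```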

📦 Normalization: The Heart of Relay's Cache

Normalization is the process of flattening nested GraphQL responses into a flat lookup table indexed by global IDs. This eliminates data duplication and ensures consistency.

Before Normalization (Denormalized)

{
  "viewer": {
    "id": "user123",
    "name": "Alice",
    "posts": [
      {
        "id": "post1",
        "title": "Hello World",
        "author": {
          "id": "user123",
          "name": "Alice"
        }
      }
    ]
  },
  "post": {
    "id": "post1",
    "title": "Hello World",
    "author": {
      "id": "user123",
      "name": "Alice"
    }
  }
}

Notice how Alice and "Hello World" appear multiple times? If Alice changes her name, we'd need to update it in multiple places.

After Normalization (Relay's Store)

{
  "user123": {
    "__typename": "User",
    "id": "user123",
    "name": "Alice",
    "posts": {"__refs": ["post1"]}
  },
  "post1": {
    "__typename": "Post",
    "id": "post1",
    "title": "Hello World",
    "author": {"__ref": "user123"}
  },
  "client:root": {
    "viewer": {"__ref": "user123"},
    "post": {"__ref": "post1"}
  }
}

Now each object exists exactly once. The author field doesn't contain the full user object—it contains a reference ({"__ref": "user123"}) pointing to the normalized record.

🧠 Mental Model: Think of normalization like database normalization. Instead of storing the entire customer record with every order, you store the customer once and reference it by ID from each order.
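To make the flattening step concrete, here is a minimal normalizer sketch. It assumes that any object carrying a string id is a record and that everything else stays inline under its parent; the `featuredAuthor` field is made up for the example. Relay's real normalizer is driven by the compiled query rather than by inspecting the response.

```javascript
// Minimal normalizer sketch. Assumes every object with an "id" is a record and
// uses "client:root" as the root key. Not Relay's actual code.
function normalize(response) {
  const records = {};

  function visit(value) {
    if (Array.isArray(value)) {
      return value.map(visit);
    }
    if (value === null || typeof value !== 'object') {
      return value; // scalars are stored as-is
    }
    const fields = {};
    for (const [key, child] of Object.entries(value)) {
      fields[key] = visit(child);
    }
    if (typeof value.id !== 'string') {
      return fields; // no identity: keep the object inline under its parent
    }
    // Merge into any record already seen with this ID, then return a pointer.
    records[value.id] = { ...records[value.id], ...fields };
    return { __ref: value.id };
  }

  records['client:root'] = visit(response);
  return records;
}

const normalized = normalize({
  post: {
    id: 'post1',
    title: 'Hello World',
    author: { id: 'user123', name: 'Alice' },
  },
  // Hypothetical second field returning the same user with an extra field
  featuredAuthor: { id: 'user123', name: 'Alice', email: 'alice@example.com' },
});

console.log(Object.keys(normalized)); // [ 'user123', 'post1', 'client:root' ]
console.log(normalized.post1.author); // { __ref: 'user123' }
console.log(normalized.user123);
// { id: 'user123', name: 'Alice', email: 'alice@example.com' }
```

Both occurrences of the user collapse into one record, and each place that mentioned the user now holds only a pointer to it.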

🔄 Cache as Truth: The Single Source Principle

In Relay, the store (cache) is the source of truth for your UI. Components don't hold their own copies of data—they read from and subscribe to the centralized store.

The Data Flow

┌──────────────────────────────────────────────────┐
│              RELAY DATA FLOW                     │
└──────────────────────────────────────────────────┘

  1. Component renders
       │
       ↓
  2. Relay reads from Store (cache)
       │
       ↓
  3. Is data available?
       │
  ┌────┴────┐
  ↓         ↓
 YES       NO
  │         │
  │         ↓
  │    4. Fetch from network
  │         │
  │         ↓
  │    5. Normalize response
  │         │
  │         ↓
  │    6. Update Store
  │         │
  └────┬────┘
       │
       ↓
  7. Notify subscribed components
       │
       ↓
  8. Components re-render with new data
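The publish and notify steps at the end of this flow can be sketched with a toy store. The `TinyStore` class and its method names are invented for this illustration and are not Relay's store API; the callbacks stand in for component re-renders.

```javascript
// Toy store illustrating the publish/notify steps. The class and method names
// are hypothetical; Relay's real store API is different.
class TinyStore {
  constructor() {
    this.records = new Map();
    this.subscribers = new Map(); // record ID -> Set of callbacks
  }

  read(id) {
    return this.records.get(id);
  }

  subscribe(id, callback) {
    if (!this.subscribers.has(id)) this.subscribers.set(id, new Set());
    this.subscribers.get(id).add(callback);
    // Return an unsubscribe function, e.g. for use on unmount.
    return () => this.subscribers.get(id).delete(callback);
  }

  publish(record) {
    const merged = { ...this.records.get(record.id), ...record };
    this.records.set(record.id, merged);
    // Steps 7-8: every subscriber to this record is told about the new data.
    for (const callback of this.subscribers.get(record.id) ?? []) {
      callback(merged);
    }
  }
}

const tinyStore = new TinyStore();
const rendered = [];
tinyStore.subscribe('user123', (user) => rendered.push(`Header: ${user.name}`));
const stopProfile = tinyStore.subscribe('user123', (user) =>
  rendered.push(`Profile: ${user.name}`)
);

tinyStore.publish({ id: 'user123', name: 'Alice' });
stopProfile(); // the profile "component" unmounts
tinyStore.publish({ id: 'user123', name: 'Alicia' });

console.log(rendered);
// [ 'Header: Alice', 'Profile: Alice', 'Header: Alicia' ]
```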

Benefits of Cache as Truth

✅ Consistency: When data updates in the store, ALL components using that data automatically see the update

✅ Efficiency: No duplicate data in memory

✅ Automatic updates: Update a user's name in one place, and it updates everywhere it's displayed

✅ Optimistic updates: You can immediately update the cache before the server responds, making your UI feel instant

🔍 Cache Policies and Data Freshness

Relay provides several strategies for determining whether cached data is "fresh enough" or needs to be refetched.

Fetch Policies

Policy	Behavior	Use Case
`store-or-network`	Use cache if available, otherwise fetch	Default - balanced approach
`store-and-network`	Use cache immediately, then fetch to update	Show something fast, then refresh
`network-only`	Always fetch from network, ignore cache	Critical real-time data
`store-only`	Only use cache, never fetch	Offline mode, static data

const data = useLazyLoadQuery(
  graphql`
    query UserProfileQuery($id: ID!) {
      user(id: $id) {
        name
        email
      }
    }
  `,
  {id: userId},
  {fetchPolicy: 'store-and-network'} // Render cache, then update
);

💡 Pro Tip: Use store-and-network for lists and feeds where you want to show cached content immediately but also want fresh data. Use network-only sparingly—it defeats the purpose of caching.
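The four policies boil down to two decisions: whether to render from the store and whether to go to the network. The sketch below encodes the table as a plain function. `planFetch` and the `isCached` flag are assumptions for illustration; in Relay the availability check is performed by the library itself.

```javascript
// Decision table for the four policies as described above. A sketch only:
// "isCached" stands in for Relay's own check that the query's data is available.
function planFetch(fetchPolicy, isCached) {
  switch (fetchPolicy) {
    case 'store-or-network':
      return { renderFromStore: isCached, fetchFromNetwork: !isCached };
    case 'store-and-network':
      return { renderFromStore: isCached, fetchFromNetwork: true };
    case 'network-only':
      return { renderFromStore: false, fetchFromNetwork: true };
    case 'store-only':
      // Never fetches, even if some of the data is missing.
      return { renderFromStore: true, fetchFromNetwork: false };
    default:
      throw new Error(`Unknown fetch policy: ${fetchPolicy}`);
  }
}

console.log(planFetch('store-or-network', true));
// { renderFromStore: true, fetchFromNetwork: false }
console.log(planFetch('store-and-network', true));
// { renderFromStore: true, fetchFromNetwork: true }
```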

🎨 Cache Updates: Mutations and the Store

When you perform a mutation (like creating, updating, or deleting data), Relay needs to update its cache to reflect the changes.

Automatic Updates

If your mutation returns an object with an id, Relay automatically updates that record in the cache:

mutation UpdateUserMutation($input: UpdateUserInput!) {
  updateUser(input: $input) {
    user {
      id          # Relay uses this to find the record
      name        # These fields get updated
      email
    }
  }
}

Relay sees user.id, finds the existing User record in the cache with that ID, and merges the new fields. Every component displaying that user automatically re-renders with the updated data.

Manual Cache Updates with Updater Functions

For more complex scenarios (like adding items to a list), you need an updater function:

const [commitMutation] = useMutation(graphql`
  mutation CreatePostMutation($input: CreatePostInput!) {
    createPost(input: $input) {
      post {
        id
        title
        author {
          id
        }
      }
    }
  }
`);

function createPost(title) {
  commitMutation({
    variables: {input: {title}},
    updater: (store) => {
      // Get the new post from the mutation response
      const newPost = store.getRootField('createPost').getLinkedRecord('post');
      
      // Get the current user's record (currentUserId is assumed to be in scope)
      const user = store.get(currentUserId);
      if (!user) return; // User is not in the store yet; nothing to update
      
      // Get the existing posts (a plain list field here, not a connection)
      const posts = user.getLinkedRecords('posts') || [];
      
      // Add the new post to the beginning
      user.setLinkedRecords([newPost, ...posts], 'posts');
    }
  });
}

The updater function receives a store object that lets you imperatively modify the cache.

⚡ Garbage Collection and Cache Retention

Relay doesn't keep everything in memory forever. It uses garbage collection to remove data that's no longer being used.

Reference Counting

Relay counts references at the query level: each mounted component that renders a query retains it, and every record reachable from a retained query is kept:

┌────────────────────────────────────────┐
│     USER RECORD: user123               │
│     Retaining queries: 2               │
├────────────────────────────────────────┤
│  Reachable from queries rendered by:   │
│  • ProfilePage component               │
│  • HeaderUserMenu component            │
└────────────────────────────────────────┘

  When both components unmount:
       ↓
  Both queries are released
       ↓
  Queries leave the release buffer
  (gcReleaseBufferSize, default: 10 queries)
       ↓
  Record no longer reachable from any retained query
       ↓
  Freed from memory on the next GC run
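One way to picture the collection step is as a mark-and-sweep pass over the normalized store: start from the records that retained queries point at, mark everything reachable, and delete the rest. The sketch below assumes a plain-object store with `__ref` pointers; `collectGarbage` and `retainedRoots` are hypothetical names, and Relay's actual collector works from retained operations rather than a list of IDs.

```javascript
// Mark-and-sweep sketch over a normalized store (plain object keyed by ID).
// "retainedRoots" is a hypothetical list of record IDs that retained queries start from.
function collectGarbage(records, retainedRoots) {
  const reachable = new Set();

  function mark(value) {
    if (Array.isArray(value)) {
      value.forEach(mark);
    } else if (value && typeof value === 'object') {
      if (typeof value.__ref === 'string') {
        markRecord(value.__ref);
      } else {
        Object.values(value).forEach(mark);
      }
    }
  }

  function markRecord(id) {
    if (reachable.has(id) || !(id in records)) return;
    reachable.add(id);
    mark(records[id]);
  }

  retainedRoots.forEach(markRecord);

  // Sweep: drop every record that no retained root can reach.
  for (const id of Object.keys(records)) {
    if (!reachable.has(id)) delete records[id];
  }
  return records;
}

const gcRecords = {
  user123: { id: 'user123', name: 'Alice' },
  post1: { id: 'post1', title: 'Hello World', author: { __ref: 'user123' } },
  post2: { id: 'post2', title: 'Unused draft', author: { __ref: 'user123' } },
};

collectGarbage(gcRecords, ['post1']);
console.log(Object.keys(gcRecords)); // [ 'user123', 'post1' ]
```

The user record survives because a retained post still points at it, while the unretained post is swept even though it points at a live record.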

Retention Strategies

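// createOperationDescriptor and getRequest come from 'relay-runtime'
import {createOperationDescriptor, getRequest} from 'relay-runtime';

// Data read by a mounted component stays retained until it unmounts
const data = useLazyLoadQuery(
  query,
  variables,
  {fetchPolicy: 'store-or-network'}
);

// Manual retention - prevent GC
const environment = useRelayEnvironment();
const disposable = environment.retain(
  createOperationDescriptor(getRequest(query), variables)
);

// Later: allow GC
disposable.dispose();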

💡 Best Practice: Let Relay handle GC automatically in most cases. Use manual retention only for data you know you'll need soon (like prefetching for the next page).

Detailed Examples

Example 1: Identity Collision and Resolution

Scenario: You fetch the same user from two different queries with different fields.

// Query 1: Get basic user info
const data1 = useLazyLoadQuery(
  graphql`
    query Example1_BasicQuery($id: ID!) {
      user(id: $id) {
        id
        name
      }
    }
  `,
  {id: 'user123'}
);

// Later... Query 2: Get user with email
const data2 = useLazyLoadQuery(
  graphql`
    query Example1_DetailQuery($id: ID!) {
      user(id: $id) {
        id
        email
        profilePicture
      }
    }
  `,
  {id: 'user123'}
);

What happens in the cache:

Step	Cache State	Explanation
1	{ "user123": { "id": "user123", "name": "Alice" } }	First query stores basic info
2	{ "user123": { "id": "user123", "name": "Alice", "email": "alice@example.com", "profilePicture": "url..." } }	Second query merges new fields

Relay merges the data because both queries reference the same id. The cache now contains all fields from both queries. If a third component queries just name, Relay serves it from cache without a network request.

🧠 Key Insight: This is why global IDs are so powerful. Relay automatically deduplicates and consolidates data across your entire application.
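Whether a later query can be answered from the cache comes down to whether every requested field is already present on the record. The check below is a deliberately flat sketch: `canServeFromCache` is a hypothetical helper, and Relay walks the full query selection (including nested records) rather than a simple list of field names.

```javascript
// Sketch of a "can this be served from cache?" check. The helper name and the
// flat field list are assumptions; Relay walks the full query selection instead.
function canServeFromCache(records, id, requestedFields) {
  const record = records[id];
  if (!record) return false;
  // A field counts as available if it is present, even when its value is null.
  return requestedFields.every((field) => field in record);
}

const cache = {
  user123: {
    id: 'user123',
    name: 'Alice',
    email: 'alice@example.com',
    profilePicture: 'url...',
  },
};

console.log(canServeFromCache(cache, 'user123', ['name'])); // true
console.log(canServeFromCache(cache, 'user123', ['name', 'bio'])); // false
console.log(canServeFromCache(cache, 'user999', ['name'])); // false
```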

Example 2: Cache Invalidation with Updates

Scenario: A user updates their profile, and you want all components displaying that user to update immediately.

// Component A: Profile page
function ProfilePage({userId}) {
  const data = useLazyLoadQuery(
    graphql`
      query ProfilePageQuery($id: ID!) {
        user(id: $id) {
          id
          name
          bio
        }
      }
    `,
    {id: userId}
  );
  
  return (
    <div>
      <h1>{data.user.name}</h1>
      <p>{data.user.bio}</p>
    </div>
  );
}

// Component B: Header (different part of UI)
function Header({userId}) {
  const data = useLazyLoadQuery(
    graphql`
      query HeaderQuery($id: ID!) {
        user(id: $id) {
          id
          name
        }
      }
    `,
    {id: userId}
  );
  
  return <div>Welcome, {data.user.name}!</div>;
}

// Mutation: Update profile
function EditProfileForm({userId}) {
  const [commit] = useMutation(graphql`
    mutation UpdateProfileMutation($input: UpdateUserInput!) {
      updateUser(input: $input) {
        user {
          id
          name
          bio
        }
      }
    }
  `);
  
  function handleSubmit(newName, newBio) {
    commit({
      variables: {
        input: {id: userId, name: newName, bio: newBio}
      }
      // No updater needed! Relay handles it automatically
    });
  }
  
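  // onSubmit receives a DOM event, so read the values from the form itself.
  // Assumes the elided form contains inputs named "name" and "bio".
  return (
    <form onSubmit={(event) => {
      event.preventDefault();
      const form = new FormData(event.currentTarget);
      handleSubmit(form.get('name'), form.get('bio'));
    }}>...</form>
  );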
}

What happens:

User submits form
Mutation executes and returns updated user with id: "user123"
Relay finds the user123 record in cache
Updates name and bio fields
Both ProfilePage and Header automatically re-render with new name
User sees the new values everywhere as soon as the server responds

💡 Why this works: Because both components use the same user(id: "user123"), they share the same cache entry. When that entry updates, Relay notifies all subscribers.

Example 3: Optimistic Updates for Instant UI

Scenario: When a user likes a post, you want the UI to update instantly without waiting for the server.

function LikeButton({postId, currentLikeCount, viewerHasLiked}) {
  const [commit, isInFlight] = useMutation(graphql`
    mutation LikePostMutation($input: LikePostInput!) {
      likePost(input: $input) {
        post {
          id
          likeCount
          viewerHasLiked
        }
      }
    }
  `);
  
  function handleLike() {
    commit({
      variables: {input: {postId}},
      
      // Optimistic response - applied immediately
      optimisticResponse: {
        likePost: {
          post: {
            id: postId,
            likeCount: currentLikeCount + 1,
            viewerHasLiked: true
          }
        }
      },
      
      // Optional: runs after the server response has replaced the optimistic data
      onCompleted: (response) => {
        // Server confirmed the like
        console.log('Like confirmed');
      },
      
      onError: (error) => {
        // Server rejected - Relay automatically rolls back optimistic update
        console.error('Like failed', error);
      }
    });
  }
  
  return (
    <button onClick={handleLike} disabled={isInFlight}>
      {viewerHasLiked ? '❤️' : '🤍'} {currentLikeCount}
    </button>
  );
}

Timeline of events:

┌─────────────────────────────────────────────────────┐
│         OPTIMISTIC UPDATE TIMELINE                  │
└─────────────────────────────────────────────────────┘

  t=0ms: User clicks button
         │
         ↓
  t=1ms: Optimistic response applied to cache
         │
         ↓
  t=2ms: UI re-renders (likeCount: 42 → 43)
         │                 (viewerHasLiked: false → true)
         │
         ↓
  t=5ms: Network request sent to server
         │
         │ ... network latency ...
         │
         ↓
  t=150ms: Server responds with actual data
         │
         ↓
  t=151ms: Cache updated with real response
         │          (optimistic update replaced)
         │
         ↓
  t=152ms: UI re-renders if data differs

If the server response differs:

// Optimistic: likeCount = 43
// Server says: likeCount = 44 (someone else liked it too)
// Result: UI shows 44, not 43

Relay rolls back the optimistic update and applies the real server response, so the store always ends up reflecting what the server returned.

⚠️ Common Mistake: Making optimistic responses too complex. Keep them simple and only update fields you're certain about. Let the server response be the final truth.
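The apply, roll back, and commit sequence can be modeled as a base layer of server data with a temporary overlay on top. The `OptimisticStore` class below is a toy written for this lesson, with hypothetical method names; it is not how Relay structures its store internally.

```javascript
// Toy model of optimistic updates: a base layer of server data plus an
// optional overlay. Names are hypothetical; Relay's internals differ.
class OptimisticStore {
  constructor(records) {
    this.base = { ...records };
    this.overlay = {};
  }

  // Reads see the overlay on top of the base data.
  read(id) {
    return { ...this.base[id], ...this.overlay[id] };
  }

  applyOptimistic(record) {
    this.overlay[record.id] = record;
  }

  // Used on success and on failure: the guess is always discarded.
  rollback(id) {
    delete this.overlay[id];
  }

  commitServerResponse(record) {
    this.rollback(record.id);
    this.base[record.id] = { ...this.base[record.id], ...record };
  }
}

const likes = new OptimisticStore({
  post1: { id: 'post1', likeCount: 42, viewerHasLiked: false },
});

likes.applyOptimistic({ id: 'post1', likeCount: 43, viewerHasLiked: true });
console.log(likes.read('post1').likeCount); // 43

likes.commitServerResponse({ id: 'post1', likeCount: 44, viewerHasLiked: true });
console.log(likes.read('post1').likeCount); // 44
```

Because the optimistic data never touches the base layer, a failed mutation only needs `rollback` to restore the previous state.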

Example 4: List Operations with Connections

Scenario: Adding a new comment to a post's comment list.

function AddCommentForm({postId}) {
  const [commit] = useMutation(graphql`
    mutation AddCommentMutation($input: AddCommentInput!) {
      addComment(input: $input) {
        commentEdge {
          node {
            id
            text
            author {
              id
              name
            }
            createdAt
          }
        }
      }
    }
  `);
  
  function handleSubmit(text) {
    commit({
      variables: {input: {postId, text}},
      
      updater: (store) => {
        // Get the post record from cache
        const post = store.get(postId);
        if (!post) return;
        
        // Get the new comment from mutation response
        const commentEdge = store
          .getRootField('addComment')
          .getLinkedRecord('commentEdge');
        
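        // Get the client-side connection record. This assumes the comments
        // field is declared with @connection(key: "PostComments_comments");
        // the key is a hypothetical name for this example.
        // ConnectionHandler is imported from 'relay-runtime'.
        const connection = ConnectionHandler.getConnection(
          post,
          'PostComments_comments'
        );
        if (!connection) return;
        
        // Build an edge owned by this connection and prepend it to the list
        const newEdge = ConnectionHandler.buildConnectionEdge(
          store,
          connection,
          commentEdge
        );
        ConnectionHandler.insertEdgeBefore(connection, newEdge);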
        
        // Update total count
        const count = connection.getValue('totalCount') || 0;
        connection.setValue(count + 1, 'totalCount');
      }
    });
  }
  
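  // onSubmit receives a DOM event, so read the text from the form itself.
  // Assumes the elided form contains an input named "text".
  return (
    <form onSubmit={(event) => {
      event.preventDefault();
      handleSubmit(new FormData(event.currentTarget).get('text'));
    }}>...</form>
  );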
}

Why manual update is needed:

Relay can't automatically know where to insert the new comment in the list. You must tell it:

Which connection to update (post.comments)
Where in the list to add it (beginning, end, or specific position)
How to update metadata (like totalCount)

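💡 Pro Tip: For simple list inserts, consider Relay's declarative mutation directives. Add @appendEdge or @prependEdge (or @appendNode / @prependNode) to the edge or node field in the mutation itself, passing the IDs of the connections to update, and you can skip the manual updater.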
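The three decisions listed above (which connection, where in the list, what metadata) are visible in this stripped-down model of a list insert on a normalized store. The record IDs, the `__refs` list shape, and the `prependEdge` helper are assumptions for illustration only.

```javascript
// Prepending an edge to a connection record in a simplified normalized store.
// Record IDs and the "__refs" list shape are assumptions for illustration.
function prependEdge(records, connectionId, edgeId) {
  const connection = records[connectionId];
  if (!connection) return false; // connection was never fetched; nothing to update

  const existing = connection.edges?.__refs ?? [];
  records[connectionId] = {
    ...connection,
    edges: { __refs: [edgeId, ...existing] }, // where: the beginning of the list
    totalCount: (connection.totalCount ?? 0) + 1, // metadata kept in step
  };
  return true;
}

const commentRecords = {
  'post1:comments': { edges: { __refs: ['edge1', 'edge2'] }, totalCount: 2 },
};

prependEdge(commentRecords, 'post1:comments', 'edge3');
console.log(commentRecords['post1:comments'].edges.__refs);
// [ 'edge3', 'edge1', 'edge2' ]
console.log(commentRecords['post1:comments'].totalCount); // 3
```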

Common Mistakes

❌ Mistake 1: Forgetting to Fetch the `id` Field

// WRONG - no id field
const data = useLazyLoadQuery(
  graphql`
    query BadQuery($userId: ID!) {
      user(id: $userId) {
        name
        email
      }
    }
  `,
  {userId}
);

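Problem: The Relay compiler adds id to the generated query for any type that has one, so the record is still normalized. But a component can only read the fields it selects explicitly: data.user.id is undefined here, so there is nothing to pass to a mutation input or to store.get in an updater. (If a type has no id at all, Relay falls back to a path-based client ID under the parent record, and the data is not shared across queries.)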

Fix: Always include id for types implementing Node:

// CORRECT
const data = useLazyLoadQuery(
  graphql`
    query GoodQuery($userId: ID!) {
      user(id: $userId) {
        id          # ✅ Always include id
        name
        email
      }
    }
  `,
  {userId}
);

❌ Mistake 2: Mutating Cache Data Directly

// WRONG - direct mutation
const data = useLazyLoadQuery(query, variables);
data.user.name = 'New Name'; // ❌ This won't update the cache!

Problem: Relay data is read-only. Direct mutations don't trigger updates or re-renders.

Fix: Use mutations or updater functions:

// CORRECT
const [commit] = useMutation(updateUserMutation);
commit({
  variables: {input: {id: userId, name: 'New Name'}}
});

❌ Mistake 3: Incorrect Optimistic Response Structure

// WRONG - mismatched structure
commit({
  variables: {input: {postId}},
  optimisticResponse: {
    likeCount: 43  // ❌ Doesn't match mutation shape
  }
});

Problem: Optimistic response must exactly match the mutation's response shape.

Fix: Mirror the mutation response structure:

// CORRECT
commit({
  variables: {input: {postId}},
  optimisticResponse: {
    likePost: {           // ✅ Matches mutation field
      post: {             // ✅ Matches nested structure
        id: postId,
        likeCount: 43,
        viewerHasLiked: true  // ✅ Include every field the mutation selects
      }
    }
  }
});

❌ Mistake 4: Not Handling GC for Prefetched Data

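// WRONG - prefetch without retention
function prefetchNextPage() {
  // fetchQuery returns an observable; subscribe() is what starts the request
  fetchQuery(environment, nextPageQuery, variables).subscribe({});
  // Data may be GC'd before user navigates!
}

Problem: fetchQuery keeps the data only while the request is in flight. Once it completes, the prefetched records can be garbage collected if no component references them.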

Fix: Retain the query:

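// CORRECT
// createOperationDescriptor and getRequest are imported from 'relay-runtime'
function prefetchNextPage() {
  // Retain the operation so its records survive GC
  const operation = createOperationDescriptor(getRequest(nextPageQuery), variables);
  const disposable = environment.retain(operation);
  
  // Start the request; the subscription returned here is not a retain handle
  fetchQuery(environment, nextPageQuery, variables).subscribe({});
  
  // Keep for 30 seconds
  setTimeout(() => disposable.dispose(), 30000);
  
  return disposable;
}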

❌ Mistake 5: Over-relying on `network-only`

// WRONG - unnecessary network requests
const data = useLazyLoadQuery(
  query,
  variables,
  {fetchPolicy: 'network-only'} // ❌ Ignores perfectly good cache
);

Problem: Defeats caching, causes unnecessary load, slower UI.

Fix: Use appropriate fetch policy:

// CORRECT - use cache intelligently
const data = useLazyLoadQuery(
  query,
  variables,
  {fetchPolicy: 'store-and-network'} // ✅ Show cache, then refresh
);

Key Takeaways

🎯 Identity is Everything: Global IDs enable Relay's entire normalization system. Every refetchable object needs a unique id.

🎯 One Record, One Truth: Normalized cache means each object exists exactly once, eliminating duplication and inconsistency.

🎯 Automatic is Better: Relay automatically merges data, updates components, and handles most cache operations—let it do its job.

🎯 Cache as Source of Truth: Your components read from the store, not from local state. The store is the single source of truth.

🎯 Smart Fetching: Choose the right fetch policy for each use case. Default to store-or-network and only deviate with good reason.

🎯 Optimistic Updates for UX: Use optimistic responses to make your UI feel instant, but keep them simple and let server data override.

🎯 Manual Updates When Needed: For list operations and complex cache changes, use updater functions to explicitly modify the store.

🎯 GC is Your Friend: Let Relay clean up unused data automatically. Manually retain only when prefetching or caching for known future use.

🤔 Did You Know?

Relay's normalization strategy mirrors database normalization principles from the 1970s. The same idea that prevents update anomalies in SQL databases (store each fact once and reference it by key) applies to Relay's cache: one source of truth for each entity!

Facebook (now Meta) built Relay to handle their massive scale: millions of objects, thousands of components, all sharing and updating the same data. The identity system makes it possible to have a single "User" record that's referenced by posts, comments, likes, friend lists, and more—all staying perfectly in sync.

📋 Quick Reference Card

Concept	Key Point
Global ID	Unique identifier for every Node type object
Normalization	Flattening nested data into ID-indexed lookup table
Store	Relay's cache - single source of truth for all data
Reference	`{"__ref": "id"}` pointer to normalized record
Fetch Policy	Strategy for cache vs network (store-or-network, etc.)
Updater	Function to manually modify cache after mutations
Optimistic Update	Instant UI update before server confirms
GC	Automatic cleanup of unreferenced cached data
Retention	Keeping data in cache even when not actively used

Cache Update Flow:

Mutation → Server Response → Normalize → Update Store → Notify Subscribers → Re-render

Always Include:

id field in queries for Node types
__typename for union/interface types (Relay adds automatically)
Proper error handling for mutations

📚 Further Study

Relay Documentation - Guided Tour: https://relay.dev/docs/guided-tour/ - Official comprehensive guide to Relay concepts
GraphQL Global Object Identification Specification: https://graphql.org/learn/global-object-identification/ - The spec behind Relay's ID system
Relay Store API Reference: https://relay.dev/docs/api-reference/store/ - Detailed documentation on cache manipulation and updater functions
