Lesson 8: Performance & Best Practices
Master SQL optimization: reading EXPLAIN plans, index strategies, avoiding N+1 queries, and writing maintainable code.
Introduction
You've learned how to write SQL queries that work, but do they work efficiently? A query that returns correct results in 5 seconds might return the same data in 50 milliseconds with proper optimization. In production systems handling thousands of requests per second, this difference isn't just noticeable; it's the difference between a responsive application and one that crashes under load.
This lesson teaches you how to think like a database engine, identify performance bottlenecks, and write SQL that scales. You'll learn to read EXPLAIN plans (the database's execution blueprint), understand indexes (the secret to fast lookups), avoid the dreaded N+1 query problem, and recognize common anti-patterns that plague real-world applications.
💡 Remember: Premature optimization is the root of all evil, but knowing how to optimize when needed is the mark of a professional developer.
Core Concepts 🎯
1. Understanding EXPLAIN Plans
Every SQL query goes through a query planner that decides how to execute it. The EXPLAIN command shows you this execution plan: the roadmap the database follows to retrieve your data.
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';
A typical EXPLAIN output looks like:
Seq Scan on users (cost=0.00..35.50 rows=1 width=100)
Filter: (email = 'john@example.com')
Key metrics to understand:
- Seq Scan (Sequential Scan): The database reads every row in the table. This is slow for large tables.
- Index Scan: The database uses an index to jump directly to relevant rows. Much faster!
- cost: Estimated computational expense (startup cost..total cost)
- rows: Estimated number of rows returned
- width: Average row size in bytes
🧠 Mnemonic: "Seq Scan = Slow Search" (checks every row)
Common plan node types (scans and joins):
+------------------+---------------------------+
| Plan Node | When It Happens |
+------------------+---------------------------+
| Seq Scan | No index, small table |
| Index Scan | Using index, few rows |
| Index Only Scan | All data in index itself |
| Bitmap Scan | Multiple index conditions |
| Nested Loop | JOIN with small dataset |
| Hash Join | JOIN with larger dataset |
| Merge Join | JOIN on sorted data |
+------------------+---------------------------+
⚠️ EXPLAIN vs EXPLAIN ANALYZE: EXPLAIN shows the plan; EXPLAIN ANALYZE actually runs the query and shows real timing. Use ANALYZE on development/staging only; it executes the query!
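Here's what a run might look like for the earlier email lookup; a sketch with PostgreSQL-style output (the idx_users_email index from the next section is assumed to exist, and all numbers are illustrative):
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'john@example.com';
-- Illustrative output:
-- Index Scan using idx_users_email on users (cost=0.29..8.31 rows=1 width=100) (actual time=0.040..0.042 rows=1 loops=1)
--   Index Cond: (email = 'john@example.com'::text)
-- Planning Time: 0.110 ms
-- Execution Time: 0.065 ms
Comparing the planner's estimated rows with the actual rows is a quick way to spot stale statistics.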
2. Index Strategies
An index is like the index at the back of a book: it lets you jump to the right page instead of reading every page sequentially.
How indexes work conceptually:
Without Index:              With Index on email:
Table scan every row        B-tree lookup
        ↓                           ↓
Row 1: alice@...            'john@...' → Row 4523
Row 2: bob@...
Row 3: charlie@...
...
Row 4523: john@...  ✓
...
Row 10000: zoe@...
Time: O(n)                  Time: O(log n)
Creating indexes:
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
CREATE UNIQUE INDEX idx_users_username ON users(username);
When to index:
✅ DO index:
- Columns in WHERE clauses
- Foreign keys used in JOINs
- Columns in ORDER BY
- Columns in GROUP BY
- Frequently searched fields
❌ DON'T index:
- Small tables (< 1000 rows)
- Columns with low cardinality (few unique values like boolean)
- Columns that change frequently
- Tables with heavy INSERT/UPDATE load
🤔 Did you know? Indexes speed up reads but slow down writes! Each INSERT/UPDATE/DELETE must update all relevant indexes. This is the classic read-vs-write tradeoff.
Composite indexes (multi-column) follow the leftmost prefix rule:
INDEX on (user_id, created_at, status)
✅ Can use for:
WHERE user_id = 5
WHERE user_id = 5 AND created_at > '2024-01-01'
WHERE user_id = 5 AND created_at > '2024-01-01' AND status = 'active'
❌ Cannot use for:
WHERE created_at > '2024-01-01' (doesn't start with user_id)
WHERE status = 'active' (skips the first two columns)
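You can check this behavior yourself; a minimal sketch, assuming an orders table with these three columns (the index name is illustrative, and the exact plan also depends on table size and statistics):
CREATE INDEX idx_orders_user_date_status ON orders(user_id, created_at, status);
EXPLAIN SELECT * FROM orders WHERE user_id = 5;
-- Typically an Index Scan on a large table
EXPLAIN SELECT * FROM orders WHERE status = 'active';
-- Typically a Seq Scan, because the leftmost column is missing from the filter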
3. The N+1 Query Problem
This is the most common performance killer in applications. Here's the scenario:
-- Query 1: Get all users
SELECT * FROM users LIMIT 10;
-- Then in your application loop:
for each user:
-- Query 2, 3, 4... 11: Get each user's orders
SELECT * FROM orders WHERE user_id = ?
The problem: You execute 1 query to get users, then N additional queries (one per user). For 10 users, that's 11 queries. For 1000 users, that's 1001 queries! 😱
The solution: Use a JOIN or IN clause to fetch everything in one query:
-- Solution 1: JOIN
SELECT users.*, orders.*
FROM users
LEFT JOIN orders ON users.id = orders.user_id
LIMIT 10;
-- Solution 2: Two batched queries using an IN clause
SELECT * FROM users WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
SELECT * FROM orders WHERE user_id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
Visual comparison:
N+1 Problem:                 Optimized:
[App]                        [App]
  ↓                            ↓
Query 1 → [DB]               Query 1 (JOIN) → [DB]
  ↓                            ↓
Query 2 → [DB]               [All data returned]
  ↓
Query 3 → [DB]
  ↓
...
  ↓
Query N → [DB]
Time: N * latency            Time: 1 * latency
🧠 Mnemonic: "N+1 = Never do 1 query per row!"
4. Common Anti-Patterns ⚠️
Anti-Pattern #1: SELECT *
-- BAD: Fetches all columns
SELECT * FROM users WHERE id = 5;
-- GOOD: Only fetch what you need
SELECT id, name, email FROM users WHERE id = 5;
Why it matters: SELECT * transfers unnecessary data over the network, prevents covering indexes from doing their job (an index that already contains every column the query needs can answer it without touching the table), and can silently break application code when the schema changes.
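Here's a covering index in action; a hedged sketch (INCLUDE is PostgreSQL 11+ syntax, and the index name is just for illustration):
-- The index stores name alongside the email key, so this query never touches the table
CREATE INDEX idx_users_email_covering ON users(email) INCLUDE (name);
EXPLAIN SELECT name FROM users WHERE email = 'john@example.com';
-- Expect an Index Only Scan (provided the table's visibility map is up to date, e.g. after VACUUM)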
Anti-Pattern #2: Functions on Indexed Columns
-- BAD: Function prevents index usage
SELECT * FROM users WHERE UPPER(email) = 'JOHN@EXAMPLE.COM';
-- GOOD: Store normalized data or use functional index
SELECT * FROM users WHERE email = 'john@example.com';
When you apply a function to an indexed column, the database can't use a plain index on that column!
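If you genuinely need case-insensitive lookups, most databases let you index the expression itself; a PostgreSQL-style sketch (the index name is illustrative):
-- The WHERE clause must match the indexed expression exactly for the index to be usable
CREATE INDEX idx_users_email_lower ON users (LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';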
Anti-Pattern #3: OR in WHERE Clauses
-- BAD: OR often prevents index usage
SELECT * FROM products WHERE category = 'electronics' OR price < 100;
-- GOOD: Rewrite as a UNION so each branch can use its own index
SELECT * FROM products WHERE category = 'electronics'
UNION
SELECT * FROM products WHERE price < 100;
Anti-Pattern #4: LIKE with Leading Wildcard
-- BAD: Cannot use index
SELECT * FROM users WHERE email LIKE '%@example.com';
-- GOOD: A trailing wildcard CAN use an index (note: this matches a different pattern than the query above)
SELECT * FROM users WHERE email LIKE 'john%';
Anti-Pattern #5: NOT IN with NULLs
-- DANGEROUS: NOT IN returns no rows if subquery contains NULL
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned);
-- If banned.user_id has any NULL, this returns ZERO rows!
-- SAFE: Use NOT EXISTS
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM banned b WHERE b.user_id = u.id);
5. Writing Maintainable SQL
Be explicit about columns and which table they come from:
-- BAD
SELECT u.*, o.* FROM users u JOIN orders o ON u.id = o.user_id;
-- GOOD
SELECT
users.id,
users.name,
orders.total,
orders.created_at
FROM users
JOIN orders ON users.id = orders.user_id;
Format for readability:
SELECT
u.id,
u.name,
COUNT(o.id) AS order_count,
SUM(o.total) AS total_spent
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at >= '2024-01-01'
GROUP BY u.id, u.name
HAVING COUNT(o.id) > 5
ORDER BY total_spent DESC
LIMIT 10;
Use comments for complex logic:
-- Find users who placed orders in the last 30 days
-- but haven't placed one in the last 7 days (re-engagement target)
SELECT DISTINCT user_id
FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
AND user_id NOT IN (
SELECT user_id FROM orders
WHERE created_at >= CURRENT_DATE - INTERVAL '7 days'
);
Detailed Examples 💻
Example 1: Optimizing a Slow Query
Initial query (slow on large table):
SELECT * FROM orders
WHERE YEAR(created_at) = 2024 AND status = 'completed';
Step 1: Check EXPLAIN plan
EXPLAIN SELECT * FROM orders
WHERE YEAR(created_at) = 2024 AND status = 'completed';
-- Result: Seq Scan (bad!)
-- Problem: YEAR() function prevents index usage
Step 2: Rewrite without function
SELECT id, user_id, total, created_at
FROM orders
WHERE created_at >= '2024-01-01'
AND created_at < '2025-01-01'
AND status = 'completed';
Step 3: Add a composite index (equality column first, then the range column)
CREATE INDEX idx_orders_status_created ON orders(status, created_at);
Step 4: Verify improvement
EXPLAIN ANALYZE SELECT id, user_id, total, created_at
FROM orders
WHERE created_at >= '2024-01-01'
AND created_at < '2025-01-01'
AND status = 'completed';
-- Result: Index Scan using idx_orders_status_created
-- Execution time: 2.3ms (was 450ms)
Optimization summary:
+-------------------------+------------------+
| Change | Impact |
+-------------------------+------------------+
| Removed YEAR() | Index usable |
| Removed SELECT * | Less data xfer |
| Added composite index | Fast lookup |
+-------------------------+------------------+
Result: 195x faster! ⚡
Example 2: Solving N+1 in an E-Commerce App
Scenario: Display 50 products with their category names.
❌ N+1 Problem Code:
-- Query 1: Get products
SELECT id, name, category_id, price FROM products LIMIT 50;
-- Application loop (executes 50 times!):
for each product:
SELECT name FROM categories WHERE id = product.category_id;
Total: 51 queries (1 + 50)
✅ Optimized with JOIN:
SELECT
p.id,
p.name AS product_name,
p.price,
c.name AS category_name
FROM products p
JOIN categories c ON p.category_id = c.id
LIMIT 50;
Total: 1 query!
Performance comparison:
N+1 Approach:
51 queries × 5ms latency = 255ms
JOIN Approach:
1 query × 8ms = 8ms
Speedup: ~32x faster!
Example 3: Index Strategy for Search Feature
Requirement: Users search products by name and filter by category and price range.
Typical queries:
-- Search by name
SELECT * FROM products WHERE name LIKE 'laptop%';
-- Filter by category
SELECT * FROM products WHERE category_id = 5;
-- Filter by price range
SELECT * FROM products WHERE price BETWEEN 100 AND 500;
-- Combined search
SELECT * FROM products
WHERE name LIKE 'laptop%'
AND category_id = 5
AND price BETWEEN 100 AND 500;
Index strategy:
-- Index 1: For name searches
CREATE INDEX idx_products_name ON products(name);
-- Supports: WHERE name LIKE 'laptop%' (but NOT '%laptop%')
-- Index 2: For category filtering
CREATE INDEX idx_products_category ON products(category_id);
-- Index 3: Composite for common combined search
CREATE INDEX idx_products_cat_price ON products(category_id, price);
-- Supports: WHERE category_id = ? AND price BETWEEN ? AND ?
Why not one huge composite index?
CREATE INDEX idx_products_all ON products(name, category_id, price);
This would only help queries that filter by name first (leftmost prefix rule). Our separate indexes give flexibility for different query patterns.
Example 4: Avoiding Anti-Patterns in Reports
❌ BAD: Multiple scans and functions
SELECT
(SELECT COUNT(*) FROM orders WHERE MONTH(created_at) = 1) AS jan_orders,
(SELECT COUNT(*) FROM orders WHERE MONTH(created_at) = 2) AS feb_orders,
(SELECT COUNT(*) FROM orders WHERE MONTH(created_at) = 3) AS mar_orders,
(SELECT COUNT(*) FROM orders WHERE MONTH(created_at) = 4) AS apr_orders
FROM orders
LIMIT 1;
Problems:
- 4 separate table scans
- MONTH() prevents index usage
- Scans entire table for each subquery
✅ GOOD: Single scan with CASE
SELECT
COUNT(CASE WHEN created_at >= '2024-01-01' AND created_at < '2024-02-01' THEN 1 END) AS jan_orders,
COUNT(CASE WHEN created_at >= '2024-02-01' AND created_at < '2024-03-01' THEN 1 END) AS feb_orders,
COUNT(CASE WHEN created_at >= '2024-03-01' AND created_at < '2024-04-01' THEN 1 END) AS mar_orders,
COUNT(CASE WHEN created_at >= '2024-04-01' AND created_at < '2024-05-01' THEN 1 END) AS apr_orders
FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2024-05-01';
Benefits:
- Single table scan
- Index on created_at usable
- Filters only relevant date range
Common Mistakes ⚠️
❌ Mistake 1: Over-Indexing
-- Creating indexes on everything
CREATE INDEX idx1 ON products(name);
CREATE INDEX idx2 ON products(price);
CREATE INDEX idx3 ON products(category_id);
CREATE INDEX idx4 ON products(stock);
CREATE INDEX idx5 ON products(created_at);
-- ... and 10 more!
Why it's bad: Each index slows down INSERT/UPDATE/DELETE operations. Only index what you actually query frequently.
✅ Better approach: Start with just primary and unique key indexes, then add others based on actual slow queries found in production logs or EXPLAIN analysis.
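In PostgreSQL, the statistics views can point out indexes that are never read; a sketch (view and column names are PostgreSQL-specific):
-- Indexes with zero scans since statistics were last reset are candidates for removal
SELECT relname AS table_name, indexrelname AS index_name, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY relname;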
❌ Mistake 2: Using EXPLAIN Without ANALYZE
EXPLAIN SELECT * FROM huge_table WHERE complex_condition = 'value';
Why it's bad: EXPLAIN shows only estimates; it doesn't reveal actual execution time or whether the planner's statistics are out of date.
✅ Better approach: Use EXPLAIN ANALYZE on non-production data, or constrain the query with LIMIT (keep in mind that LIMIT can change which plan the optimizer picks, so treat the timing as a rough signal):
EXPLAIN ANALYZE SELECT * FROM huge_table WHERE complex_condition = 'value' LIMIT 100;
❌ Mistake 3: Ignoring Query Context
-- Optimizing a query that runs once per day
SPEND 3 HOURS → Save 200ms
-- Ignoring a query that runs 10,000 times per minute
IGNORE → Lose 500ms × 10,000 = 5,000,000ms ≈ 83 minutes of cumulative query time every minute!
✅ Better approach: Profile your application! Focus on the following (one way to find such queries is sketched after this list):
- Queries executed most frequently
- Queries taking longest individually
- Queries on tables growing rapidly
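In PostgreSQL, the pg_stat_statements extension is one way to surface these queries; a sketch (the extension must be installed, and these column names are for PostgreSQL 13+):
-- Statements ranked by total time spent across all executions
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;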
❌ Mistake 4: Premature Optimization
-- Adding indexes before having data
CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(100));
CREATE INDEX idx_users_name ON users(name);
-- Table has 10 rows total...
Why it's bad: Small tables don't benefit from indexes. The overhead hurts more than it helps.
✅ Better approach:
- Write correct SQL first
- Test with realistic data volumes
- Measure actual performance
- Optimize only proven bottlenecks
❌ Mistake 5: Forgetting About NULL
-- Assuming this returns all non-admin users
SELECT * FROM users WHERE role != 'admin';
-- WRONG! Excludes users where role IS NULL
✅ Better approach:
SELECT * FROM users WHERE role != 'admin' OR role IS NULL;
-- Or use COALESCE:
SELECT * FROM users WHERE COALESCE(role, 'user') != 'admin';
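Standard SQL also offers IS DISTINCT FROM, which treats NULL as an ordinary comparable value (supported by PostgreSQL, among others):
-- NULL roles are kept, because IS DISTINCT FROM never evaluates to NULL
SELECT * FROM users WHERE role IS DISTINCT FROM 'admin';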
Key Takeaways 🎯
EXPLAIN is your friend: Always check execution plans for slow queries. Look for Seq Scans on large tables.
Indexes are not free: They speed up reads but slow down writes. Index strategically based on query patterns.
N+1 kills performance: Always fetch related data in one query using JOINs or IN clauses, not in application loops.
SELECT only what you need: Avoid SELECT *. Be explicit about columns.
Functions break indexes: WHERE YEAR(date) = 2024 can't use an index. Use WHERE date >= '2024-01-01' AND date < '2025-01-01'.
Leftmost prefix rule: A composite index on (a, b, c) works for queries on (a), (a, b), (a, b, c) but NOT (b), (c), or (b, c).
LIKE patterns: name LIKE 'john%' can use an index; name LIKE '%john%' cannot.
NULL behavior: Remember that WHERE column != 'value' excludes NULL rows. Test with NULL data!
Measure, don't guess: Use EXPLAIN ANALYZE and production metrics to find real bottlenecks.
Maintainable > Clever: Write clear, readable SQL. Future you (and your teammates) will thank you.
Try This: Optimization Exercise
Given this slow query:
SELECT * FROM orders
WHERE DATE(created_at) = CURRENT_DATE
AND (status = 'pending' OR status = 'processing');
Identify three problems and how you'd fix them:
💡 Solution
- SELECT *: Only select the columns you need
- DATE() function: Prevents index usage; use a date range instead
- OR clause: Use an IN list (or UNION) instead
Optimized:
SELECT id, user_id, total, status
FROM orders
WHERE created_at >= CURRENT_DATE
AND created_at < CURRENT_DATE + INTERVAL '1 day'
AND status IN ('pending', 'processing');
Add an index (equality/IN column first, then the range): CREATE INDEX idx_orders_status_date ON orders(status, created_at);
Further Study
- Use The Index, Luke - https://use-the-index-luke.com/ - Excellent visual guide to database indexing
- PostgreSQL EXPLAIN Documentation - https://www.postgresql.org/docs/current/using-explain.html - Official guide to reading query plans
- SQL Performance Explained (Book) - https://sql-performance-explained.com/ - Deep dive into optimization across different databases
Quick Reference Card
+----------------------------------+---------------------------------------+
| CONCEPT | KEY POINTS |
+----------------------------------+---------------------------------------+
| EXPLAIN | Shows execution plan |
| | Look for: Seq Scan (bad on big table)|
| | Use EXPLAIN ANALYZE for actual times |
+----------------------------------+---------------------------------------+
| Indexes | B-tree structure for fast lookups |
| | Speed reads, slow writes |
| | Index WHERE/JOIN/ORDER BY columns |
| | Don't over-index! |
+----------------------------------+---------------------------------------+
| Composite Index (a,b,c) | Works for: (a), (a,b), (a,b,c) |
| | Doesn't work for: (b), (c), (b,c) |
| | Leftmost prefix rule! |
+----------------------------------+---------------------------------------+
| N+1 Problem | 1 query + N queries in loop |
| | Solution: Use JOIN or IN clause |
| | Fetch related data in one query |
+----------------------------------+---------------------------------------+
| Anti-Patterns to Avoid | SELECT * |
| | Functions on indexed columns |
| | OR in WHERE (often) |
| | LIKE '%prefix' (leading wildcard) |
| | NOT IN with possible NULLs |
+----------------------------------+---------------------------------------+
| Optimization Workflow | 1. Measure (find slow queries) |
| | 2. EXPLAIN (understand plan) |
| | 3. Fix (rewrite or add index) |
| | 4. Verify (EXPLAIN ANALYZE) |
+----------------------------------+---------------------------------------+
| SQL Formatting | Meaningful aliases |
| | One clause per line |
| | Comments for complex logic |
| | Explicit column names |
+----------------------------------+---------------------------------------+
Remember: Correct first, fast second. A slow query that returns correct data is better than a fast query that returns wrong data! ⚡