Lesson 2: Aggregations — Summarizing Data
Learn to summarize and analyze data using COUNT, SUM, AVG, MIN, MAX, GROUP BY, and HAVING. Master the difference between WHERE and HAVING for powerful data analysis.
Lesson 2: Aggregations — Summarizing Data 📊
Introduction
In Lesson 1, you learned to retrieve individual rows from a database. But what if you need to answer questions like "How many customers do we have?" or "What's the average order value?" or "Which product category generates the most revenue?" 🤔
This is where aggregate functions come in. They allow you to summarize data rather than just retrieve it. Instead of seeing every single row, you can calculate totals, averages, counts, and more.
Think of it like this: if your database is a detailed ledger 📖, aggregate functions are like the summary report at the bottom of each page.
Core Concepts
The Five Essential Aggregate Functions 🧮
SQL provides five fundamental aggregate functions that work across all major databases:
+----------+-----------------------------------+
| Function | What It Does |
+----------+-----------------------------------+
| COUNT() | Counts rows or non-null values |
| SUM() | Adds up numeric values |
| AVG() | Calculates the average (mean) |
| MIN() | Finds the smallest value |
| MAX() | Finds the largest value |
+----------+-----------------------------------+
COUNT() — Counting Rows 🔢
The most versatile aggregate function. It can count:
COUNT(*)— all rows (including rows with NULL values)COUNT(column_name)— only non-NULL values in that columnCOUNT(DISTINCT column_name)— only unique non-NULL values
-- How many orders total?
SELECT COUNT(*) FROM orders;
-- How many orders have a tracking number?
SELECT COUNT(tracking_number) FROM orders;
-- How many unique customers have ordered?
SELECT COUNT(DISTINCT customer_id) FROM orders;
💡 Tip: COUNT(*) is usually faster than COUNT(column_name) because the database doesn't need to check for NULL values.
SUM() — Adding Things Up ➕
Adds up all numeric values in a column. Ignores NULL values.
-- What's our total revenue?
SELECT SUM(order_total) FROM orders;
-- How many items have we sold in total?
SELECT SUM(quantity) FROM order_items;
⚠️ Important: SUM() only works with numeric columns. You can't sum text!
AVG() — Finding the Average 📈
Calculates the arithmetic mean. Also ignores NULL values.
-- What's the average order value?
SELECT AVG(order_total) FROM orders;
-- What's the average product price?
SELECT AVG(price) FROM products;
🧠 Remember: AVG() divides by the count of non-NULL values, not all rows.
MIN() and MAX() — Finding Extremes 🔺🔻
Find the smallest and largest values. Work with numbers, dates, and even text (alphabetically).
-- What was our cheapest and most expensive sale?
SELECT MIN(order_total), MAX(order_total) FROM orders;
-- When was our first and most recent order?
SELECT MIN(order_date), MAX(order_date) FROM orders;
-- Alphabetically first and last product names
SELECT MIN(product_name), MAX(product_name) FROM products;
GROUP BY — Breaking Data into Categories 📦
Aggregate functions alone give you one result for the entire table. But what if you want to see totals per category?
That's where GROUP BY comes in. It splits your data into groups and applies the aggregate function to each group separately.
Without GROUP BY: With GROUP BY:
[All Orders] [Orders by Category]
↓ ↓
Total: $50,000 Electronics: $30,000
Clothing: $15,000
Books: $5,000
The syntax:
SELECT column_to_group_by, AGGREGATE_FUNCTION(column_to_aggregate)
FROM table_name
GROUP BY column_to_group_by;
Important Rule 🎯: Every column in your SELECT that isn't inside an aggregate function must be in the GROUP BY clause.
-- Wrong! ❌
SELECT category, product_name, SUM(quantity)
FROM sales
GROUP BY category;
-- product_name isn't aggregated or grouped!
-- Correct! ✅
SELECT category, SUM(quantity)
FROM sales
GROUP BY category;
Multiple Columns in GROUP BY 🗂️
You can group by multiple columns to create subcategories:
-- Sales by category AND year
SELECT category, YEAR(sale_date), SUM(amount)
FROM sales
GROUP BY category, YEAR(sale_date);
This creates a group for each unique combination: (Electronics, 2023), (Electronics, 2024), (Clothing, 2023), etc.
WHERE vs. HAVING — Filtering Before and After 🚦
This is where many learners get confused. Let's make it crystal clear:
Query Execution Order:
1. FROM ← Choose your table
2. WHERE ← Filter individual ROWS
3. GROUP BY ← Group the remaining rows
4. HAVING ← Filter the GROUPS
5. SELECT ← Choose what to display
6. ORDER BY ← Sort the results
WHERE filters individual rows BEFORE grouping:
-- Only include orders from 2024, THEN calculate totals
SELECT category, SUM(amount)
FROM sales
WHERE YEAR(sale_date) = 2024
GROUP BY category;
HAVING filters groups AFTER aggregation:
-- Calculate totals for all categories, THEN only show categories with >$10,000
SELECT category, SUM(amount)
FROM sales
GROUP BY category
HAVING SUM(amount) > 10000;
🧠 Mnemonic: "Where the rows Have been grouped" — WHERE comes before, HAVING comes after.
Combining WHERE and HAVING 💪
You can use both in the same query:
-- From 2024 orders (WHERE), find categories with >$10,000 total (HAVING)
SELECT category, SUM(amount) as total
FROM sales
WHERE YEAR(sale_date) = 2024
GROUP BY category
HAVING SUM(amount) > 10000
ORDER BY total DESC;
Visualization of WHERE vs HAVING:
All Rows
↓
[WHERE filters rows]
↓
Filtered Rows
↓
[GROUP BY groups rows]
↓
Groups
↓
[HAVING filters groups]
↓
Final Groups
Column Aliases with Aggregates 🏷️
Aggregate results can have ugly default names. Use AS to give them meaningful names:
-- Instead of seeing "COUNT(*)" as the column header
SELECT category,
COUNT(*) AS order_count,
SUM(amount) AS total_revenue,
AVG(amount) AS average_order
FROM sales
GROUP BY category;
💡 Tip: You can reference aliases in ORDER BY, but not in WHERE or HAVING (use the full expression instead).
Detailed Examples 🔍
Let's work with a realistic orders table:
+----------+-------------+---------+------------+
| order_id | customer_id | amount | order_date |
+----------+-------------+---------+------------+
| 1 | 101 | 150.00 | 2024-01-15 |
| 2 | 102 | 200.00 | 2024-01-16 |
| 3 | 101 | 75.00 | 2024-01-17 |
| 4 | 103 | 300.00 | 2024-01-17 |
| 5 | 102 | 125.00 | 2024-01-18 |
| 6 | 101 | 225.00 | 2024-01-19 |
+----------+-------------+---------+------------+
Example 1: Basic Aggregation — Business Overview 📊
Question: "What's our total revenue, number of orders, and average order value?"
SELECT
COUNT(*) AS total_orders,
SUM(amount) AS total_revenue,
AVG(amount) AS average_order_value,
MIN(amount) AS smallest_order,
MAX(amount) AS largest_order
FROM orders;
Result:
+--------------+---------------+---------------------+----------------+---------------+
| total_orders | total_revenue | average_order_value | smallest_order | largest_order |
+--------------+---------------+---------------------+----------------+---------------+
| 6 | 1075.00 | 179.17 | 75.00 | 300.00 |
+--------------+---------------+---------------------+----------------+---------------+
Explanation: This single query gives us five key business metrics. Notice how the aggregate functions reduce six rows down to one summary row. This is perfect for dashboard statistics! 📈
Example 2: GROUP BY — Per-Customer Analysis 👥
Question: "How much has each customer spent, and how many orders have they placed?"
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent,
AVG(amount) AS avg_per_order
FROM orders
GROUP BY customer_id
ORDER BY total_spent DESC;
Result:
+-------------+-------------+-------------+---------------+
| customer_id | order_count | total_spent | avg_per_order |
+-------------+-------------+-------------+---------------+
| 101 | 3 | 450.00 | 150.00 |
| 102 | 2 | 325.00 | 162.50 |
| 103 | 1 | 300.00 | 300.00 |
+-------------+-------------+-------------+---------------+
Explanation: GROUP BY customer_id creates three groups (one per customer). The aggregate functions calculate separately for each group. Customer 101 appears in three rows of the original data, but only once here with their totals. This is how you identify your best customers! 🌟
Example 3: HAVING — Finding High-Value Customers 💎
Question: "Which customers have spent more than $300 total?"
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 300
ORDER BY total_spent DESC;
Result:
+-------------+-------------+-------------+
| customer_id | order_count | total_spent |
+-------------+-------------+-------------+
| 101 | 3 | 450.00 |
| 102 | 2 | 325.00 |
+-------------+-------------+-------------+
Explanation: First, we group by customer and calculate totals. Then HAVING filters out any groups where the total is $300 or less. Customer 103 is excluded even though their single order was $300 (not greater than $300). Notice we can't use WHERE here because we're filtering based on the SUM, which doesn't exist until after grouping! ⚠️
Example 4: WHERE + GROUP BY + HAVING — Complex Analysis 🎯
Question: "For orders placed on or after January 17, which customers have placed more than one order?"
SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS total_spent
FROM orders
WHERE order_date >= '2024-01-17'
GROUP BY customer_id
HAVING COUNT(*) > 1
ORDER BY order_count DESC;
Result:
+-------------+-------------+-------------+
| customer_id | order_count | total_spent |
+-------------+-------------+-------------+
| 101 | 2 | 300.00 |
+-------------+-------------+-------------+
Explanation: Let's trace the execution:
- WHERE filters to orders on/after Jan 17 (keeps orders 3, 4, 5, 6)
- GROUP BY groups those four orders by customer (101 has 2, 102 has 1, 103 has 1)
- HAVING filters to groups with >1 order (only customer 101 remains)
- SELECT displays the results
This demonstrates the power of combining all three clauses! 🚀
Common Mistakes ⚠️
Mistake 1: Forgetting GROUP BY for Non-Aggregated Columns ❌
-- WRONG!
SELECT customer_id, COUNT(*)
FROM orders;
Error: "customer_id must appear in GROUP BY or be used in an aggregate function"
Fix: Add GROUP BY:
SELECT customer_id, COUNT(*)
FROM orders
GROUP BY customer_id;
Mistake 2: Using WHERE Instead of HAVING ❌
-- WRONG!
SELECT customer_id, SUM(amount) AS total
FROM orders
WHERE SUM(amount) > 300 -- Can't use aggregates in WHERE!
GROUP BY customer_id;
Fix: Use HAVING:
SELECT customer_id, SUM(amount) AS total
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 300;
Mistake 3: Mixing Aggregated and Non-Aggregated Columns ❌
-- WRONG!
SELECT customer_id, amount, SUM(amount)
FROM orders
GROUP BY customer_id;
Problem: Which specific amount should be shown when there are multiple orders per customer?
Fix: Either aggregate it (AVG(amount)) or group by it, or remove it.
Mistake 4: NULL Confusion 🤔
-- If a column has NULL values:
SELECT COUNT(*), COUNT(tracking_number)
FROM orders;
These can return different numbers! COUNT(*) counts all rows, but COUNT(tracking_number) only counts rows where tracking_number is NOT NULL.
Mistake 5: Aggregate of Aggregate ❌
-- WRONG!
SELECT AVG(COUNT(*))
FROM orders
GROUP BY customer_id;
You can't nest aggregate functions directly. You'd need a subquery (covered in a future lesson).
Real-World Use Cases 🌍
E-commerce Analytics 🛒:
- "Which products have sold more than 100 units?"
- "What's the average order value by customer region?"
- "Which categories generate >$50,000 in revenue?"
User Behavior Analysis 📱:
- "How many active users do we have per day?"
- "Which features are used by >1,000 users?"
- "What's the average session duration by device type?"
Financial Reporting 💰:
- "What's our monthly revenue?"
- "Which expense categories exceed our budget?"
- "What's the average transaction size by payment method?"
Inventory Management 📦:
- "Which warehouses have more than 10,000 items in stock?"
- "What's the average inventory value by product category?"
- "Which suppliers have delivered more than 50 orders?"
🔧 Try This: Practice Exercise
Imagine you have this sales table:
+----------+----------+----------+----------+
| sale_id | product | quantity | region |
+----------+----------+----------+----------+
| 1 | Widget | 5 | North |
| 2 | Gadget | 3 | South |
| 3 | Widget | 8 | North |
| 4 | Widget | 2 | East |
| 5 | Gadget | 10 | South |
+----------+----------+----------+----------+
Challenge: Write a query to find the total quantity sold per product, but only show products where the total quantity is greater than 10.
💡 Solution
SELECT product, SUM(quantity) AS total_quantity
FROM sales
GROUP BY product
HAVING SUM(quantity) > 10
ORDER BY total_quantity DESC;
Result would show only "Widget" with total_quantity of 15 (5+8+2).
🤔 Did You Know?
The term "aggregate" comes from Latin aggregare meaning "to add to a flock" — literally bringing individual items together into a group! 🐑🐑🐑
Some databases have additional aggregate functions like STDDEV() (standard deviation), VARIANCE(), MEDIAN(), and even STRING_AGG() to concatenate text values! These five core functions work universally, but check your database documentation for extra tools.
Key Takeaways 🎯
- Five core aggregate functions: COUNT, SUM, AVG, MIN, MAX — they summarize data
- GROUP BY splits data into categories and aggregates each category separately
- WHERE filters rows BEFORE aggregation; HAVING filters groups AFTER aggregation
- Every non-aggregated column in SELECT must appear in GROUP BY
- Use COUNT(*) for all rows, COUNT(column) to exclude NULLs
- You can combine WHERE, GROUP BY, and HAVING in one query for powerful analysis
- Aliases make your results readable:
SUM(amount) AS total_revenue
📚 Further Study
PostgreSQL Aggregate Functions Documentation: https://www.postgresql.org/docs/current/functions-aggregate.html — Comprehensive guide to all aggregate functions including advanced ones
SQL GROUP BY Tutorial (Mode Analytics): https://mode.com/sql-tutorial/sql-group-by/ — Interactive examples with real datasets
W3Schools SQL HAVING: https://www.w3schools.com/sql/sql_having.asp — Simple examples with try-it-yourself editor
📋 Quick Reference Card
+------------------+----------------------------------------+
| Aggregate Func | Purpose |
+------------------+----------------------------------------+
| COUNT(*) | Count all rows |
| COUNT(column) | Count non-NULL values |
| COUNT(DISTINCT) | Count unique non-NULL values |
| SUM(column) | Add up all values |
| AVG(column) | Calculate average |
| MIN(column) | Find smallest value |
| MAX(column) | Find largest value |
+------------------+----------------------------------------+
BASIC PATTERN:
SELECT column, AGGREGATE(column)
FROM table
GROUP BY column;
FILTERING PATTERN:
SELECT column, AGGREGATE(column) AS alias
FROM table
WHERE condition_on_rows
GROUP BY column
HAVING condition_on_groups
ORDER BY alias;
KEY RULE: WHERE filters rows → GROUP BY groups them → HAVING filters groups
REMEMBER: Every SELECT column must be:
1. In an aggregate function, OR
2. In the GROUP BY clause