Lesson 4: Subqueries — Queries Inside Queries

Master nested queries, correlated subqueries, EXISTS, IN, and learn when to use subqueries versus joins for complex data retrieval.

Lesson 4: Subqueries — Queries Inside Queries 🔍

You've learned to SELECT data, aggregate it, and JOIN tables together. But what happens when you need the result of one query to inform another? What if you need to find "all employees who earn more than the average salary" or "customers who have never placed an order"? Welcome to the world of subqueries — queries nested inside other queries.

What Are Subqueries? 🎯

A subquery (also called a nested query or inner query) is a SELECT statement embedded within another SQL statement. The outer query uses the subquery's result to complete its own operation.

Think of subqueries like Russian nesting dolls 🪆 — one query lives inside another. The inner query executes first, and its result feeds the outer query.

+----------------------------------+
|  OUTER QUERY                     |
|  SELECT * FROM employees         |
|  WHERE salary > (                |
|    +-------------------------+   |
|    | INNER QUERY (subquery)  |   |
|    | SELECT AVG(salary)      |   |
|    | FROM employees          |   |
|    +-------------------------+   |
|  )                               |
+----------------------------------+

Where Can Subqueries Live? 📍

Subqueries can appear in three main locations:

1. Subqueries in WHERE Clause 🎪

The most common placement — filtering rows based on another query's result:

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

This finds employees earning above the average salary.

2. Subqueries in FROM Clause 📦

Treating a subquery result as a temporary table (called a derived table or inline view):

SELECT dept_name, avg_salary
FROM (
  SELECT department_id, AVG(salary) as avg_salary
  FROM employees
  GROUP BY department_id
) AS dept_averages
JOIN departments ON departments.id = dept_averages.department_id;

💡 Tip: Always alias subqueries in the FROM clause — most databases require it!

3. Subqueries in SELECT Clause 🎨

Returning a calculated value for each row:

SELECT 
  name,
  salary,
  (SELECT AVG(salary) FROM employees) AS company_avg
FROM employees;

This shows each employee's salary alongside the company average.

Types of Subqueries: Correlated vs Non-Correlated 🔗

Non-Correlated Subqueries (Independent) 🆓

These execute once and return a result that the outer query uses. They're independent — they don't reference the outer query.

SELECT name
FROM products
WHERE price > (SELECT AVG(price) FROM products);

Execution flow:

Step 1: SELECT AVG(price) FROM products → returns 50
Step 2: SELECT name FROM products WHERE price > 50

Correlated Subqueries (Dependent) 🔗

These reference columns from the outer query and execute once for each row processed by the outer query. They're dependent.

SELECT e1.name, e1.salary
FROM employees e1
WHERE e1.salary > (
  SELECT AVG(e2.salary)
  FROM employees e2
  WHERE e2.department_id = e1.department_id
);

This finds employees earning above their department's average (not the company average).

Execution flow:

For each employee row:
  → Calculate AVG salary for THAT employee's department
  → Compare employee's salary to department average
  → Include row if salary > department average

⚠️ Performance note: Correlated subqueries can be slower because they run multiple times. Sometimes a JOIN is more efficient.

Key Operators with Subqueries 🔑

IN Operator 📥

Tests if a value exists in a subquery's result set:

SELECT name
FROM customers
WHERE customer_id IN (
  SELECT DISTINCT customer_id
  FROM orders
  WHERE order_date >= '2024-01-01'
);

Finds customers who placed orders in 2024.

EXISTS Operator ✅

Tests whether a subquery returns any rows (returns TRUE/FALSE):

SELECT name
FROM customers c
WHERE EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
  AND o.order_date >= '2024-01-01'
);

Same result as IN, but often more efficient for correlated subqueries.

💡 Tip: With EXISTS, the SELECT list doesn't matter — use SELECT 1 or SELECT *. The database only checks if rows exist.

NOT IN and NOT EXISTS 🚫

Find rows that don't match:

-- Customers who have NEVER placed an order
SELECT name
FROM customers c
WHERE NOT EXISTS (
  SELECT 1
  FROM orders o
  WHERE o.customer_id = c.customer_id
);

⚠️ Beware: NOT IN with NULL values can cause unexpected results! If the subquery returns any NULL, NOT IN returns no rows. NOT EXISTS handles NULLs more predictably.

Comparison Operators with ALL/ANY 🎲

ANY: True if comparison holds for at least one subquery result
ALL: True if comparison holds for every subquery result

-- Products more expensive than ANY product in category 5
SELECT name, price
FROM products
WHERE price > ANY (
  SELECT price FROM products WHERE category_id = 5
);

-- Products more expensive than ALL products in category 5
SELECT name, price
FROM products
WHERE price > ALL (
  SELECT price FROM products WHERE category_id = 5
);

Detailed Examples 📚

Example 1: Finding Above-Average Performers 🌟

Scenario: You want to identify high-performing employees — those earning more than the average salary.

SELECT 
  employee_id,
  name,
  salary,
  (SELECT AVG(salary) FROM employees) AS company_average
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees)
ORDER BY salary DESC;

Breaking it down:

Subquery in WHERE: (SELECT AVG(salary) FROM employees) calculates the average (e.g., 65000)
Subquery in SELECT: Adds the average as a column for context
Outer query: Returns employees where salary > 65000

Sample result:

+-------------+-------------+--------+------------------+
| employee_id | name        | salary | company_average  |
+-------------+-------------+--------+------------------+
| 5           | Sarah Chen  | 95000  | 65000            |
| 12          | James Smith | 87000  | 65000            |
| 8           | Maria Lopez | 72000  | 65000            |
+-------------+-------------+--------+------------------+

Example 2: Multi-Level Aggregation with Derived Tables 📊

Scenario: Find departments where the average salary exceeds $70,000.

You can't use WHERE AVG(salary) > 70000 directly because WHERE filters before aggregation. You need HAVING... or a subquery!

SELECT 
  d.department_name,
  dept_stats.avg_salary,
  dept_stats.employee_count
FROM departments d
JOIN (
  SELECT 
    department_id,
    AVG(salary) AS avg_salary,
    COUNT(*) AS employee_count
  FROM employees
  GROUP BY department_id
) AS dept_stats ON d.department_id = dept_stats.department_id
WHERE dept_stats.avg_salary > 70000
ORDER BY dept_stats.avg_salary DESC;

What's happening:

Inner query: Groups employees by department, calculates average salary and count
Outer query: Joins this derived table with departments table
WHERE clause: Filters to high-paying departments

🧠 Mnemonic: Think "FROM subquery" as creating a temporary table you can JOIN and filter like any normal table.

Example 3: Correlated Subquery for Ranked Results 🏆

Scenario: For each department, find employees earning more than their department's average.

SELECT 
  e.name,
  e.department_id,
  e.salary,
  (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department_id = e.department_id
  ) AS dept_avg
FROM employees e
WHERE e.salary > (
  SELECT AVG(salary)
  FROM employees e3
  WHERE e3.department_id = e.department_id
);

Execution visualization:

For employee 'Alice' (dept_id=1, salary=80000):
  → Calculate AVG(salary) WHERE dept_id=1 → 65000
  → Is 80000 > 65000? YES → Include row

For employee 'Bob' (dept_id=1, salary=55000):
  → Calculate AVG(salary) WHERE dept_id=1 → 65000
  → Is 55000 > 65000? NO → Exclude row

For employee 'Carol' (dept_id=2, salary=72000):
  → Calculate AVG(salary) WHERE dept_id=2 → 68000
  → Is 72000 > 68000? YES → Include row

Notice the subquery runs for each employee, using that employee's department_id.

Example 4: EXISTS for Relationship Checking 🔍

Scenario: Find products that have never been ordered (orphaned inventory).

SELECT 
  product_id,
  product_name,
  price
FROM products p
WHERE NOT EXISTS (
  SELECT 1
  FROM order_items oi
  WHERE oi.product_id = p.product_id
);

Why EXISTS here?

We only care if any order exists, not the actual data
EXISTS stops searching once it finds one match (efficient!)
Handles NULL values better than NOT IN

Alternative with LEFT JOIN:

SELECT 
  p.product_id,
  p.product_name,
  p.price
FROM products p
LEFT JOIN order_items oi ON p.product_id = oi.product_id
WHERE oi.product_id IS NULL;

Both work! The JOIN might be faster on some databases, EXISTS more readable.

Subqueries vs JOINs: When to Use Which? 🤔

+----------------------+-------------------------+
|   Use SUBQUERY when: | Use JOIN when:          |
+----------------------+-------------------------+
| • You need only      | • You need columns from |
|   existence check    |   multiple tables       |
|   (EXISTS)           |                         |
| • Single aggregated  | • Better performance    |
|   value needed       |   for large datasets    |
| • Query logic is     | • Relating equal        |
|   clearer nested     |   importance entities   |
| • Working with       | • Database optimizer    |
|   correlated data    |   handles JOINs well    |
+----------------------+-------------------------+

🔧 Try this: Take any subquery with IN or EXISTS and rewrite it as a JOIN. Compare readability and (if you have a large dataset) performance!

Common Mistakes to Avoid ⚠️

1. Forgetting to Alias Subqueries in FROM 🏷️

-- ❌ WRONG: No alias
SELECT * FROM (SELECT * FROM employees);

-- ✅ CORRECT:
SELECT * FROM (SELECT * FROM employees) AS emp;

2. Subquery Returning Multiple Rows with = Operator 📏

-- ❌ WRONG: Subquery returns multiple values
SELECT name FROM employees
WHERE department_id = (SELECT department_id FROM departments);

-- ✅ CORRECT: Use IN for multiple values
SELECT name FROM employees
WHERE department_id IN (SELECT department_id FROM departments);

3. NOT IN with NULL Values 💀

-- ⚠️ DANGER: If subquery returns NULL, this returns NO rows!
SELECT * FROM customers
WHERE customer_id NOT IN (SELECT customer_id FROM orders);

-- ✅ SAFER: Use NOT EXISTS
SELECT * FROM customers c
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id
);

4. Unnecessary Correlated Subqueries 🐌

-- ❌ SLOW: Correlated subquery runs for each row
SELECT e.name,
  (SELECT d.name FROM departments d WHERE d.id = e.department_id) AS dept
FROM employees e;

-- ✅ FASTER: Simple JOIN
SELECT e.name, d.name AS dept
FROM employees e
JOIN departments d ON d.id = e.department_id;

5. Confusing ANY and ALL 🎯

-- Returns products cheaper than the MOST EXPENSIVE in category 1
WHERE price < ANY (SELECT price FROM products WHERE category_id = 1);

-- Returns products cheaper than the CHEAPEST in category 1
WHERE price < ALL (SELECT price FROM products WHERE category_id = 1);

🧠 Mnemonic: ANY = "at least one" / ALL = "every single one"

Real-World Use Cases 🌍

E-commerce: "Find customers who spent more than the average order value in the last 30 days"

HR Systems: "List employees whose salary is in the top 10% of their department"

Analytics: "Show products that sell better than category average"

Data Cleaning: "Identify duplicate records by finding rows where another row exists with the same email but different ID"

🤔 Did You Know?

Some SQL dialects support scalar subqueries that return a single value and can be used anywhere an expression is valid — even in SET clauses of UPDATE statements:

UPDATE employees
SET salary = salary * 1.1
WHERE salary < (SELECT AVG(salary) FROM employees);

This gives a 10% raise to below-average earners!

Key Takeaways 🎯

✅ Subqueries are SELECT statements nested inside other SQL statements

✅ They can appear in WHERE, FROM, or SELECT clauses

✅ Non-correlated subqueries execute once; correlated execute per outer row

✅ IN checks membership; EXISTS checks existence (often faster)

✅ NOT EXISTS handles NULLs better than NOT IN

✅ Use ANY for "at least one", ALL for "every single one"

✅ Subqueries in FROM must be aliased

✅ Sometimes JOINs are faster than subqueries — test with your data!

✅ Correlated subqueries reference the outer query's columns

📚 Further Study

PostgreSQL Subquery Documentation: https://www.postgresql.org/docs/current/functions-subquery.html — Comprehensive guide to subquery expressions
Use The Index, Luke (Subqueries): https://use-the-index-luke.com/sql/where-clause/functions/subqueries — Performance implications of subqueries
SQL Server Subquery Fundamentals: https://learn.microsoft.com/en-us/sql/relational-databases/performance/subqueries — Microsoft's guide with optimization tips

📋 Quick Reference Card

+================================ SUBQUERIES CHEAT SHEET ================================+
|
| LOCATIONS:
|   WHERE:  ... WHERE col > (SELECT AVG(col) FROM table)
|   FROM:   ... FROM (SELECT ... FROM table) AS alias
|   SELECT: SELECT col, (SELECT COUNT(*) FROM other) AS cnt FROM table
|
| TYPES:
|   Non-Correlated: Independent, runs once
|   Correlated:     References outer query, runs per row
|
| KEY OPERATORS:
|   IN:          WHERE id IN (SELECT id FROM ...)
|   EXISTS:      WHERE EXISTS (SELECT 1 FROM ... WHERE ...)
|   NOT IN:      WHERE id NOT IN (SELECT id FROM ...) ⚠️ NULL danger!
|   NOT EXISTS:  WHERE NOT EXISTS (SELECT 1 FROM ...)
|   ANY:         WHERE price > ANY (SELECT price FROM ...) → at least one
|   ALL:         WHERE price > ALL (SELECT price FROM ...) → every one
|
| COMMON PATTERNS:
|   Above average:     WHERE col > (SELECT AVG(col) FROM table)
|   Department avg:    WHERE col > (SELECT AVG(col) FROM t WHERE t.dept = outer.dept)
|   Never ordered:     WHERE NOT EXISTS (SELECT 1 FROM orders WHERE ...)
|   Derived table:     FROM (SELECT dept, AVG(sal) FROM emp GROUP BY dept) AS x
|
| PERFORMANCE TIPS:
|   • EXISTS often faster than IN for correlated checks
|   • Consider JOINs for columns from multiple tables
|   • Minimize correlated subqueries when possible
|   • Always alias subqueries in FROM clause
|
+========================================================================================+

📝

Ready to practice?

This lesson has 45 questions to help you learn