Reading Aggregations
GROUP BY, HAVING, COUNT/SUM/AVG/MIN/MAX — describing what aggregate queries compute
Aggregation vocabulary
- GROUP BY col — one result row per unique value; use with COUNT/SUM/AVG/MIN/MAX
- WHERE — filters rows before grouping; HAVING — filters groups after aggregation
- COUNT(*) — counts all rows; COUNT(col) — counts non-null values only
- SQL clause order: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT
- COUNT(DISTINCT col) — count unique non-null values in a column
Question 0 of 5
Describe what this query returns:SELECT status, COUNT(*) AS order_count FROM orders GROUP BY status;
- GROUP BY col — collapses rows that share the same value in col into a single result row
- COUNT(*) — counts all rows in each group
- COUNT(col) — counts non-null values in col
- SUM(col) — total of col values in the group
- AVG(col) — average of col values
- MIN(col) / MAX(col) — smallest / largest value
pending: 42, shipped: 18, delivered: 95.What is the difference between WHERE and HAVING in this query?SELECT user_id, COUNT(*) AS order_count FROM orders WHERE created_at > '2024-01-01' GROUP BY user_id HAVING COUNT(*) > 5;
- WHERE created_at > '2024-01-01' — runs first, removes rows where created_at is in 2023 or earlier; aggregate functions cannot be used here
- GROUP BY user_id — groups the remaining rows by user_id
- HAVING COUNT(*) > 5 — runs after grouping; only keeps groups where the count exceeds 5
WHERE COUNT(*) > 5 (aggregate in WHERE is a SQL error); use HAVING instead.What does this query calculate?SELECT department, AVG(salary) AS avg_salary, MAX(salary) AS top_salary, MIN(salary) AS lowest_salary FROM employees GROUP BY department ORDER BY avg_salary DESC;
- "aggregates salary by department" — the GROUP BY clause
- "computes descriptive statistics" — AVG, MAX, MIN together
- "ranks departments by average compensation" — ORDER BY avg_salary DESC
An analyst asks: "How do I count orders per user, but only show users who placed at least 3 orders in the last 30 days?" Which query is correct?
- FROM — which table
- WHERE — row-level filter (before grouping)
- GROUP BY — group the filtered rows
- HAVING — group-level filter (after grouping)
- SELECT — compute output columns
- ORDER BY — sort the result
- LIMIT — cap the number of rows
What does COUNT(*) vs COUNT(column_name) return differently?
COUNT(*)— total number of rows in the group, including rows where any column is NULLCOUNT(phone)— number of rows where phone is not NULL; useful to measure field completenessCOUNT(DISTINCT col)— number of unique non-null values
COUNT(*) = 1000, COUNT(phone) = 800, COUNT(DISTINCT phone) = 800 (or fewer if some users share a phone).