CodeOak tree logoCodeOak
DataAI Study BuddyAboutBlog
Join Now
Join Now

CodeOak AI · Strengthen Your Coding Roots.

TermsPrivacyCookiesSupport
← The Grove · all articles
#SQL

Window functions, finally explained without the jargon

GROUP BY collapses rows. Window functions keep them. Once that one distinction clicks, ranking, running totals, and 'top N per group' get easy.

BhanuFounderJune 10, 2026
3 min read
SQL

A window function runs an aggregate like GROUP BY does, but keeps every row instead of collapsing them — attaching the result alongside each row. People find window functions intimidating because of one misconception: that they're a fancier GROUP BY. They're not. The difference is simple, and once it lands the rest is vocabulary.

The one thing to remember: GROUP BY collapses rows into one per group; a window function keeps all rows and adds the computed value to each.

GROUP BY collapses. Windows don't.

GROUP BY takes many rows and returns one per group. A window function runs the same kind of aggregate but keeps every row and attaches the result:

sql
-- GROUP BY: one row per department.
SELECT department, AVG(salary)
FROM employees
GROUP BY department;

-- Window: every employee row, plus their department average.
SELECT
  name,
  department,
  salary,
  AVG(salary) OVER (PARTITION BY department) AS dept_avg
FROM employees;

The OVER (...) clause is what makes it a window. PARTITION BY is the window's version of grouping — it defines which rows each calculation sees — but the original rows survive.

The three you'll use constantly

Most real work uses just three window functions:

  • ROW_NUMBER() — a unique sequential number per partition.
  • RANK() — like row number, but ties share a rank and leave gaps.
  • SUM(...) OVER (ORDER BY ...) — a running total.

The difference between ROW_NUMBER() and RANK() is the classic gotcha: with two people tied for first, ROW_NUMBER() still assigns 1 and 2, while RANK() assigns 1 and 1, then skips to 3.

"Top N per group" — the classic

This is the question window functions were born for: the highest-paid employee in each department.

sql
SELECT name, department, salary
FROM (
  SELECT
    name,
    department,
    salary,
    ROW_NUMBER() OVER (
      PARTITION BY department
      ORDER BY salary DESC
    ) AS rn
  FROM employees
) ranked
WHERE rn = 1;

Partition by the group, order by what you're ranking, then filter to rn = 1. Swap = 1 for <= 3 and you've got top 3 per department — the same pattern, one character changed.

Why this is worth drilling

Once the collapse-vs-keep distinction lands, ranking, running totals, gaps-and-islands, and period-over-period comparisons all open up — they're variations on the same OVER (...) idea. It's one of the highest-leverage SQL concepts for data and analytics interviews because so many real questions reduce to "rank within a group" or "running total over an order."

The pandas equivalent of "keep every row but compute per group" is transform — see pandas groupby that clicks. And the ranked subquery above is exactly the kind of step that reads better as a CTE; CTEs vs subqueries covers when to make that swap.

How this shows up on CodeOak

CodeOak's SQL query track is output-graded: you're scored on the table your query returns, compared deterministically against the expected result — no AI judging your code. Window functions are a recurring focus because they map so cleanly to that format: get the partition and order right and the table is correct. Take the assessment and the roadmap will surface them when they're your gap.

FAQ

What is the difference between a window function and GROUP BY? GROUP BY collapses many rows into one per group. A window function performs a similar calculation but keeps every original row, attaching the computed value to each via an OVER (...) clause. Use a window when you need per-row detail and the group-level value together.

What does PARTITION BY do in a window function? PARTITION BY defines the set of rows each window calculation considers — it's the window's equivalent of grouping. Unlike GROUP BY, it doesn't reduce the row count; it just scopes the calculation to rows that share the partition value.

How do I get the top N rows per group in SQL? Add ROW_NUMBER() OVER (PARTITION BY group ORDER BY metric DESC) in a subquery or CTE, then filter the outer query to rn <= N. Use = 1 for the single top row per group, or <= 3 for the top three.

#SQL
Share
Written by
Bhanu

Founder of CodeOak. Building the system that turns 'I don't know what to practice' into a roadmap that decides for you. Panda profile — steady, methodical, table-first.

Keep reading

All articles →
SQL#SQL

CTEs vs subqueries: when to reach for which

Bhanu · 3 min
PY#Python · pandas

pandas groupby that finally clicks: split, apply, combine

Bhanu · 3 min
PY#Python

Python comprehensions that read well (and when to stop)

Bhanu · 3 min