Python · pandas

pandas groupby that finally clicks: split, apply, combine

Most pandas confusion disappears once you see groupby as three steps, not one magic call. Here's the mental model — agg vs transform vs filter — with runnable examples.

Bhanu · Founder · June 12, 2026
3 min read

If pandas groupby still feels like guesswork, it's almost always because it's taught as one operation. It's three: split the rows into groups, apply something to each group, then combine the results back. Hold that single picture and the whole API falls into place.

The fast version:

- groupby(...) only describes the split; nothing computes until you apply something.
- Pick the right apply step (agg, transform, or filter) and the answer is one line.
- The three differ by the shape they return, which is the thing that trips people up.

Split, apply, combine

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 14, 7, 9, 21],
})

# split by team → apply mean to points → combine into one row per team
df.groupby("team")["points"].mean()
```

groupby("team") doesn't compute anything yet — it just describes the split. The work happens when you apply something to each group. That laziness is the part people miss, and it's why chaining the wrong method feels unpredictable.
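To make the laziness concrete, here is a small sketch that separates the split from the apply step. It reuses the same toy DataFrame; the variable names `grouped` and `team_means` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 14, 7, 9, 21],
})

# Split only: this is a GroupBy object that describes the grouping.
# No mean has been calculated at this point.
grouped = df.groupby("team")
print(type(grouped).__name__)  # DataFrameGroupBy

# Apply + combine: the per-team means are computed here.
team_means = grouped["points"].mean()
print(team_means)  # A -> 12.0, B -> about 12.33
```

Holding the GroupBy object in a variable like this is also a handy way to reuse one split for several different apply steps.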

The three things you can apply

The whole API comes down to choosing what to apply, and each choice returns a different shape:

| You want | Method | Result shape |
| --- | --- | --- |
| One value per group | `.agg()` | one row per group |
| A value per original row | `.transform()` | same shape as input |
| To keep or drop whole groups | `.filter()` | subset of input rows |
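A quick way to see the table in action is to run all three on the same data and print the shape of each result. This is a minimal sketch on the toy DataFrame from earlier; the threshold of 30 in the filter is an arbitrary value chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 14, 7, 9, 21],
})

points_by_team = df.groupby("team")["points"]

# collapse: one total per team
per_group = points_by_team.agg("sum")

# preserve: the team total repeated on every original row
per_row = points_by_team.transform("sum")

# subset: keep only the rows of teams whose total exceeds 30 (arbitrary cutoff)
big_teams = df.groupby("team").filter(lambda grp: grp["points"].sum() > 30)

print(per_group.shape)  # (2,)
print(per_row.shape)    # (5,)
print(big_teams.shape)  # (3, 2)
```

Team A totals 24 and team B totals 37, so the filter keeps only B's three rows, with both original columns intact.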

transform is the quiet hero. Need each row's value as a share of its group total? That's a transform, not an aggregate — because you want the answer broadcast back onto every original row:

```python
# group total broadcast back onto every row
df["share"] = df["points"] / df.groupby("team")["points"].transform("sum")
```

If you reached for agg here, you'd get one number per team and then have to merge it back — fighting the shape instead of using it.
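For comparison, here is roughly what the agg-then-merge detour looks like next to the transform one-liner. It is a sketch, not the only way to write the merge; the column name `team_total` is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 14, 7, 9, 21],
})

# The long way: collapse to one total per team, then merge it back onto the rows.
totals = (
    df.groupby("team")["points"]
    .agg("sum")
    .rename("team_total")  # hypothetical column name
    .reset_index()
)
merged = df.merge(totals, on="team", how="left")
merged["share"] = merged["points"] / merged["team_total"]

# The short way: transform already returns one total per original row.
share_via_transform = df["points"] / df.groupby("team")["points"].transform("sum")

print(merged["share"].tolist() == share_via_transform.tolist())  # True
```

Both routes give the same shares; the transform version just never leaves the original row shape.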

Why the shape is the whole game

Most "why doesn't this work?" pandas moments are shape mismatches: you produced one row per group when you needed one per original row, or vice versa. Naming the three apply steps by their output shape — collapse (agg), preserve (transform), subset (filter) — turns a guessing game into a decision. Ask "what shape do I need back?" first, and the method picks itself.
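One common form of this mismatch can be reproduced in a few lines. Assigning an aggregated result straight into a column does not raise an error; pandas aligns on index labels, so the sketch below ends up with missing values instead. The names `wrong` and `right` are just labels for the two attempts.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 14, 7, 9, 21],
})

# agg-style result: indexed by team label ("A", "B"), one value per group
totals = df.groupby("team")["points"].sum()

# Shape mismatch: the rows are labeled 0..4, the totals are labeled "A"/"B".
# Assignment aligns on those labels, finds no matches, and fills with NaN.
wrong = df.copy()
wrong["team_total"] = totals
print(wrong["team_total"].isna().all())  # True

# transform returns one value per original row, with the original index.
right = df.copy()
right["team_total"] = df.groupby("team")["points"].transform("sum")
print(right["team_total"].tolist())  # [24, 24, 37, 37, 37]
```

The silent NaN column is the kind of result that makes groupby feel unpredictable until you check which shape each method hands back.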

How this shows up on CodeOak

CodeOak's pandas track is table-native: you're handed a real DataFrame and graded on the table you produce — by deterministic comparison against the expected output, exactly like the SQL track. No AI judges your code; the result table either matches or it doesn't. That means the split-apply-combine instinct is precisely what's being tested. Pick the right apply step and you're done in a line; pick the wrong one and you're wrestling the shape of the result.
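CodeOak's actual grader is not described here, so the following is only an illustration of what a deterministic table comparison can look like in pandas. The helper `tables_match` and the expected table are hypothetical; the comparison itself uses `pd.testing.assert_frame_equal`.

```python
import pandas as pd

def tables_match(result: pd.DataFrame, expected: pd.DataFrame) -> bool:
    """Hypothetical checker: True only if the two tables are identical."""
    try:
        pd.testing.assert_frame_equal(result, expected)
        return True
    except AssertionError:
        return False

df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 14, 7, 9, 21],
})

# Assumed task for this sketch: total points per team, one row per team.
expected = pd.DataFrame({"team": ["A", "B"], "points": [24, 37]})

# Right apply step: an aggregation collapses to one row per team.
good = df.groupby("team", as_index=False)["points"].sum()

# Wrong apply step: transform keeps all five rows, so the table cannot match.
bad = df.assign(points=df.groupby("team")["points"].transform("sum"))

print(tables_match(good, expected))  # True
print(tables_match(bad, expected))   # False
```

Under a check like this, the numbers in `bad` are all correct totals, yet the table still fails because its shape is wrong.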

Practice a few agg vs transform problems back to back. The moment you stop reaching for a loop, you've got it. Window functions are the SQL version of the same instinct — see window functions explained — and if you want a system that drills the exact pattern you keep missing, start with the assessment.
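As a small illustration of the loop habit, here is the same per-team total written both ways. This is a sketch on the toy data from earlier, with `totals_loop` and `totals_groupby` as illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["A", "A", "B", "B", "B"],
    "points": [10, 14, 7, 9, 21],
})

# Loop version: split, apply, and combine are all done by hand.
totals_loop = {}
for team, points in zip(df["team"], df["points"]):
    totals_loop[team] = totals_loop.get(team, 0) + points

# groupby version: the same three steps, named and done in one line.
totals_groupby = df.groupby("team")["points"].sum()

print(totals_loop == totals_groupby.to_dict())  # True
```

The loop is not wrong, but it hides the shape decision inside bookkeeping, which is exactly the part the groupby version makes explicit.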

FAQ

What does split-apply-combine actually mean in pandas?
It's the three stages of every groupby: split the rows into groups by a key, apply a function to each group, then combine the results into a new Series or DataFrame. groupby() only sets up the split; the apply step does the work.

When should I use transform instead of agg in pandas?
Use transform when you need a result for every original row, such as each value as a share of its group total, because it returns the same shape as the input. Use agg when you want one summarized row per group.

Does groupby compute anything on its own?
No. df.groupby("team") is lazy: it only describes how rows are split. Nothing is calculated until you chain an apply step such as .mean(), .agg(), .transform(), or .filter().

#Python #pandas

Written by

Bhanu

Founder of CodeOak. Building the system that turns 'I don't know what to practice' into a roadmap that decides for you. Panda profile — steady, methodical, table-first.
