Python pandas GroupBy for a Student Dashboard

pandas becomes useful when students stop treating a DataFrame as a mystery object and start seeing it as a table they can filter, group, and summarize.

Scenario: study session log

A fictional tutoring dashboard logs one row per study session: the course, the topic, the minutes spent, and whether the student completed the practice set.

import pandas as pd

sessions = pd.DataFrame([
    {"course": "AP CSA", "topic": "Arrays", "minutes": 35, "done": True},
    {"course": "AP CSA", "topic": "FRQ", "minutes": 50, "done": False},
    {"course": "Python", "topic": "pandas", "minutes": 40, "done": True},
    {"course": "USACO", "topic": "Simulation", "minutes": 60, "done": True},
])

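# Named aggregation: each keyword becomes an output column built from (source column, function).
summary = sessions.groupby("course").agg(
    total_minutes=("minutes", "sum"),  # minutes studied per course
    completed=("done", "sum"),         # True counts as 1, so this counts finished practice sets
    sessions=("topic", "count"),       # non-missing topic entries, one per session here
)
# Average session length per course.
summary["avg_minutes"] = summary["total_minutes"] / summary["sessions"]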

Why this is stronger than a loop

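A loop could compute the same totals, but it would have to track a running sum for each course by hand. groupby expresses the data question directly: summarize by course. Stating the grouping key and the summary separately is the habit students need for real data projects, where the question changes more often than the data does.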
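To make the comparison concrete, here is a minimal sketch that computes total minutes per course both ways on the same four-row log. The names loop_totals and groupby_totals are introduced only for this illustration.

```python
import pandas as pd

sessions = pd.DataFrame([
    {"course": "AP CSA", "topic": "Arrays", "minutes": 35, "done": True},
    {"course": "AP CSA", "topic": "FRQ", "minutes": 50, "done": False},
    {"course": "Python", "topic": "pandas", "minutes": 40, "done": True},
    {"course": "USACO", "topic": "Simulation", "minutes": 60, "done": True},
])

# Loop version: keep a running total per course by hand.
loop_totals = {}
for course, minutes in zip(sessions["course"], sessions["minutes"]):
    loop_totals[course] = loop_totals.get(course, 0) + minutes

# groupby version: the grouping key and the aggregation are each stated once.
groupby_totals = sessions.groupby("course")["minutes"].sum()

# Both approaches agree on the totals.
assert loop_totals == groupby_totals.to_dict()
```

The loop is not wrong, but changing the question (say, from sum to mean, or from course to topic) means rewriting its body, while the groupby version changes one word.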

Common mistakes

Grouping before cleaning inconsistent course names.
Taking an average without checking for missing or zero values.
Confusing row filtering with column selection.
Making a chart before deciding what question it answers.
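The first three mistakes can be reproduced in a few lines. The sketch below uses a hypothetical messy log (the raw DataFrame and its values are invented for illustration, not taken from the dashboard above), and upper-casing the labels is only one possible way to normalize them.

```python
import pandas as pd

# Hypothetical messy log: inconsistent course labels and one missing duration.
raw = pd.DataFrame([
    {"course": "AP CSA", "minutes": 35},
    {"course": "ap csa ", "minutes": 50},
    {"course": "Python", "minutes": None},
    {"course": "Python", "minutes": 40},
])

# Mistake 1: grouping first treats "AP CSA" and "ap csa " as different courses.
uncleaned_groups = raw.groupby("course").ngroups

# Normalize whitespace and case before grouping.
clean = raw.assign(course=raw["course"].str.strip().str.upper())
cleaned_groups = clean.groupby("course").ngroups

# Mistake 2: mean() skips missing values by default, so compare "count"
# (non-missing rows) with "size" (all rows) to see what each average rests on.
stats = clean.groupby("course")["minutes"].agg(["mean", "count", "size"])

# Mistake 3: row filtering keeps the rows that match a condition;
# column selection keeps a column by name.
long_rows = clean[clean["minutes"] > 30]   # rows where minutes > 30
minutes_column = clean["minutes"]          # one column, every row
```

Here the uncleaned data produces three groups and the cleaned data two, and the PYTHON average is based on one session even though two were logged.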

Practice extension

Filter to sessions longer than 30 minutes, group by topic, and make a bar chart of total minutes. Then write one sentence about the chart that claims no more than the data supports.
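One possible starting point for the extension is sketched below. The helper plot_topic_totals is a hypothetical name introduced here, and it assumes matplotlib is installed, since pandas plotting relies on it.

```python
import pandas as pd

sessions = pd.DataFrame([
    {"course": "AP CSA", "topic": "Arrays", "minutes": 35, "done": True},
    {"course": "AP CSA", "topic": "FRQ", "minutes": 50, "done": False},
    {"course": "Python", "topic": "pandas", "minutes": 40, "done": True},
    {"course": "USACO", "topic": "Simulation", "minutes": 60, "done": True},
])

# Row filter: keep only sessions longer than 30 minutes.
long_sessions = sessions[sessions["minutes"] > 30]

# Group by topic, total the minutes, and sort so the longest bar comes first.
topic_totals = (
    long_sessions.groupby("topic")["minutes"]
    .sum()
    .sort_values(ascending=False)
)


def plot_topic_totals(totals):
    """Draw a bar chart of total minutes per topic (hypothetical helper)."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    ax = totals.plot.bar()
    ax.set_xlabel("Topic")
    ax.set_ylabel("Total minutes")
    ax.set_title("Sessions longer than 30 minutes")
    plt.tight_layout()
    return ax
```

Calling plot_topic_totals(topic_totals) draws the chart. A sentence that stays within the data might read: "In this four-session sample, Simulation has the most logged minutes." It says nothing about which topic is hardest or most important, because the log cannot show that.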