pandas becomes useful when students stop treating a DataFrame as a mystery object and start seeing it as a table they can filter, group, and summarize.
Scenario: study session log
A fictional tutoring dashboard stores one row per study session: the course, the topic, the minutes spent, and whether the student completed the practice set.
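import pandas as pd

# One row per study session.
sessions = pd.DataFrame([
    {"course": "AP CSA", "topic": "Arrays", "minutes": 35, "done": True},
    {"course": "AP CSA", "topic": "FRQ", "minutes": 50, "done": False},
    {"course": "Python", "topic": "pandas", "minutes": 40, "done": True},
    {"course": "USACO", "topic": "Simulation", "minutes": 60, "done": True},
])

# Summarize by course using named aggregation: new_column=(source_column, function).
summary = sessions.groupby("course").agg(
    total_minutes=("minutes", "sum"),
    completed=("done", "sum"),    # True counts as 1, so this is the number of completed practice sets
    sessions=("topic", "count"),  # counts non-missing topics, i.e. sessions logged per course
)

# Average session length per course.
summary["avg_minutes"] = summary["total_minutes"] / summary["sessions"]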
Why this is stronger than a loop
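A loop could compute the same totals, but only with manual bookkeeping: a dictionary keyed by course, a default for each new key, and a running sum. groupby states the data question directly (summarize by course), and the same pattern still works when the question changes to topic or completion status. Framing the question first and letting the library handle the iteration is the habit students need for real data projects.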
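To make the comparison concrete, here is a minimal sketch that computes total minutes per course both ways. It rebuilds the same sessions table so it runs on its own. The names loop_totals and groupby_totals are illustrative and not part of the original scenario.

```python
import pandas as pd

sessions = pd.DataFrame([
    {"course": "AP CSA", "topic": "Arrays", "minutes": 35, "done": True},
    {"course": "AP CSA", "topic": "FRQ", "minutes": 50, "done": False},
    {"course": "Python", "topic": "pandas", "minutes": 40, "done": True},
    {"course": "USACO", "topic": "Simulation", "minutes": 60, "done": True},
])

# Loop version: the dictionary, the default value, and the running sum
# are all managed by hand.
loop_totals = {}
for row in sessions.itertuples(index=False):
    loop_totals[row.course] = loop_totals.get(row.course, 0) + row.minutes

# groupby version: one expression that reads like the question.
groupby_totals = sessions.groupby("course")["minutes"].sum()

print(loop_totals)
print(groupby_totals.to_dict())
```

Both versions agree on the totals. The difference is how much of the code is about the question and how much is about bookkeeping.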
Common mistakes
- Grouping before cleaning inconsistent course names.
- Taking an average without checking for missing or zero values.
- Confusing row filtering with column selection.
- Making a chart before deciding what question it answers.
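The first three mistakes can be seen on a small example. The sketch below uses a hypothetical messy log (the messy table and its values are invented for illustration, not taken from the dashboard above), and the cleaning rule it applies, trimming whitespace and upper-casing, is only one reasonable choice.

```python
import pandas as pd

# Hypothetical messy log: inconsistent course labels and one missing duration.
messy = pd.DataFrame({
    "course": ["AP CSA", "ap csa ", "Python", "Python"],
    "minutes": [35, 50, None, 40],
})

# Mistake 1: grouping first treats "AP CSA" and "ap csa " as different courses.
groups_before = messy.groupby("course").ngroups

# Clean first (trim whitespace, normalize case), then group.
messy["course"] = messy["course"].str.strip().str.upper()
groups_after = messy.groupby("course").ngroups

# Mistake 2: mean() skips missing values without warning,
# so count them before trusting the average.
missing_minutes = messy["minutes"].isna().sum()
avg_by_course = messy.groupby("course")["minutes"].mean()

# Mistake 3: a boolean mask keeps rows; a list of names keeps columns.
long_rows = messy[messy["minutes"] > 30]
course_and_minutes = messy[["course", "minutes"]]

print(groups_before, groups_after, missing_minutes)
print(avg_by_course)
```

Here the average for the second course is based on one session, not two, because the missing value was dropped. Whether that is acceptable depends on the question being asked.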
Practice extension
Filter to sessions longer than 30 minutes, group by topic, and make a bar chart of total minutes. Then write one sentence describing what the chart shows without overclaiming.
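One possible starting point for the extension is sketched below. It assumes matplotlib is installed for the chart. The helper plot_minutes_by_topic and the axis labels are illustrative choices, not a required solution.

```python
import pandas as pd

sessions = pd.DataFrame([
    {"course": "AP CSA", "topic": "Arrays", "minutes": 35, "done": True},
    {"course": "AP CSA", "topic": "FRQ", "minutes": 50, "done": False},
    {"course": "Python", "topic": "pandas", "minutes": 40, "done": True},
    {"course": "USACO", "topic": "Simulation", "minutes": 60, "done": True},
])

# Row filter: keep only sessions longer than 30 minutes.
long_sessions = sessions[sessions["minutes"] > 30]

# Group by topic and total the minutes, largest first.
minutes_by_topic = (
    long_sessions.groupby("topic")["minutes"]
    .sum()
    .sort_values(ascending=False)
)


def plot_minutes_by_topic(totals):
    """Draw a bar chart of total minutes per topic (assumes matplotlib is installed)."""
    import matplotlib.pyplot as plt

    ax = totals.plot.bar()
    ax.set_xlabel("Topic")
    ax.set_ylabel("Total minutes")
    ax.set_title("Minutes in sessions longer than 30 minutes")
    plt.tight_layout()
    plt.show()
```

Calling plot_minutes_by_topic(minutes_by_topic) draws the chart. With this sample data every session passes the filter and every topic appears once, so a careful sentence would describe where the logged time went rather than claim that any topic is harder or more important.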
