Understanding Box Plots
In this module, we’re learning about box plots, also called box-and-whisker plots. These plots show at a glance how numerical data is spread out and where most of the values lie.
A box plot summarizes data using five key numbers:
- Minimum value (end of bottom whisker)
- First quartile (bottom of box)
- Median (middle line in box)
- Third quartile (top of box)
- Maximum value (end of top whisker)
Think of it like sorting your data and dividing it into four parts that each hold the same number of values. The box covers the middle two parts, and the “whiskers” show how far the remaining values extend.
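As a rough sketch, the five numbers can be computed with Python's standard `statistics` module. The `scores` list is made-up data and `five_number_summary` is a hypothetical helper name. Tools differ slightly in how they compute quartiles; this sketch assumes the "inclusive" method, which interpolates between data points.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # quantiles(n=4) returns the three cut points that split the data into quarters.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Made-up scores, for illustration only.
scores = [45, 55, 65, 70, 75, 80, 85, 90, 95]
summary = five_number_summary(scores)
print(summary)  # (45, 65.0, 75.0, 85.0, 95)
```

Other quartile methods may give slightly different Q1 and Q3 values for the same data, especially with small datasets.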
Best for:
- Comparing distributions between groups
- Identifying outliers
- Showing data spread
- Comparing multiple datasets side by side
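To compare groups side by side, one option is the third-party matplotlib library, whose `Axes.boxplot` method draws one box per dataset. The sketch below assumes matplotlib is installed. The two classes and their scores are made up, and `draw_comparison` is a hypothetical helper name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display window is needed
import matplotlib.pyplot as plt

def draw_comparison(groups):
    """Draw one box per group; `groups` maps a label to a list of numbers."""
    fig, ax = plt.subplots()
    parts = ax.boxplot(list(groups.values()))  # one box per inner list
    ax.set_xticklabels(list(groups.keys()))    # label each box with its group name
    ax.set_ylabel("Score")
    return fig, parts

# Made-up scores, for illustration only.
groups = {
    "Class A": [45, 55, 65, 70, 75, 80, 85, 90, 95],
    "Class B": [50, 60, 62, 66, 70, 72, 78, 84, 88],
}
fig, parts = draw_comparison(groups)
```

Calling `fig.savefig("scores.png")` afterwards would write the chart to a file (the filename is only an example). With both boxes on the same axis, differences in median, box height, and whisker length are easy to read off.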
Components of a Box Plot
- The Box
  - Bottom of Box: First quartile (Q1)
    - 25% of the data falls below this point
  - Line in Middle: Median
    - 50% of the data falls below this point
  - Top of Box: Third quartile (Q3)
    - 75% of the data falls below this point
  - Box Height: Interquartile range (IQR = Q3 - Q1)
    - Shows where the middle 50% of the data lies
- The Whiskers
  - Bottom Whisker: Extends to the minimum value, or to the lowest value that is not an outlier
  - Top Whisker: Extends to the maximum value, or to the highest value that is not an outlier
  - Whisker Length: Shows the spread of the non-outlier data outside the box
- Outliers
  - Unusually high or low values that lie beyond the whiskers
  - Drawn as individual dots above or below the whiskers
  - A common convention flags values more than 1.5 × IQR below Q1 or above Q3
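The relationship between whiskers and outliers can be sketched in a few lines of Python. This assumes the common 1.5 × IQR convention for flagging outliers, which is one choice among several. The data is made up, and `whiskers_and_outliers` is a hypothetical helper name.

```python
import statistics

def whiskers_and_outliers(values, k=1.5):
    """Return (bottom whisker end, top whisker end, outliers) using k * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    # Whiskers end at the most extreme values that are still inside the fences.
    return min(inside), max(inside), outliers

# Made-up minutes of exercise per day; 150 is deliberately unusual.
minutes = [20, 25, 30, 30, 35, 40, 45, 50, 150]
low, high, outliers = whiskers_and_outliers(minutes)
print(low, high, outliers)  # 20 50 [150]
```

Here Q1 is 30 and Q3 is 45, so the IQR is 15 and the fences sit at 7.5 and 67.5. The top whisker stops at 50, and 150 is drawn as a separate dot.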
Example: Test Scores
Suppose a set of test scores, graded from 0 to 100, has this five-number summary:
- Minimum: 45 (bottom whisker)
- Q1: 65 (bottom of box)
- Median: 75 (middle line)
- Q3: 85 (top of box)
- Maximum: 95 (top whisker)
What This Tells Us:
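- The middle 50% of scores fall between 65 and 85
- A typical score is 75 (median)
- The box is symmetric (Q1 and Q3 are each 10 points from the median), but the bottom whisker (20 points) is longer than the top whisker (10 points), so the low scores are somewhat more spread out
- No outliers under the common 1.5 × IQR rule: the IQR is 20, so only scores below 35 or above 115 would be flagged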
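The arithmetic behind this reading can be checked with a short Python sketch. It uses only the five numbers from the example and assumes the 1.5 × IQR convention for outliers; the variable names are illustrative.

```python
# Five-number summary from the test-score example.
minimum, q1, median, q3, maximum = 45, 65, 75, 85, 95

iqr = q3 - q1                    # height of the box
lower_fence = q1 - 1.5 * iqr     # values below this would count as outliers
upper_fence = q3 + 1.5 * iqr     # values above this would count as outliers
no_outliers = lower_fence <= minimum and maximum <= upper_fence

lower_box, upper_box = median - q1, q3 - median            # halves of the box
bottom_whisker, top_whisker = q1 - minimum, maximum - q3   # whisker lengths

print(iqr, lower_fence, upper_fence, no_outliers)  # 20 35.0 115.0 True
print(lower_box, upper_box)                        # 10 10
print(bottom_whisker, top_whisker)                 # 20 10
```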
Common Patterns
- Symmetric Distribution:
  - Whiskers of roughly equal length
  - Median near the middle of the box
- Skewed Distribution:
  - Whiskers of unequal length (the longer whisker points toward the tail)
  - Median closer to one end of the box
- Outliers Present:
  - Individual markers plotted beyond the whiskers
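These visual cues can be turned into a rough rule of thumb. The sketch below is a heuristic, not a standard statistical test: `describe_shape`, `imbalance`, and the `tolerance` value of 0.2 are all assumptions made for illustration, and the sample numbers are invented.

```python
def imbalance(lower, upper):
    """Return a value from -1 to 1; 0 means the two lengths are equal."""
    total = lower + upper
    return 0.0 if total == 0 else (upper - lower) / total

def describe_shape(minimum, q1, median, q3, maximum, tolerance=0.2):
    """Roughly label a five-number summary as symmetric or skewed."""
    box = imbalance(median - q1, q3 - median)          # median position in the box
    whiskers = imbalance(q1 - minimum, maximum - q3)   # relative whisker lengths
    score = (box + whiskers) / 2
    if abs(score) <= tolerance:
        return "roughly symmetric"
    if score > 0:
        return "skewed toward high values"
    return "skewed toward low values"

# Invented summaries, for illustration only.
print(describe_shape(40, 60, 70, 80, 100))  # roughly symmetric
print(describe_shape(10, 20, 25, 40, 90))   # skewed toward high values
```

In the second summary the top whisker is much longer than the bottom one and the median sits near the bottom of the box, which is the pattern a long tail of high values produces.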
Reflection and Exploration
Think about numerical data you encounter:
- Monthly expenses
- Daily temperatures
- Exercise duration
Questions to consider:
- What’s the typical value?
- How spread out are the values?
- Are there any unusual values?