Understanding Histograms
In this module, we’ll explore histograms, a powerful visualization tool for understanding how numerical data is distributed. While similar to bar plots, histograms serve a different purpose that we’ll discover together.
A histogram shows the distribution of continuous numerical data where:
- Data is grouped into ranges called “bins”
- Each bin shows how many values fall within that range
- Bins are adjacent (touching) because the data is continuous
- The height of each bar represents the frequency (count) or the density of values in that bin
Think of it like sorting items into adjacent boxes based on their measurements, then stacking blocks to show how full each box is.
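To make the sorting-into-boxes idea concrete, here is a minimal Python sketch that counts values into equal-width bins. The function name `bin_counts` and the sample values are hypothetical. The sketch assumes each bin includes its left edge and excludes its right edge, with the last bin also including the overall upper edge so the largest value is not dropped. This is the same convention NumPy documents for `numpy.histogram`.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins adjacent bins.

    Bin i covers [start + i*width, start + (i+1)*width): it includes its
    left edge and excludes its right edge, except that the last bin also
    includes the overall upper edge. Values outside the binned range are
    ignored.
    """
    counts = [0] * n_bins
    end = start + n_bins * width
    for v in values:
        if v < start or v > end:
            continue  # outside the binned range
        i = int((v - start) // width)
        if i == n_bins:  # v equals the upper edge: keep it in the last bin
            i = n_bins - 1
        counts[i] += 1
    return counts


# Hypothetical measurements, binned into 0-10, 10-20, 20-30
measurements = [2.5, 7.0, 10.0, 12.3, 18.9, 20.0, 25.1, 30.0]
print(bin_counts(measurements, start=0, width=10, n_bins=3))  # [2, 3, 3]
```

Note that 10.0 lands in the 10-20 bin and 20.0 in the 20-30 bin, because each value on a shared edge goes to the bin on its right.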
Histogram vs. Bar Plot
Key differences:
- Data Type:
  - Histograms: Continuous numerical data (like heights, weights, times)
  - Bar plots: Categorical data or discrete counts
- Spacing:
  - Histograms: Bars touch (no gaps)
  - Bar plots: Bars separated by spaces
- Meaning:
  - Histograms: Show distribution shape
  - Bar plots: Compare categories
Understanding Bins
Bins are the foundation of histograms:
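- What are Bins?
  - Continuous ranges of values
  - Example: 0-10, 10-20, 20-30, etc.
  - Adjacent bins share an edge, so a value such as 10 needs a rule. A common convention is that each bin includes its left edge and excludes its right edge, which puts 10 in the 10-20 bin
- Bin Width
  - Determines which patterns the histogram can reveal
  - Too wide: Hides important patterns (for example, two separate peaks can merge into one)
  - Too narrow: Shows too much noise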
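The effect of bin width is easy to see by binning the same data several ways. The Python sketch below uses only the standard library and simulated heights drawn from a normal distribution. The data, the `histogram` helper, and the three widths are illustrative assumptions, not part of this module's example.

```python
import random


def histogram(values, width):
    """Return (left_edge, count) pairs for equal-width bins covering `values`.

    The first bin starts at the largest multiple of `width` that is not above
    min(values); each bin includes its left edge and excludes its right edge.
    """
    first = (min(values) // width) * width
    n_bins = int((max(values) - first) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - first) // width)] += 1
    return [(first + i * width, c) for i, c in enumerate(counts)]


# Hypothetical data: 200 simulated heights (mean 170 cm, standard deviation 8 cm)
random.seed(1)
heights = [random.gauss(170, 8) for _ in range(200)]

for width in (1, 10, 40):
    bins = histogram(heights, width)
    counts = [c for _, c in bins]
    print(f"width {width:>2} cm: {len(bins):>2} bins, counts {counts}")
```

You should typically see the 1 cm counts jump up and down from bin to bin (noise), while the 40 cm width squeezes nearly everything into two or three bins and the peak disappears.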
Example: Height Distribution
Let’s explore student heights in centimeters:
- Data range: 150 cm to 190 cm
- Bin width: 10 cm
Bins and Counts:
- First bin, 150-160 cm: 5 students (shortest bar, fewest students)
- Second bin, 160-170 cm: 15 students
- Third bin, 170-180 cm: 20 students (tallest bar, most students)
- Fourth bin, 180-190 cm: 10 students
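These counts can also be expressed as shares of all students and as densities, where density is the share divided by the bin width. This short Python sketch does that arithmetic and prints a text histogram with one `#` per student. The layout is only an illustration.

```python
# Bin counts from the height example (bin width 10 cm)
edges = [150, 160, 170, 180, 190]
counts = [5, 15, 20, 10]

total = sum(counts)          # 50 students in all
width = edges[1] - edges[0]  # 10 cm

for left, right, c in zip(edges, edges[1:], counts):
    share = c / total        # relative frequency
    density = share / width  # fraction of students per cm
    bar = "#" * c            # one block per student
    print(f"{left}-{right} cm | {bar:<20} {c:>2}  ({share:.0%}, density {density:.3f}/cm)")

# Index of the fullest bin
peak = max(range(len(counts)), key=lambda i: counts[i])
print(f"Peak bin: {edges[peak]}-{edges[peak + 1]} cm")
```

The shares come out to 10%, 30%, 40%, and 20%. The densities times the bin width add up to 1, which is what lets density histograms with different bin widths be compared.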
What This Tells Us:
- The most common range is 170-180 cm (the peak), with 20 of the 50 students (40%), so this is the largest group but not a majority
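- Heights are only roughly symmetric: the peak (170-180 cm) sits just above the middle of the 150-190 cm range, with a slightly longer tail toward shorter heights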
- Fewer students at extreme heights (tails)
Common Shapes of Distributions
- Bell Shape (Normal)
  - Peaks in the middle
  - Symmetric
  - Example: Adult heights
- Skewed Right
  - Longer tail on the right
  - Example: Salaries
- Skewed Left
  - Longer tail on the left
  - Example: Age at retirement
- Uniform
  - All bins roughly equal in height
  - Example: Numbers from a uniform random number generator
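A rough numeric check goes with these shapes. In a skewed distribution the mean is usually pulled toward the long tail, so it sits above the median for right skew and below it for left skew. For symmetric shapes the two are close. This is a rule of thumb, not a guarantee. The sketch below simulates each shape with Python's `random` module. The chosen distributions (exponential, mirrored exponential, normal, uniform) are assumptions for illustration, not the real-world examples listed above.

```python
import random
import statistics

random.seed(0)  # fixed seed so the simulated samples are reproducible
n = 10_000

samples = {
    # Exponential values: many small, a few large -> long right tail
    "skewed right": [random.expovariate(1.0) for _ in range(n)],
    # Mirror image of the exponential -> long left tail
    "skewed left": [10 - random.expovariate(1.0) for _ in range(n)],
    # Normal values -> symmetric bell shape
    "bell": [random.gauss(0, 1) for _ in range(n)],
    # Uniform values between 0 and 1 -> roughly flat histogram
    "uniform": [random.uniform(0, 1) for _ in range(n)],
}

for shape, data in samples.items():
    mean = statistics.fmean(data)
    median = statistics.median(data)
    print(f"{shape:>12}: mean - median = {mean - median:+.3f}")
```

Expect a clearly positive gap for the right-skewed sample, a clearly negative gap for the left-skewed one, and gaps near zero for the bell and uniform samples.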
When to Use Histograms
Best for:
- Understanding data distribution
- Finding patterns in continuous data
- Identifying outliers
- Checking for symmetry or skewness
Reflection and Exploration
Think about continuous data in your life:
- Daily temperature readings
- Time spent on activities
- Distance walked each day
Try describing the pattern you might expect:
- “Most days I walk 2-3 kilometers”
- “Fewer days with very short or very long walks”
- “The distribution would probably peak around 2.5 kilometers”
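To check a guess like this against actual records, you could bin a log of daily distances and look for the fullest bin. The Python sketch below uses a made-up two-week log, so the values are purely hypothetical. It uses 1 km bins, which makes the 2-3 km bin centered on 2.5 km.

```python
from collections import Counter

# Hypothetical two-week walking log in kilometers (made-up values)
walks_km = [2.4, 2.7, 1.1, 2.6, 3.2, 2.5, 2.9, 0.8, 2.2, 2.6, 4.1, 2.8, 2.3, 2.5]

width = 1  # 1 km bins: 0-1, 1-2, 2-3, ...
# Key each walk by the left edge of its bin, e.g. 2.7 -> 2
bins = Counter(int(d // width) * width for d in walks_km)

# Only bins that contain at least one walk are printed
for left in sorted(bins):
    print(f"{left}-{left + width} km | {'#' * bins[left]}")

peak_left, peak_days = bins.most_common(1)[0]
print(f"Most common range: {peak_left}-{peak_left + width} km "
      f"({peak_days} of {len(walks_km)} days), centered at {peak_left + width / 2} km")
```

With this log, 10 of the 14 walks fall in the 2-3 km bin and each other bin holds a single day. That matches the expected pattern of a peak near 2.5 km with fewer very short or very long walks.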