Understanding Histograms

In this module, we’ll explore histograms, a powerful visualization tool for understanding how numerical data is distributed. Histograms look similar to bar plots, but they serve a different purpose: they show the shape of a distribution rather than comparing separate categories.

A histogram shows the distribution of continuous numerical data where:

  • Data is grouped into ranges called “bins”
  • Each bin shows how many values fall within that range
  • Bins are adjacent (touching) because the data is continuous
  • The height represents frequency (count) or density

Think of it like sorting items into adjacent boxes based on their measurements, then stacking blocks to show how full each box is.
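
The box-sorting idea above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name bin_counts, its parameters, and the measurements are all invented for this example, and the rule that a boundary value belongs to the higher bin (except at the very top of the range) is an assumed convention.

```python
def bin_counts(values, start, width, n_bins):
    """Count how many values fall into each of n_bins adjacent bins.

    Bin i covers [start + i*width, start + (i+1)*width). The last bin
    also includes its upper edge (an assumed convention).
    """
    counts = [0] * n_bins
    stop = start + width * n_bins
    for v in values:
        if v < start or v > stop:
            continue  # skip values outside the overall range
        # Floor division finds the box; min() keeps v == stop in the last box
        i = min(int((v - start) // width), n_bins - 1)
        counts[i] += 1
    return counts


# Hypothetical measurements, invented for illustration
measurements = [1.2, 3.4, 3.9, 4.1, 5.5, 5.6, 5.8, 7.0, 9.9, 10.0]
counts = bin_counts(measurements, start=0, width=2.5, n_bins=4)
print(counts)
```

Each number in the result is the height of one bar, so every measurement inside the range is counted exactly once.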

Histogram vs. Bar Plot

Key differences:

  1. Data Type:
    • Histograms: Continuous numerical data (like heights, weights, times)
    • Bar plots: Categorical data or discrete counts
  2. Spacing:
    • Histograms: Bars touch (no gaps)
    • Bar plots: Bars separated by spaces
  3. Meaning:
    • Histograms: Show distribution shape
    • Bar plots: Compare categories
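
To make the difference concrete, the sketch below counts a categorical list the way a bar plot would and groups a numerical list into ranges the way a histogram would. The pet names, weights, and the 5 kg bin width are hypothetical values chosen only for illustration.

```python
from collections import Counter

# Hypothetical categorical data: a bar plot draws one bar per category
pets = ["cat", "dog", "dog", "fish", "cat", "dog"]
category_counts = Counter(pets)

# Hypothetical continuous data: a histogram groups values into ranges first
weights_kg = [3.2, 4.8, 5.1, 7.9, 12.4, 18.0]
bin_width = 5  # assumed bin width in kg
# Key each value by the lower edge of the bin it falls into
bin_counts = Counter(int(w // bin_width) * bin_width for w in weights_kg)

for name, count in category_counts.items():
    print(f"{name}: {count}")
for lower in sorted(bin_counts):
    print(f"{lower}-{lower + bin_width} kg: {bin_counts[lower]}")
```

The categories have no natural order or spacing, while the bins sit side by side on a number line, which is why histogram bars touch.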

Understanding Bins

Bins are the foundation of histograms:

  1. What are Bins?
    • Continuous ranges of values
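    • Example: 0-10, 10-20, 20-30, etc. (a value exactly on a boundary, such as 10, is counted in only one bin; a common convention is to place it in the higher one)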
  2. Bin Width
    • Affects the story your data tells
    • Too wide: Hides important patterns
    • Too narrow: Shows too much noise
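
The effect of bin width can be seen by binning the same data three ways. The sketch below uses hypothetical commute times with two clusters; the helper counts_for_width and the data are invented for this illustration, and it assumes all values lie within the stated range.

```python
def counts_for_width(values, start, stop, width):
    """Bin values in [start, stop] into equal bins of the given width."""
    n_bins = int(round((stop - start) / width))
    counts = [0] * n_bins
    for v in values:
        # Floor division picks the bin; min() keeps v == stop in the last bin
        i = min(int((v - start) // width), n_bins - 1)
        counts[i] += 1
    return counts


# Hypothetical commute times in minutes, clustered around 12 and 32
times = [10, 11, 12, 12, 13, 14, 15, 30, 31, 32, 32, 33, 34, 35]

wide = counts_for_width(times, 0, 40, 40)    # one bin: the two clusters vanish
medium = counts_for_width(times, 0, 40, 10)  # four bins: two clear groups
narrow = counts_for_width(times, 0, 40, 1)   # forty bins: mostly 0s and 1s

print(wide)
print(medium)
print(narrow)
```

With one wide bin the two clusters are hidden, with 10-minute bins they stand out, and with 1-minute bins the picture breaks up into many tiny bars.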

Example: Height Distribution

Let’s explore student heights in centimeters:

  • Data range: 150cm to 190cm
  • Bin width: 10cm

Bins and Counts:

  • First bin, 150-160cm: 5 students (the shortest bar, because it has the fewest students)
  • Second bin, 160-170cm: 15 students
  • Third bin, 170-180cm: 20 students (the tallest bar, because it has the most students)
  • Fourth bin, 180-190cm: 10 students

What This Tells Us:

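  1. The most common height range is 170-180cm (peak), with 20 of the 50 students
  2. Counts rise to a single peak and fall away on both sides, though not perfectly symmetrically (the peak sits slightly above the middle of the range)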
  3. Fewer students at extreme heights (tails)
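
The height example can be drawn as a rough text histogram directly from the counts given above. This is only a sketch: the raw heights are not listed, so the mean computed here is an approximation that assumes every student in a bin stands at the bin's midpoint.

```python
# Bin edges and counts taken from the height example
edges = [150, 160, 170, 180, 190]
counts = [5, 15, 20, 10]

# One text "bar" per bin: the number of # marks equals the count
lines = []
for left, right, count in zip(edges[:-1], edges[1:], counts):
    lines.append(f"{left}-{right}cm | {'#' * count} {count}")
print("\n".join(lines))

# The peak is the bin with the largest count
peak_index = counts.index(max(counts))
peak_bin = (edges[peak_index], edges[peak_index + 1])

# Approximate mean from bin midpoints (an estimate, not the true mean)
midpoints = [(left + right) / 2 for left, right in zip(edges[:-1], edges[1:])]
total = sum(counts)
approx_mean = sum(m * c for m, c in zip(midpoints, counts)) / total

print("Peak bin:", peak_bin)
print("Approximate mean height:", approx_mean)
```

The approximate mean falls slightly above 170cm, the middle of the 150-190cm range, which matches the peak sitting in the third bin rather than exactly in the center.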

Common Shapes of Distributions

  1. Bell Shape (Normal)
    • Peaks in middle
    • Symmetric
    • Example: Adult heights
  2. Skewed Right
    • Longer tail on right
    • Example: Salaries
  3. Skewed Left
    • Longer tail on left
    • Example: Age at retirement
  4. Uniform
    • All bins roughly equal
    • Example: Random numbers drawn with equal probability from a fixed range
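
One way to put a number on these shapes is the moment-based skewness coefficient, which is near zero for symmetric data, positive for a longer right tail, and negative for a longer left tail. This measure is not part of the module itself; it is brought in here as an illustration. The samples are simulated with Python's random module, and the parameters (such as a mean of 170 or a cutoff of 65) are arbitrary choices.

```python
import random


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Simulated samples, one per shape (parameters are arbitrary assumptions)
samples = {
    "bell": [rng.gauss(170, 8) for _ in range(n)],
    "skewed right": [rng.expovariate(1.0) for _ in range(n)],
    "skewed left": [65 - rng.expovariate(1.0) for _ in range(n)],
    "uniform": [rng.uniform(0, 1) for _ in range(n)],
}

skews = {name: skewness(vals) for name, vals in samples.items()}
for name, value in skews.items():
    print(f"{name}: skewness = {value:.2f}")
```

Note that skewness alone cannot tell a bell shape from a uniform one, since both are symmetric; the histogram itself is still needed to see the difference.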

When to Use Histograms

Best for:

  1. Understanding data distribution
  2. Finding patterns in continuous data
  3. Identifying outliers
  4. Checking for symmetry or skewness

Reflection and Exploration

Think about continuous data in your life:

  • Daily temperature readings
  • Time spent on activities
  • Distance walked each day

Try describing the pattern you might expect:

  • “Most days I walk 2-3 kilometers”
  • “Fewer days with very short or very long walks”
  • “The distribution would probably peak around 2.5 kilometers”