What Is Tidy Data?

In this module, we’ll explore a key idea in data science called “tidy data.” Understanding tidy data will make your future work with data much easier and less confusing.

Why Does Data Need to Be Tidy?

Imagine working at a messy desk: papers are everywhere, and it’s hard to find what you need. Messy data is just like that. It’s hard to work with, and mistakes are easy to make. Tidy data is like having everything sorted into labeled folders, so you can quickly find and use the information you need.

The Rules of Tidy Data

Tidy data follows three simple rules:

Each variable forms a column. A variable is something you measure or record, like “age” or “score.”
Each observation forms a row. An observation is one item or event, like one person’s test results.
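Each type of observational unit forms a table. An observational unit is the kind of thing you are recording facts about. If you have information about students (like their ages) and also about their test results, keep students in one table and test results in another.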
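To make the three rules concrete, here is a minimal sketch in Python (the lesson itself names no programming language, so this choice is an assumption). Each table is a list of rows, each row is a dictionary whose keys are the variables, and students and test results live in separate tables, as the third rule suggests. The Age values are hypothetical, added only for illustration.

```python
# Table 1: observational unit = a student (one row per student).
# The Age values are hypothetical, added for illustration only.
students = [
    {"Name": "Alice", "Age": 15},
    {"Name": "Ben", "Age": 16},
    {"Name": "Carla", "Age": 15},
]

# Table 2: observational unit = one test result (one row per student per subject).
# Each key is a variable (a column); each dictionary is an observation (a row).
scores = [
    {"Name": "Alice", "Subject": "Math", "Score": 90},
    {"Name": "Alice", "Subject": "Reading", "Score": 85},
    {"Name": "Ben", "Subject": "Math", "Score": 80},
    {"Name": "Ben", "Subject": "Reading", "Score": 88},
    {"Name": "Carla", "Subject": "Math", "Score": 95},
    {"Name": "Carla", "Subject": "Reading", "Score": 92},
]

# The shared Name column links the two tables when an analysis needs both.
age_by_name = {row["Name"]: row["Age"] for row in students}
combined = [{**row, "Age": age_by_name[row["Name"]]} for row in scores]

print(combined[0])  # {'Name': 'Alice', 'Subject': 'Math', 'Score': 90, 'Age': 15}
```

Keeping ages in the students table means each age is stored once, yet the shared Name column still lets the two tables be combined on demand.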

Example: Messy vs. Tidy Data

Let’s look at student test score data in both messy and tidy formats.

Messy Data Example

Row	Information
1	Alice, Math: 90, Reading: 85
2	Ben got an 80 in Math and Reading score: 88
3	Carla - Math 95, Read. 92

This data is messy because:

Each row has a different format
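Variables aren’t in separate columns; names, subjects, and scores are all crammed into one “Information” column
Information is mixed together and inconsistently labeled (for example, “Reading” in one row and “Read.” in another)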
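To see how inconsistent rows lead to mistakes, here is a sketch in Python of a naive attempt to pull scores out of the messy rows with a single regular expression. The pattern and variable names are illustrative assumptions, not a recommended cleaning method. The rule works for Alice, but it silently gives Ben a Math score of 88 (really his Reading score) and misses Carla’s “Read.” entry entirely.

```python
import re

messy_rows = [
    "Alice, Math: 90, Reading: 85",
    "Ben got an 80 in Math and Reading score: 88",
    "Carla - Math 95, Read. 92",
]

# Naive rule: find a subject name, then take the next number after it.
pattern = re.compile(r"(Math|Reading)\D*?(\d+)")

parsed = {}
for row in messy_rows:
    name = re.match(r"\w+", row).group()  # assume the first word is the name
    parsed[name] = {subject: int(score) for subject, score in pattern.findall(row)}

for name, found in parsed.items():
    print(name, found)
# Alice {'Math': 90, 'Reading': 85}
# Ben {'Math': 88}
# Carla {'Math': 95}
```

No error is raised; the wrong and missing values simply flow into any later analysis. With tidy data, no guessing rule like this is needed in the first place.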

Tidy Data Example

Name	Subject	Score
Alice	Math	90
Alice	Reading	85
Ben	Math	80
Ben	Reading	88
Carla	Math	95
Carla	Reading	92

This data is tidy because:

Each variable (Name, Subject, Score) has its own column
Each row is one observation (one student’s score in one subject)
The data is consistent and clearly organized
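These properties can also be checked mechanically. The Python sketch below tests two things about the tidy table: every row has exactly the same columns, and no observation appears twice. Treating the (Name, Subject) pair as what identifies one observation is an assumption that fits this example, and `check_tidy` is a hypothetical helper, not a library function.

```python
tidy_scores = [
    {"Name": "Alice", "Subject": "Math", "Score": 90},
    {"Name": "Alice", "Subject": "Reading", "Score": 85},
    {"Name": "Ben", "Subject": "Math", "Score": 80},
    {"Name": "Ben", "Subject": "Reading", "Score": 88},
    {"Name": "Carla", "Subject": "Math", "Score": 95},
    {"Name": "Carla", "Subject": "Reading", "Score": 92},
]

EXPECTED_COLUMNS = {"Name", "Subject", "Score"}

def check_tidy(rows, columns, key):
    """Return a list of problems; an empty list means both checks passed.

    columns: the set of variable names every row must have.
    key: the columns that together identify one observation (an assumption).
    """
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        if set(row) != columns:
            problems.append(f"row {i}: columns {sorted(row)} != {sorted(columns)}")
            continue
        ident = tuple(row[col] for col in key)
        if ident in seen:
            problems.append(f"row {i}: duplicate observation {ident}")
        seen.add(ident)
    return problems

print(check_tidy(tidy_scores, EXPECTED_COLUMNS, key=("Name", "Subject")))  # []

# Recording Ben's Math score a second time breaks "one observation per row".
with_duplicate = tidy_scores + [{"Name": "Ben", "Subject": "Math", "Score": 80}]
print(check_tidy(with_duplicate, EXPECTED_COLUMNS, key=("Name", "Subject")))
# ["row 6: duplicate observation ('Ben', 'Math')"]
```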

Understanding Tidy Data with Excel

Tidy data is easiest to understand by thinking about how a spreadsheet program like Excel organizes data:

A table is the entire grid of data. In Excel, this is usually the data on one worksheet.
A row runs horizontally from left to right, labeled with numbers (1, 2, 3…).
A column runs vertically from top to bottom, labeled with letters (A, B, C…).
A cell is where a row and column intersect (like cell B3).
A column header is typically the first row of your table, containing labels that describe what information is in each column (like “Name”, “Age”, “Score”).
A row header is sometimes the first column, used to identify each record (like an ID number or name).
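The worksheet vocabulary above maps directly onto a grid of rows and columns. As a rough sketch, assuming a worksheet is modeled in Python as a list of rows with the column headers in row 1, the hypothetical helpers `column_index` and `cell` translate an Excel-style reference such as B3 into a position in that grid. They are not part of Excel or any library.

```python
# The tidy table from this lesson, laid out as a worksheet would show it:
# row 1 holds the column headers; rows 2-7 hold one observation each.
sheet = [
    ["Name", "Subject", "Score"],
    ["Alice", "Math", 90],
    ["Alice", "Reading", 85],
    ["Ben", "Math", 80],
    ["Ben", "Reading", 88],
    ["Carla", "Math", 95],
    ["Carla", "Reading", 92],
]

def column_index(letters):
    """Convert column letters to a 0-based index: A -> 0, B -> 1, Z -> 25, AA -> 26."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1

def cell(sheet, ref):
    """Look up a reference like "B3": column letters first, then a 1-based row number."""
    letters = ref.rstrip("0123456789")
    row_number = int(ref[len(letters):])
    return sheet[row_number - 1][column_index(letters)]

print(cell(sheet, "A1"))  # Name (the first column header)
print(cell(sheet, "B3"))  # Reading
print(cell(sheet, "C6"))  # 95
```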

Tidy Data in Excel Terms:

Each variable forms a column - In Excel, each letter-labeled column would contain just one type of information, with a clear column header describing that variable.
Each observation forms a row - In Excel, each numbered row would represent one complete record, sometimes with a row header identifying that specific observation.
Each type of observational unit forms a table - Different types of related data would be in separate Excel worksheets.
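One practical payoff: when a tidy worksheet is saved in CSV format, other tools can read it without special rules. In this Python sketch, the standard `csv` module’s `DictReader` uses the header row as the variable names and turns each later row into one observation. The CSV text is embedded as a string here in place of a real exported file.

```python
import csv
import io

# Stand-in for a tidy worksheet saved as a CSV file.
scores_csv = """Name,Subject,Score
Alice,Math,90
Alice,Reading,85
Ben,Math,80
Ben,Reading,88
Carla,Math,95
Carla,Reading,92
"""

# DictReader takes the header row as the variable names, so each
# following row becomes one observation keyed by those names.
rows = list(csv.DictReader(io.StringIO(scores_csv)))

print(repr(rows[0]["Score"]))  # '90' (CSV values arrive as text)

# Convert the Score column to whole numbers so it can be used in arithmetic.
for row in rows:
    row["Score"] = int(row["Score"])

print(sum(row["Score"] for row in rows))  # 530
```

Because CSV stores everything as text, Score arrives as the string '90' until it is converted. Deciding how each column should be typed is the kind of question the next lesson on data types takes up.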

Why This Organization Matters:

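Easier sorting (select the data and use Sort on Excel’s Data tab)
Simpler filtering (use the filter buttons Excel adds to each column header)
A consistent structure that analysis tools, such as pivot tables and charts, can use directly
Patterns are easier to spot, such as which subject each student scores higher in
A foundation for more advanced operations, such as grouping, summarizing, and combining tables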
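Each benefit in the list above has a direct counterpart outside Excel too. As a sketch in plain Python (an assumption, since the lesson itself works in Excel), sorting, filtering, and summarizing the tidy table each take a line or two, roughly what Excel’s Sort, Filter, and PivotTable features do. The same code keeps working however many rows are added, because every row has the same shape.

```python
from collections import defaultdict

scores = [
    {"Name": "Alice", "Subject": "Math", "Score": 90},
    {"Name": "Alice", "Subject": "Reading", "Score": 85},
    {"Name": "Ben", "Subject": "Math", "Score": 80},
    {"Name": "Ben", "Subject": "Reading", "Score": 88},
    {"Name": "Carla", "Subject": "Math", "Score": 95},
    {"Name": "Carla", "Subject": "Reading", "Score": 92},
]

# Sorting: order every observation by score, highest first.
by_score = sorted(scores, key=lambda row: row["Score"], reverse=True)

# Filtering: keep only the Math rows.
math_only = [row for row in scores if row["Subject"] == "Math"]

# Grouping and summarizing: average score per student.
scores_by_name = defaultdict(list)
for row in scores:
    scores_by_name[row["Name"]].append(row["Score"])
averages = {name: sum(vals) / len(vals) for name, vals in scores_by_name.items()}

print(by_score[0])     # {'Name': 'Carla', 'Subject': 'Math', 'Score': 95}
print(len(math_only))  # 3
print(averages)        # {'Alice': 87.5, 'Ben': 84.0, 'Carla': 93.5}
```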

Next Steps

In the next lesson, we’ll learn about different data types—like numbers, text, and dates—and why they matter when working with tidy data.
