What is Data Science?
In this module, we’re starting our journey into the world of data together. Data science is like having a special toolkit that helps us turn information such as numbers, words, or lists into clear answers to questions about the world around us. You don’t need to know any programming yet; we’ll focus on understanding the main ideas and how data science can help us solve real problems.
What Is Data Science?
At its core, data science brings together three things:
1. Data: any collection of facts, such as customer records, sensor readings, or survey responses.
2. Science: a systematic, repeatable approach, much like following a recipe in a kitchen.
3. Insights: the meaningful answers or patterns we uncover.
For example, imagine you ask, “Which product sold the most last month?” The data might be a simple table of sales transactions. Following scientific steps, you clean and analyze that table, and your insight could be “The red winter jacket was our bestseller.” That insight helps guide business decisions.
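You don’t need to read code to follow this module, but if you’re curious, here is a minimal Python sketch of the bestseller question. The transactions are made up purely for illustration, and the names `transactions` and `units_by_product` are our own assumptions, not part of any real dataset.

```python
from collections import Counter

# Made-up sales transactions for illustration only: (product, units sold).
transactions = [
    ("red winter jacket", 3),
    ("blue scarf", 1),
    ("red winter jacket", 2),
    ("wool hat", 4),
]

# Add up the units sold for each product.
units_by_product = Counter()
for product, units in transactions:
    units_by_product[product] += units

# The insight: the product with the highest total.
bestseller, units_sold = units_by_product.most_common(1)[0]
print(f"Bestseller: {bestseller} ({units_sold} units)")
```

The data is the list of transactions, the science is the repeatable counting step, and the insight is the single line printed at the end.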
Data Science Workflow
Data science projects often follow a clear, systematic flow:
Step 1: Import Data
Bring raw data into your workspace from files, databases, or APIs (Application Programming Interfaces, which are ways to request data from online services).
Step 2: Tidy Data
Structure your data so that each variable is a column, each observation is a row, and each type of observational unit (for example, customers or orders) has its own table.
Step 3: Transform & Explore
Filter, summarize, and visualize the data to detect patterns and check assumptions.
Step 4: Model Data
Apply statistical methods or predictive techniques to answer your question.
Step 5: Communicate Results
Share your insights with clear explanations, tables, and visual descriptions.
We believe the fifth step (communicating results) is so important that the final project for this course is built around a presentation, which you will record and submit with your other deliverables. More details about the final project will be shared soon!
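To make Step 2 (Tidy Data) of the workflow a little more concrete, here is an optional Python sketch. It assumes a small, invented table with one column per month and reshapes it so that each row holds exactly one observation. The table contents and the helper name `to_tidy` are hypothetical and exist only for this illustration.

```python
# Hypothetical "wide" table: one row per product, one column per month.
wide_sales = [
    {"product": "jacket", "jan": 10, "feb": 12},
    {"product": "scarf", "jan": 4, "feb": 7},
]

def to_tidy(rows, months=("jan", "feb")):
    """Turn each (product, month) cell into its own row."""
    tidy = []
    for row in rows:
        for month in months:
            tidy.append(
                {"product": row["product"], "month": month, "units": row[month]}
            )
    return tidy

tidy_sales = to_tidy(wide_sales)
for row in tidy_sales:
    print(row)
```

After reshaping, every variable (product, month, units) is a column and every row is a single observation, which is the layout Step 2 describes.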
Concrete Example
Let’s use our workflow to answer a simple question: “How long do people spend on our website?”
Imagine we have a list of four people who visited our website yesterday. For each person, we know how long they spent on the site. Here are the times, converted to seconds:
- The first person spent 2 minutes (which is 120 seconds).
- The second person spent 5 minutes (which is 300 seconds).
- The third person spent 3 minutes (which is 180 seconds).
- The fourth person spent 4 minutes (which is 240 seconds).
Let’s walk through the steps:
Step 1: Import Data
Our data is the four session times: 120, 300, 180, and 240 seconds.
Step 2: Tidy Data
Each session time is listed separately, with no missing or extra information. The data is tidy.
Step 3: Transform & Explore
We check that all times are positive numbers and there are no mistakes. Next, we add up all the times: 120 plus 300 plus 180 plus 240 equals 840 seconds in total.
Step 4: Model Data
To find the average, we divide the total time by the number of people: 840 divided by 4 equals 210 seconds.
Step 5: Communicate Results
We can say: “On average, each person spent 210 seconds on the website.” To make it even clearer, 210 seconds is the same as 3 minutes and 30 seconds.
This example shows how we start with a simple question and use data, step by step, to find a clear answer.
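If you’d like a preview of how these five steps might look in code, here is a minimal Python sketch that uses the same four session times. It is only one possible way to write it, and the variable names are our own choices. Nothing in this module requires you to run it.

```python
# Step 1: Import Data. The four session times from the example, in seconds.
session_seconds = [120, 300, 180, 240]

# Step 2: Tidy Data. One value per visitor, with nothing missing.
assert all(t is not None for t in session_seconds)

# Step 3: Transform & Explore. Check the values, then add them up.
assert all(t > 0 for t in session_seconds)
total_seconds = sum(session_seconds)

# Step 4: Model Data. The average is the total divided by the count.
average_seconds = total_seconds / len(session_seconds)

# Step 5: Communicate Results. Report in seconds and in minutes.
minutes, seconds = divmod(int(average_seconds), 60)
message = (
    f"On average, each person spent {average_seconds:.0f} seconds "
    f"({minutes} min {seconds} s) on the website."
)
print(message)
```

Each comment lines up with one step of the workflow, so the code reads in the same order as the walkthrough above.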
Next Steps
In our next module, we’ll look at what “tidy data” means, why it matters, and how it sets the stage for smooth analysis. No code is required yet.