Cleaning and Preparing Data Course


Cleaning and Preparing Data Course is a free online course from Treehouse with 2-3 hours of material. It is taught in English by Alyssa Batula.

Overview
  • We rely on data to answer important questions, whether we are trying to make the best business decisions or determine the effectiveness of a new medical treatment. But our analyses are only as accurate as the data we are using, and incorrect or “dirty” data can lead to incorrect conclusions and assumptions. Data preparation, also called “cleaning” or “scrubbing”, is an important part of ensuring our analyses are accurate and useful.

    What you'll learn

    • Cleaning and scrubbing data
    • Potential problems within datasets
    • Understanding your dataset
    • Handling bad data

Syllabus
  • “Clean” and “Dirty” Data

    Welcome! In this stage, you will learn why a properly cleaned dataset is important and some of the problems you may encounter when cleaning a dataset. We will also take our first look at the data we will be using throughout this course.

    6 steps
    • What is Data Cleaning?

      3:51

    • Types of Bad Data

      5:18

    • Data Preparation Basics

      7 questions

    • Understanding Your Dataset

      2:15

    • Exploring Your Dataset

      7:15

    • Understanding Your Dataset

      7 questions

  • Handling Bad Data

    Now that we know a little bit about our dataset and the data cleaning process, we will take a closer look at some common issues using our example dataset. Sometimes these issues can be fixed, while other times it’s best to remove the data from our analyses. We can even write programs to help us automate some of the data preparation process, saving time and effort.

    10 steps
    • Simple Data Issues

      8:37

    • Sensible Column Names and Values

      6:12

    • Fixing or Excluding Data

      3:39

    • Simple Fixes and Exclusions Review

      11 questions

    • Missing Data

      12:31

    • Fixes and Exclusions for Complex Issues

      5 questions

    • Duplicated Data

      9:02

    • Infeasible and Extreme Data

      8:51

    • Automating Data Preparation

      8:08

    • Automating Data Preparation

      5 questions
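
    The fixes and exclusions covered in this stage can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the column names, the list of missing-value markers, and the helper clean_records are all hypothetical, and it uses only the standard library. It tidies column names, treats common placeholders as missing, excludes rows with missing or infeasible values, and drops duplicates.

```python
# Hypothetical placeholder strings treated as "missing"; adjust for your data.
MISSING_MARKERS = {"", "na", "n/a", "null", "none"}


def clean_records(records, required, valid_ranges):
    """Return cleaned copies of `records` (a list of dicts).

    required     -- column names that must not be missing
    valid_ranges -- {column: (low, high)} bounds for feasible numeric values
    """
    cleaned = []
    seen = set()
    for record in records:
        row = {}
        for name, value in record.items():
            # Sensible column names: trimmed, lowercase, underscores.
            key = name.strip().lower().replace(" ", "_")
            if isinstance(value, str):
                value = value.strip()
                if value.lower() in MISSING_MARKERS:
                    value = None
            row[key] = value

        # Exclude rows that lack a required value.
        if any(row.get(col) is None for col in required):
            continue

        # Exclude rows with non-numeric or out-of-range values.
        feasible = True
        for col, (low, high) in valid_ranges.items():
            try:
                number = float(row.get(col))
            except (TypeError, ValueError):
                feasible = False
                break
            if not low <= number <= high:
                feasible = False
                break
            row[col] = number
        if not feasible:
            continue

        # Drop exact duplicates (compared after the fixes above).
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


# Made-up example rows, for illustration only.
raw = [
    {" Age ": "34", "Name": " Ada "},
    {" Age ": "34", "Name": "Ada"},    # duplicate once whitespace is trimmed
    {" Age ": "N/A", "Name": "Bob"},   # missing value
    {" Age ": "250", "Name": "Cy"},    # infeasible age
]
result = clean_records(raw, required=["age", "name"], valid_ranges={"age": (0, 120)})
print(result)
```

    Wrapping the steps in one function is what makes the preparation repeatable: the same rules can be rerun whenever new data arrives. Whether to exclude a row or try to repair it remains a judgment call that depends on the question being asked.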

  • Selecting Relevant Data

    While it may seem like more data is always better, usually we only want to look at the information that’s relevant to the question we are trying to answer. In this stage, we will look at different ways of choosing the most applicable data.

    6 steps
    • Making Your Dataset Smaller

      2:11

    • Choosing the Right Features

      8:45

    • Selecting the Right Data

      6 questions

    • Automated Feature Selection

      5:55

    • Cleaning and Preparing Data

      1:31

    • Automating Feature Selection

      5 questions
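
    One simple way to automate feature selection is to drop columns that barely vary, since a near-constant column cannot help distinguish one record from another. The sketch below assumes this variance-threshold approach purely for illustration; the listing does not say which method the course teaches. The function names, the threshold, and the sample rows are hypothetical.

```python
from statistics import pvariance


def select_by_variance(rows, threshold=0.0):
    """Return names of numeric columns whose population variance exceeds `threshold`.

    `rows` is a list of dicts that all share the same numeric columns.
    """
    if not rows:
        return []
    kept = []
    for name in rows[0]:
        values = [row[name] for row in rows]
        if pvariance(values) > threshold:
            kept.append(name)
    return kept


def keep_columns(rows, names):
    """Return copies of `rows` restricted to the columns in `names`."""
    return [{name: row[name] for name in names} for row in rows]


# Made-up measurements; "sensor_id" is constant, so it carries no information.
rows = [
    {"height": 1.6, "sensor_id": 7, "weight": 60.0},
    {"height": 1.8, "sensor_id": 7, "weight": 82.0},
    {"height": 1.7, "sensor_id": 7, "weight": 71.0},
]
selected = select_by_variance(rows)
smaller = keep_columns(rows, selected)
print(selected)
```

    Variance is scale-dependent, so a nonzero threshold only makes sense once the columns are on comparable scales. An automated filter like this is a starting point; relevance to the question at hand still needs a human check.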