This course introduces the key steps involved in the data mining pipeline, including data understanding, data preprocessing, data warehousing, data modeling, interpretation and evaluation, and real-world applications.
Data Mining Pipeline can be taken for academic credit as part of CU Boulder’s Master of Science in Data Science (MS-DS) degree offered on the Coursera platform. The MS-DS is an interdisciplinary degree that brings together faculty from CU Boulder’s departments of Applied Mathematics, Computer Science, Information Science, and others. With performance-based admissions and no application process, the MS-DS is ideal for individuals with a broad range of undergraduate education and/or professional experience in computer science, information science, mathematics, and statistics. Learn more about the MS-DS program at https://www.coursera.org/degrees/master-of-science-data-science-boulder.
Course logo image courtesy of Francesco Ungaro, available on Unsplash: https://unsplash.com/photos/C89G61oKDDA
Syllabus
- Data Mining Pipeline
- This module introduces data mining and the data mining pipeline, including the four views of data mining and the key components of the pipeline.
- Data Understanding
- This module covers data understanding: identifying key data properties and applying techniques to characterize different datasets.
- Data Preprocessing
- This module explains why data preprocessing is needed and what techniques can be used to preprocess data.
- Data Warehousing
- This module covers the key characteristics of data warehousing and the techniques that support it.