Data Science Foundations: Data Assessment for Predictive Modeling

Data Science Foundations: Data Assessment for Predictive Modeling is a free online course from LinkedIn Learning, taught in English by Keith McCormick. It contains 4-5 hours of material, and an e-certificate from LinkedIn Learning is available upon completion.

Overview
  • Explore the data understanding phase of the CRISP-DM methodology for predictive modeling. Find out how to collect, describe, explore, and verify data.

Syllabus
    Introduction
    • Why data assessment is critical
    • A note about the exercise files
    1. What Is Data Assessment?
    • Clarifying how data understanding differs from data visualization
    • Introducing the critical data understanding phase of CRISP-DM
    • Data assessment in CRISP-DM alternatives: The IBM ASUM-DM and Microsoft TDSP
    • Navigating the transition from business understanding to data understanding
    • How to organize your work with the four data understanding tasks
    2. Collect Initial Data
    • Considerations in gathering the relevant data
    • A strategy for processing data sources
    • Getting creative about data sources
    • How to envision a proper flat file
    • Anticipating data integration
    3. First Look at the Data
    • Reviewing basic concepts in the level of measurement
    • What is dummy coding?
    • Expanding our definition of level of measurement
    • Taking an initial look at possible key variables
    • Dealing with duplicate IDs and transactional data
    • How many potential variables (columns) will I have?
    • How to deal with high-order multiple nominals
    • Challenge: Identifying the level of measurement
    • Solution: Identifying the level of measurement
    4. Data Loading and Unit of Analysis
    • Introducing the KNIME Analytics Platform
    • Tips and tricks to consider during data loading
    • Unit of analysis decisions
    • Challenge: What should the row be?
    • Solution: What should the row be?
    5. Describe Data
    • How to uncover the gross properties of the data
    • Researching the dataset
    • Tips and tricks using simple aggregation commands
    • A simple strategy for organizing your work
    6. Data Description Case Studies
    • Describe data demo using the UCI heart dataset
    • Challenge: Practice describe data with the UCI heart dataset
    • Solution: Practice describe data with the UCI heart dataset
    7. Explore Data Basics
    • The explore data task
    • How to be effective doing univariate analysis and data visualization
    • Anscombe's quartet
    • The Data Explorer node feature in KNIME
    • How to navigate borderline cases of variable type
    • How to be effective in doing bivariate data visualization
    • Challenge: Producing bivariate visualizations for case study 1
    • Solution: Producing bivariate visualizations for case study 1
    8. Explore Data Tips and Tricks
    • How to utilize an SME's time effectively
    • Techniques for working with the top predictors
    • Advice for weak predictors
    • Tips and tricks when searching for quirks in your data
    • Learning when to discard rows
    • Introducing ggplot2
    • Orientating to R's ggplot2 for powerful multivariate data visualizations
    • Challenge: Producing multivariate visualizations for case study 1
    • Solution: Producing multivariate visualizations for case study 1
    9. Verify Data Quality
    • Exploring your missing data options
    • Why you lose rows to listwise deletion
    • Investigating the provenance of the missing data
    10. Missing Data Case Study
    • Introducing the KDD Cup 1998 data
    • What is the pattern of missing data in your data?
    • Is the missing data worth saving?
    • Assessing imputation as a potential solution
    11. Explore and Verify Case Studies
    • Exploring and verifying data quality with the UCI heart dataset
    • Challenge: Quantifying missing data with the UCI heart dataset
    • Solution: Quantifying missing data with the UCI heart dataset
    12. Making the Transition to Data Preparation
    • Why formal reports are important
    • Creating a data prep to-do list
    • How to prepare for eventual deployment
    Conclusion
    • Next steps