Introduction to Spark SQL and DataFrames

Introduction to Spark SQL and DataFrames is a free online course from LinkedIn Learning, taught in English by Dan Sullivan. It contains 1-2 hours of material, and you can receive an e-certificate from LinkedIn Learning upon completion.

Overview
  • Learn about DataFrames, a widely used data structure in Apache Spark. Discover how to manipulate and analyze distributed data with the DataFrames API and SQL.

Syllabus
  • Introduction
    • Apache Spark SQL and data analysis
    • What you should know
  • 1. Introduction to Spark DataFrames
    • Introduction to DataFrames
    • SQL for DataFrames
  • 2. Installing Spark
    • Install Spark
    • Install PySpark
    • Using Jupyter notebooks with PySpark
  • 3. Getting Started with Spark DataFrames
    • Set up a Jupyter notebook
    • Load data into DataFrames: CSV Files
    • Load data into DataFrames: JSON Files
    • Basic DataFrame operations
    • Filter data with DataFrame API
    • Aggregate data with DataFrame API
    • Sample data from DataFrames
    • Save data from DataFrames
  • 4. SQL for DataFrames
    • Querying DataFrames with SQL
    • Filtering DataFrames with SQL
    • Aggregating Data with SQL
    • Joining DataFrames with SQL
    • Eliminating duplicates in DataFrames
    • Working with NA values in DataFrames
  • 5. Data Analysis with Spark
    • Exploratory data analysis with DataFrames
    • Exploratory data analysis with Spark SQL
    • Timeseries analysis with DataFrames
    • Basic machine learning with DataFrames, part 1
    • Basic machine learning with DataFrames, part 2
  • Conclusion
    • Next steps