Apache PySpark by Example


Free Online Course: Apache PySpark by Example is a free online course from LinkedIn Learning with 1 to 2 hours of material. It is taught in English by Jonathan Fernandes. On completion, you can receive an e-certificate from LinkedIn Learning.

Overview
  • Get up and running with Apache Spark quickly. This hands-on course shows Python users how to work with PySpark, Spark's Python API, and apply Spark to data science tasks.

Syllabus
  • Introduction
    • Apache PySpark
    • What you should know
  • 1. Introduction to Apache Spark
    • The Apache Spark ecosystem
    • Why Spark?
    • Spark origins and Databricks
    • Spark components
    • Partitions, transformations, lazy evaluations, and actions
  • 2. Technical Setup
    • Set up the lab environment
    • Download a dataset
    • Importing
  • 3. Working with the DataFrame API
    • The DataFrame API
    • Working with DataFrames
    • Schemas
    • Working with columns
    • Working with rows
    • Challenge
    • Solution
  • 4. Functions
    • Built-in functions
    • Working with dates
    • User-defined functions
    • Working with joins
    • Challenge
    • Solution
  • 5. Resilient Distributed Datasets (RDDs)
    • RDDs
    • Working with RDDs
  • Conclusion
    • Next steps