Complete PySpark Developer Course (Spark with Python)

Complete PySpark Developer Course (Spark with Python), provided by Udemy, is a comprehensive online course with about 30 hours of material. It is taught by Learn-Spark.info (Spark University). Upon completion of the course, you can receive an e-certificate from Udemy. The course is taught in English and is a paid course. Visit the course page at Udemy for detailed price information.

Overview
  • Learn PySpark in depth with hundreds of Practical examples. Be a complete PySpark Developer. Set up a Hadoop Cluster.

    What you'll learn:

    • Complete Curriculum for a successful PySpark Developer
    • Hadoop Single Node Cluster Set up and Integrate with Spark 2.x and Spark 3.x
    • Complete Flow of Installation of PySpark (Windows and Unix)
    • Detailed HDFS Course
    • Python Crash Course
    • Introduction to Spark
    • Understand SparkSession
    • Spark RDD Fundamentals, Operations, Persistence. Practical Examples to solve problems.
    • Spark Cluster Architecture - Execution, YARN, JVM Processes, DAG Scheduler, Task Scheduler
    • Spark Shared Variables
    • Spark SQL Architecture, Catalyst Optimizer, Volcano Iterator Model, Tungsten Execution Engine
    • DataFrame Fundamentals
    • DataFrame Rows, Columns and DataTypes. Practical examples.
    • ETL Using DataFrame (Extraction APIs, Transformation APIs, and Loading APIs). Practical Examples.
    • Optimization and Management - Join Strategies, Driver Conf, Executor Conf etc

    This is a complete PySpark Developer course for Data Engineers, Data Scientists, and others who want to process Big Data effectively. We will cover the topics below and more:

    • Complete Curriculum for a successful PySpark Developer

    • Set up Hadoop Single Node Cluster and Integrate it with Spark 2.x and Spark 3.x

    • Complete Flow of Installation of Standalone PySpark (Unix and Windows Operating System)

    • Detailed HDFS Commands and Architecture.

    • Python Crash Course

    • Introduction to Spark (Why Spark was Developed, Spark Features, Spark Components)

    • Understand SparkSession

    • Spark RDD Fundamentals

    • How to Create RDDs

    • RDD Operations (Transformations & Actions)

    • Spark Cluster Architecture - Execution, YARN, JVM Processes, DAG Scheduler, Task Scheduler

    • RDD Persistence

    • Spark Shared Variables - Broadcast

    • Spark Shared Variables - Accumulators

    • Spark SQL Architecture, Catalyst Optimizer, Volcano Iterator Model, Tungsten Execution Engine, Different Benchmarks

    • Difference between Catalyst Optimizer and Volcano Iterator Model

    • Spark Commonly Used Functions - version, range, createDataFrame, sql, table, SparkContext, conf, read, udf, newSession, stop, catalog etc

    • DataFrame Built-in functions - new column functions, encryption functions, string functions, regexp functions, date functions, null functions, collection functions, na functions, math and statistics functions, explode functions, flatten functions, formatting and json functions

    • What is Partition

    • What is Repartition

    • What is Coalesce

    • Repartition Vs Coalesce

    • Extraction - csv file, text file, Parquet File, orc file, json file, avro file, hive, jdbc

    • DataFrame Fundamentals

    • What is a DataFrame

    • DataFrame Sources

    • DataFrame Features

    • DataFrame Organization

    • DataFrame Rows

    • DataFrame Columns

    • DataTypes. Practical examples.

    • Perform ETL Using DataFrame

      -- Extraction APIs

      -- Transformation APIs

      -- Loading APIs

      -- Practical Examples.

    • Optimization and Management - Join Strategies, Driver Conf, Parallelism Configurations, Executor Conf etc