Perform data engineering with Azure Synapse Apache Spark Pools

Perform data engineering with Azure Synapse Apache Spark Pools is a free online course provided by Microsoft Learn. It contains 2-3 hours of material and is taught in English.

Overview
    • Module 1: Understand big data engineering with Apache Spark in Azure Synapse Analytics
    • After completing this module, you will be able to:

      • Differentiate between Apache Spark and Spark pools
      • Differentiate between Azure Databricks and Spark pools
      • Differentiate between HDInsight and Spark Pools
      • Differentiate between Spark Pools and SQL Pools
      • Understand the use cases of data engineering with Apache Spark in Azure Synapse Analytics
      • Create a Spark pool in Azure Synapse Analytics
    • Module 2: Ingest data with Apache Spark notebooks in Azure Synapse Analytics
    • After completing this module, you will be able to:

      • Understand the use cases for Spark Notebooks
      • Create a Spark Notebook in Azure Synapse Analytics
      • Understand the supported languages in Spark Notebooks
      • Develop Spark Notebooks
      • Run Spark Notebooks
      • Load data in Spark Notebooks
      • Save Spark Notebooks
    • Module 3: Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics
    • After completing this module, you will be able to:

      • Understand DataFrames in Spark Pools in Azure Synapse Analytics
      • Load data into a Spark DataFrame
      • Create a Spark table
      • Read data from and write data to a storage account
      • Load a streaming DataFrame into Apache Spark
      • Flatten nested structures and explode arrays with Apache Spark
    • Module 4: Integrate SQL and Apache Spark pools in Azure Synapse Analytics
    • After completing this module, you will be able to:

      • Describe the integration methods between SQL and Spark Pools in Azure Synapse Analytics
      • Understand the use cases for SQL and Spark Pools integration
      • Authenticate in Azure Synapse Analytics
      • Transfer data between SQL and Spark Pools in Azure Synapse Analytics
      • Authenticate between Spark and SQL Pools in Azure Synapse Analytics
      • Integrate SQL and Spark Pools in Azure Synapse Analytics
      • Externalize the use of Spark Pools within Azure Synapse workspace
      • Transfer data outside the Synapse workspace using SQL Authentication
      • Transfer data outside the Synapse workspace using the PySpark Connector
      • Transform data in Apache Spark and write back to SQL Pool in Azure Synapse Analytics
    • Module 5: Monitor and manage data engineering workloads with Apache Spark in Azure Synapse Analytics
    • After completing this module, you will be able to:

      • Monitor Spark Pools in Azure Synapse Analytics
      • Understand Resource Utilization of Spark Pools in Azure Synapse Analytics
      • Monitor Query activity of Spark Pools in Azure Synapse Analytics
      • Baseline Apache Spark performance with Apache Spark History Server in Azure Synapse Analytics
      • Optimize Apache Spark jobs in Azure Synapse Analytics
      • Automate scaling of Apache Spark pools in Azure Synapse Analytics

Syllabus
    • Module 1: Understand big data engineering with Apache Spark in Azure Synapse Analytics
      • Introduction
      • What is an Apache Spark pool in Azure Synapse Analytics
      • How do Apache Spark pools work in Azure Synapse Analytics
      • When do you use Apache Spark pools in Azure Synapse Analytics
      • Knowledge check
      • Summary
    • Module 2: Ingest data with Apache Spark notebooks in Azure Synapse Analytics
      • Introduction
      • Introduction to Spark notebooks
      • Understand the use cases for Spark notebooks
      • Exercise: Create a Spark notebook in Azure Synapse Analytics
      • Discover supported languages in Spark notebooks
      • Develop Spark notebooks
      • Exercise: Develop Spark notebooks
      • Run Spark notebooks
      • Exercise: Run Spark notebooks
      • Load data in Spark notebooks
      • Exercise: Load data in Spark notebooks
      • Save Spark notebooks
      • Knowledge check
      • Summary
    • Module 3: Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics
      • Introduction
      • Introduction to DataFrames in Spark pools in Azure Synapse Analytics
      • Load data into a Spark DataFrame
      • Exercise: Load data into a Spark DataFrame
      • Exercise: Create a Spark table
      • Flatten nested structures and explode arrays with Apache Spark
      • Exercise: Flatten nested structures and explode arrays with Apache Spark in Azure Synapse
      • Knowledge check
      • Summary
    • Module 4: Integrate SQL and Apache Spark pools in Azure Synapse Analytics
      • Introduction
      • Describe the integration methods between SQL and Spark pools in Azure Synapse Analytics
      • Understand the use cases for SQL and Spark pools integration
      • Authenticate in Azure Synapse Analytics
      • Transfer data between SQL and Spark pools in Azure Synapse Analytics
      • Authenticate between Spark and SQL pools in Azure Synapse Analytics
      • Exercise: Integrate SQL and Spark pools in Azure Synapse Analytics
      • Externalize the use of Spark pools within Azure Synapse workspace
      • Transfer data outside the Synapse workspace using the PySpark connector
      • Knowledge check
      • Summary
    • Module 5: Monitor and manage data engineering workloads with Apache Spark in Azure Synapse Analytics
      • Introduction
      • Monitor Spark pools in Azure Synapse Analytics
      • Baseline Apache Spark performance with Apache Spark history server in Azure Synapse Analytics
      • Optimize Apache Spark jobs in Azure Synapse Analytics
      • Automate scaling of Apache Spark pools in Azure Synapse Analytics
      • Knowledge check
      • Summary