Perform data science with Azure Databricks

Perform data science with Azure Databricks is a free online course provided by Microsoft Learn. It contains 8-9 hours of material and is taught in English.

Overview
    • Module 1: Describe Azure Databricks
    • In this module, you will:

      • Understand the Azure Databricks platform
      • Create your own Azure Databricks workspace
      • Create a notebook inside your home folder in Databricks
      • Understand the fundamentals of Apache Spark notebooks
      • Create, or attach to, a Spark cluster
      • Identify the types of tasks well-suited to the unified analytics engine Apache Spark
    • Module 2: Spark architecture fundamentals
    • In this module, you will:

      • Understand the architecture of an Azure Databricks Spark Cluster
      • Understand the architecture of a Spark Job
    • Module 3: Read and write data in Azure Databricks
    • In this module, you will:

      • Use Azure Databricks to read multiple file types, both with and without a schema.
      • Combine inputs from files and data stores, such as Azure SQL Database.
      • Transform and store that data for advanced analytics.
    • Module 4: Work with DataFrames in Azure Databricks
    • In this module, you will:

      • Use the count() method to count rows in a DataFrame
      • Use the display() function to display a DataFrame in the Notebook
      • Cache a DataFrame for quicker operations if the data is needed a second time
      • Use the limit() method to display a small set of rows from a larger DataFrame
      • Use select() to select a subset of columns from a DataFrame
      • Use distinct() and dropDuplicates() to remove duplicate data
      • Use drop() to remove columns from a DataFrame
    • Module 5: Work with user-defined functions
    • In this module, you will learn how to:

      • Write User-Defined Functions
      • Perform ETL operations using User-Defined Functions
    • Module 6: Build and query a Delta Lake
    • In this module, you will:

      • Learn about the key features and use cases of Delta Lake.
      • Use Delta Lake to create, append, and upsert tables.
      • Perform optimizations in Delta Lake.
      • Compare different versions of a Delta table using Time Machine.
    • Module 7: Perform machine learning with Azure Databricks
    • In this module, you will learn how to:

      • Perform Machine Learning
      • Train a model and create predictions
      • Perform exploratory data analysis
      • Describe machine learning workflows
      • Build and evaluate machine learning models
    • Module 8: Train a machine learning model
    • In this module, you will learn how to:

      • Perform featurization of the dataset
      • Finish featurization of the dataset
      • Understand Regression modeling
      • Build and interpret a regression model
    • Module 9: Work with MLflow in Azure Databricks
    • In this module, you will learn how to:

      • Use MLflow to track experiments, log metrics, and compare runs
      • Work with MLflow to track experiment metrics, parameters, artifacts and models.
    • Module 10: Perform model selection with hyperparameter tuning
    • In this module, you will learn how to:

      • Describe Model selection and Hyperparameter Tuning
      • Select the optimal model by tuning Hyperparameters
    • Module 11: Deep learning with Horovod for distributed training
    • In this module, you will learn how to:

      • Use Horovod to train a deep learning model
      • Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
      • Work with Horovod and Petastorm for training a deep learning model
    • Module 12: Work with Azure Machine Learning to deploy serving models
    • In this module, you will learn how to:

      • Use Azure Machine Learning to deploy Serving Models

Syllabus
    • Module 1: Describe Azure Databricks
      • Introduction
      • Explain Azure Databricks
      • Create an Azure Databricks workspace and cluster
      • Understand Azure Databricks Notebooks
      • Exercise: Work with Notebooks
      • Knowledge check
      • Summary
    • Module 2: Spark architecture fundamentals
      • Introduction
      • Understand the architecture of an Azure Databricks Spark cluster
      • Understand the architecture of a Spark job
      • Knowledge check
      • Summary
    • Module 3: Read and write data in Azure Databricks
      • Introduction
      • Read data in CSV format
      • Read data in JSON format
      • Read data in Parquet format
      • Read data stored in tables and views
      • Write data
      • Exercises: Read and write data
      • Knowledge check
      • Summary
    • Module 4: Work with DataFrames in Azure Databricks
      • Introduction
      • Describe a DataFrame
      • Use common DataFrame methods
      • Use the display function
      • Exercise: Distinct articles
      • Knowledge check
      • Summary
    • Module 5: Work with user-defined functions
      • Introduction
      • Write user-defined functions
      • Exercise: Perform Extract, Transform, Load (ETL) operations using user-defined functions
      • Knowledge check
      • Summary
    • Module 6: Build and query a Delta Lake
      • Introduction
      • Describe the open source Delta Lake
      • Exercise: Work with basic Delta Lake functionality
      • Describe how Azure Databricks manages Delta Lake
      • Exercise: Use the Delta Lake Time Machine and perform optimization
      • Knowledge check
      • Summary
    • Module 7: Perform machine learning with Azure Databricks
      • Introduction
      • Understand machine learning
      • Exercise: Train a model and create predictions
      • Understand data using exploratory data analysis
      • Exercise: Perform exploratory data analysis
      • Describe machine learning workflows
      • Exercise: Build and evaluate a baseline machine learning model
      • Knowledge check
      • Summary
    • Module 8: Train a machine learning model
      • Introduction
      • Perform featurization of the dataset
      • Exercise: Finish featurization of the dataset
      • Understand regression modeling
      • Exercise: Build and interpret a regression model
      • Knowledge check
      • Summary
    • Module 9: Work with MLflow in Azure Databricks
      • Introduction
      • Use MLflow to track experiments, log metrics, and compare runs
      • Exercise: Work with MLflow to track experiment metrics, parameters, artifacts and models
      • Knowledge check
      • Summary
    • Module 10: Perform model selection with hyperparameter tuning
      • Introduction
      • Describe model selection and hyperparameter tuning
      • Exercise: Select optimal model by tuning hyperparameters
      • Knowledge check
      • Summary
    • Module 11: Deep learning with Horovod for distributed training
      • Introduction
      • Use Horovod to train a deep learning model
      • Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
      • Exercise: Work with Horovod and Petastorm for training a deep learning model
      • Knowledge check
      • Summary
    • Module 12: Work with Azure Machine Learning to deploy serving models
      • Introduction
      • Use Azure Machine Learning to deploy serving models
      • Knowledge check
      • Summary