Building Open Source Software (OSS) Analytics Solutions with Azure HDInsight

Building Open Source Software (OSS) Analytics Solutions with Azure HDInsight is a free online course provided by Microsoft Learn, with 5-6 hours of material. The course is taught in English.

Overview
    • Module 1: Introduction to the Open Source Analytics Offering
    • At the end of this module, you will understand:
      • What HDInsight is
      • How HDInsight works
      • When to use HDInsight
    • Module 2: Choose the correct HDInsight Configuration to build open source analytics solutions
    • At the end of this module, you will understand:
      • The correct HDInsight configuration options
      • Decision criteria for selecting the correct HDInsight configuration option
      • How to analyze a scenario and map it to an HDInsight configuration option
      • Cost optimization strategies for HDInsight clusters
    • Module 3: Creating and configuring an HDInsight cluster
    • In this module you will:
      • Create an HDInsight Spark Cluster
      • Execute queries on an HDInsight Spark Cluster
      • Monitor an HDInsight Spark Cluster
      • Learn how to fix common provisioning issues
    • Module 4: Run Petabyte level OSS NoSQL databases with HDInsight HBase
      • Introduction
      • Use HDInsight HBase clusters
      • Describe HBase Architecture Patterns
      • Exercise - Provisioning an HDInsight HBase cluster
      • Exercise - Run benchmarks in HBase
      • Understand HBase Best Practices
      • Summary
      • Knowledge Check
    • Module 5: Perform advanced streaming data transformations with Apache Spark and Kafka in Azure HDInsight
    • At the end of this module, you will understand:
      • When to use Apache Spark and Kafka with HDInsight
      • How Spark Structured Streaming works
      • The architecture of a Kafka and Spark solution
      • How to provision HDInsight, create a Kafka producer, and stream Kafka data to a Jupyter notebook
      • How to replicate data to a secondary cluster
    • Module 6: Perform Zero ETL analytics with HDInsight Interactive Query
    • In this module you will learn:
      • Appropriate scenarios for deploying HDInsight Interactive Query clusters
      • Architectural patterns
      • How to deploy a cluster for your real estate app and query the data
      • How to integrate Apache Spark and Hive LLAP queries using the Hive Warehouse Connector
      • How to create a large-scale interactive query dashboard to evaluate real estate values and locations
    • Module 7: Manage enterprise security in HDInsight
      • Introduction
      • Describe HDInsight security areas
      • Implement network security
      • Understand operating system security
      • Manage application/middleware security
      • Implement data access security
      • Knowledge check
      • Summary

Syllabus
    • Module 1: Introduction to the Open Source Analytics Offering
      • Introduction
      • What is HDInsight?
      • How does HDInsight work?
      • When to use HDInsight
      • Knowledge check
      • Summary
    • Module 2: Choose the correct HDInsight Configuration to build open source analytics solutions
      • Introduction
      • HDInsight configuration options
      • Decision criteria for selecting the correct HDInsight configuration option
      • Analyze a scenario and map it to an HDInsight configuration option
      • Cost optimization strategies for HDInsight clusters
      • Knowledge check
      • Summary
    • Module 3: Creating and configuring an HDInsight cluster
      • Introduction
      • Creating an HDInsight cluster
      • Exercise - Create an HDInsight cluster via the Azure portal
      • Opening a Jupyter Notebook on an HDInsight Spark cluster
      • Exercise - Execute queries on an HDInsight Spark cluster
      • Enable monitoring of HDInsight jobs
      • Common provisioning issues
      • Exercise - Monitor an HDInsight cluster
      • Summary
      • Knowledge check
    • Module 4: Run Petabyte level OSS NoSQL databases with HDInsight HBase
      • Introduction
      • Describe Apache HBase
      • Explain HDInsight HBase cluster architecture and application patterns
      • Improve the write and read performance of HBase clusters
      • Determine migration and high availability strategies in HDInsight HBase
      • Use Apache Phoenix on HDInsight HBase
      • Determine HDInsight HBase cluster performance
      • Perform benchmarking in HBase
      • Knowledge check
      • Summary
    • Module 5: Perform advanced streaming data transformations with Apache Spark and Kafka in Azure HDInsight
      • Introduction
      • Use HDInsight Spark and Kafka
      • Stream data with Apache Kafka
      • Describe Spark structured streaming
      • Create a Kafka and Spark architecture
      • Exercise - Provision HDInsight to perform advanced streaming data transformations
      • Exercise - Create the Kafka producer
      • Exercise - Stream Kafka data to a Jupyter notebook and window the data
      • Replicate data to a secondary cluster
      • Knowledge check
      • Summary
    • Module 6: Perform Zero ETL analytics with HDInsight Interactive Query
      • Introduction
      • When should you use HDInsight Interactive Query?
      • HDInsight interactive queries
      • Exercise - Provision HDInsight to perform ad hoc analytics
      • Exercise - Upload and query data in HDInsight
      • Integrate Apache Spark and Hive LLAP queries
      • Create a large-scale interactive query dashboard for evaluating real estate trends
      • Summary
      • Knowledge check
    • Module 7: Manage enterprise security in HDInsight
      • Introduction
      • Describe HDInsight security areas
      • Implement Network security
      • Understand operating system security
      • Manage application/middleware security
      • Implement data access security
      • Knowledge check
      • Summary