Cloud Hadoop: Scaling Apache Spark

Cloud Hadoop: Scaling Apache Spark is a free online course provided by LinkedIn Learning and taught in English by Lynn Langit. It contains 3-4 hours of material. Upon completion of the course, you can receive an e-certificate from LinkedIn Learning.

Overview
  • Generate genuine business insights from big data. Learn to implement Apache Hadoop and Spark workflows on AWS.

Syllabus
  • Introduction
    • Scaling Apache Hadoop and Spark
    • What you should know
    • Using cloud services
  • 1. Hadoop and Spark Fundamentals
    • Modern Hadoop and Spark
    • File systems used with Hadoop and Spark
    • Apache or commercial Hadoop distros
    • Hadoop and Spark libraries
    • Hadoop on Google Cloud Platform
    • Spark Job on Google Cloud Platform
  • 2. AWS Cloud Spark Environments
    • Sign up for Databricks Community Edition
    • Add Hadoop libraries
    • Databricks AWS Community Edition
    • Load data into tables
    • Hadoop and Spark cluster on AWS EMR
    • Run Spark job on AWS EMR
    • Review batch architecture for ETL on AWS
  • 3. Spark Basics
    • Apache Spark libraries
    • Spark data interfaces
    • Select your programming language
    • Spark session objects
    • Spark shell
  • 4. Using Spark
    • Tour the Databricks Environment
    • Tour the notebook
    • Import and export notebooks
    • Calculate Pi on Spark
    • Run WordCount of Spark with Scala
    • Import data
    • Transformations and actions
    • Caching and the DAG
    • Architecture: Streaming for prediction
  • 5. Spark Libraries
    • Spark SQL
    • SparkR
    • Spark ML: Preparing data
    • Spark ML: Building the model
    • Spark ML: Evaluating the model
    • Advanced machine learning on Spark
    • MXNet
    • Spark with ADAM for genomics
    • Spark architecture for genomics
  • 6. Spark Streaming
    • Reexamine streaming pipelines
    • Spark Streaming
    • Streaming ingest services
    • Advanced Spark Streaming with MLeap
  • 7. Scaling Spark on AWS and GCP
    • Scale Spark on the cloud by example
    • Build a quick start with Databricks AWS
    • Scale Spark cloud compute with VMs
    • Optimize cloud Spark virtual machines
    • Use AWS EKS containers and data lake
    • Optimize Spark cloud data tiers on Kubernetes
    • Build reproducible cloud infrastructure
    • Scale on GCP Dataproc or on Terra.bio
  • Conclusion
    • Continue learning for scaling