Apache Spark Essential Training: Big Data Engineering


Apache Spark Essential Training: Big Data Engineering is an online course from LinkedIn Learning with 1-2 hours of material. It is taught in English by Kumaran Ponnambalam and is free of charge. On completion, you can receive an e-certificate from LinkedIn Learning.

Overview
  • Learn how to make Apache Spark work with other Big Data technologies and put together an end-to-end project that can solve a real-world business problem.

Syllabus
  Introduction
    • Driving big data engineering with Apache Spark
    • Course prerequisites
    • Setting up the exercise files
  1. Data Engineering Concepts
    • What is data engineering?
    • Data engineering vs. data analytics vs. data science
    • Data engineering functions
    • Batch vs. real-time processing
    • Data engineering with Spark
  2. Spark Capabilities for ETL
    • Spark architecture review
    • Parallel processing with Spark
    • Spark execution plan
    • Stateful stream processing
    • Spark analytics and ML
  3. Batch Processing Pipelines
    • Batch processing use case: Problem statement
    • Batch processing use case: Design
    • Setting up the local DB
    • Uploading stock to a central store
    • Aggregating stock across warehouses
  4. Real-Time Processing Pipelines
    • Real-time use case: Problem
    • Real-time use case: Design
    • Generating a visits data stream
    • Building a website analytics job
    • Executing the real-time pipeline
  5. Data Engineering with Spark: Best Practices
    • Batch vs. real-time options
    • Scaling extraction and loading operations
    • Scaling processing operations
    • Building resiliency
  6. End-to-End Exercise Project
    • Project exercise requirements
    • Solution design
    • Extracting long last actions
    • Building a scorecard
  Conclusion
    • More about Apache Spark