Big Data Analytics with Hadoop and Apache Spark

Big Data Analytics with Hadoop and Apache Spark is a free online course from LinkedIn Learning with 1-2 hours of material. It is taught in English by Kumaran Ponnambalam, and you can receive an e-certificate from LinkedIn Learning upon completion.

Overview
  • Discover how to build scalable and optimized data analytics pipelines by combining the power of Apache Hadoop and Apache Spark.

Syllabus
    Introduction
    • The combined power of Spark and Hadoop Distributed File System (HDFS)
    1. Introduction and Setup
    • Apache Hadoop overview
    • Apache Spark overview
    • Integrating Hadoop and Spark
    • Setting up the environment
    • Using exercise files
    2. HDFS Data Modeling for Analytics
    • Storage formats
    • Compression
    • Partitioning
    • Bucketing
    • Best practices for data storage
    3. Data Ingestion with Spark
    • Reading external files into Spark
    • Writing to HDFS
    • Parallel writes with partitioning
    • Parallel writes with bucketing
    • Best practices for ingestion
    4. Data Extraction with Spark
    • How Spark works
    • Reading HDFS files with schema
    • Reading partitioned data
    • Reading bucketed data
    • Best practices for data extraction
    5. Optimizing Spark Processing
    • Pushing down projections
    • Pushing down filters
    • Managing partitions
    • Managing shuffling
    • Improving joins
    • Storing intermediate results
    • Best practices for data processing
    6. Use Case Project
    • Problem definition
    • Data loading
    • Total score analytics
    • Average score analytics
    • Top student analytics
    Conclusion
    • Next steps