Learning Hadoop


Learning Hadoop is a free online course provided by LinkedIn Learning, comprising 4-5 hours of material. The course is taught in English, and an e-certificate from LinkedIn Learning is available upon completion. The instructor is Lynn Langit.

Overview
  • Learn about Hadoop, key file systems used with Hadoop, its processing engine—MapReduce—and its many libraries and programming tools.

    Hadoop is indispensable when it comes to processing big data, as necessary to understanding your information as servers are to storing it. This course is your introduction to Hadoop: key file systems used with Hadoop; its processing engine, MapReduce; and its many libraries and programming tools. Developer and big-data consultant Lynn Langit shows how to set up a Hadoop development environment, run and optimize MapReduce jobs, code basic queries with Hive and Pig, and build workflows to schedule jobs. Plus, learn about the depth and breadth of the Apache Spark libraries available for use with a Hadoop cluster, as well as options for running machine learning jobs on one.
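The WordCount example that recurs throughout the course follows Hadoop's map/shuffle/reduce pattern. As a rough conceptual sketch only, here is that three-phase flow in plain Python (the course itself uses Hadoop's Java API; the function names below are illustrative, not Hadoop classes):

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in a line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Shuffle: group values by key, as Hadoop does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts emitted for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

lines = ["Hadoop is indispensable", "Hadoop processes big data"]
pairs = [pair for line in lines for pair in map_phase(line)]
counts = reduce_phase(shuffle_phase(pairs))
print(counts["hadoop"])  # 2
```

In a real cluster, the map and reduce phases run as distributed tasks and the shuffle moves data between nodes; the logic per record is the same.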

Syllabus
  • Introduction
    • Getting started with Hadoop
    • What you should know before watching this course
    • Using cloud services
  • 1. Why Change?
    • Limits of relational database management systems
    • Introducing CAP (consistency, availability, partitioning)
    • Understanding big data
  • 2. What Is Hadoop?
    • Introducing Hadoop
    • Understanding Hadoop distributions
    • Understanding the difference between HBase and Hadoop
    • Exploring the future of Hadoop
  • 3. Understanding Hadoop Core Components
    • Understanding Java Virtual Machines (JVMs)
    • Exploring HDFS and other file systems
    • Introducing Hadoop cluster components
    • Introducing Hadoop Spark
    • Exploring the Apache and Cloudera Hadoop distributions
    • Using the public cloud to host Hadoop: AWS or GCP
  • 4. Setting Up the Hadoop Development Environment
    • Understanding the parts and pieces
    • Hosting Hadoop locally with the Cloudera developer distribution
    • Setting up the Cloudera Hadoop developer virtual machine
    • Adding Hadoop libraries to your test environment
    • Picking your programming language and IDE
    • Using GCP Dataproc for development
  • 5. Understanding MapReduce 1.0
    • Understanding MapReduce 1.0
    • Exploring the components of a MapReduce job
    • Working with the Hadoop file system
    • Running a MapReduce job using the console
    • Reviewing the code for a MapReduce WordCount job
    • Running and tracking Hadoop jobs
  • 6. Tuning MapReduce
    • Tuning by physical methods
    • Tuning a Mapper
    • Tuning a Reducer
    • Using a cache for lookups
  • 7. Understanding MapReduce 2.0/YARN
    • Understanding MapReduce 2.0
    • Coding a basic WordCount in Java using MapReduce 2.0
    • Exploring advanced WordCount in Java using MapReduce 2.0
  • 8. Understanding Hive
    • Introducing Hive and HBase
    • Understanding Hive
    • Revisiting WordCount using Hive
    • Understanding more about HQL query optimization
    • Using Hive in GCP Dataproc
  • 9. Understanding Pig
    • Introducing Pig
    • Understanding Pig
    • Exploring use cases for Pig
    • Exploring Pig tools in GCP Dataproc
  • 10. Understanding Workflows and Connectors
    • Introducing Oozie
    • Building a workflow with Oozie
    • Introducing Sqoop
    • Importing data with Sqoop
    • Introducing ZooKeeper
    • Coordinating workflows with ZooKeeper
  • 11. Using Spark
    • Introducing Apache Spark
    • Running a Spark job to calculate Pi
    • Running a Spark job in a Jupyter Notebook
  • 12. Hadoop Today
    • Understanding machine learning options
    • Understanding data lakes
    • Visualizing Hadoop systems
  • Next Steps
    • Next steps with Hadoop
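The Spark lesson "Running a Spark job to calculate Pi" refers to the classic Monte Carlo estimate: random points are sampled in the unit square, and the fraction landing inside the quarter circle approaches pi/4. A single-process sketch in plain Python (no Spark; in the actual Spark job, each partition would count its points in parallel and the counts would be summed across the cluster):

```python
import random

def estimate_pi(samples, seed=42):
    """Monte Carlo pi estimate: 4 * (points inside quarter circle / total)."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # prints an estimate close to 3.14
```

More samples tighten the estimate, which is exactly why the computation distributes well: the work splits into independent sampling tasks.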