The Ultimate Hands-On Hadoop: Tame your Big Data!


The Ultimate Hands-On Hadoop: Tame your Big Data! is a free online course provided by Skillshare and taught by Frank Kane. The course is taught in English and contains 13 hours of material.

Overview
  • Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We'll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.

    • Install and work with a real Hadoop installation right on your desktop with Hortonworks and the Ambari UI
    • Manage big data on a cluster with HDFS and MapReduce
    • Write programs to analyze data on Hadoop with Pig and Spark
    • Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto
    • Design real-world systems using the Hadoop ecosystem
    • Learn how your cluster is managed with YARN, Mesos, ZooKeeper, Oozie, Zeppelin, and Hue
    • Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm

Syllabus
    • Introduction
    • Install Hadoop on your Desktop
    • Hadoop Overview and History
    • Overview of the Hadoop Ecosystem
    • HDFS: What it is, and how it works
    • [Activity] Install the MovieLens dataset into HDFS using the Ambari UI
    • [Activity] Install the MovieLens dataset into HDFS using the command line
    • MapReduce: What it is, and how it works
    • How MapReduce distributes processing
    • MapReduce example: Break down movie ratings by rating score
    • [Activity] Installing Python, MRJob, and nano
    • [Activity] Code up the ratings histogram MapReduce job and run it
    • [Exercise] Rank movies by their popularity
    • [Activity] Check your results against mine!
    • Introducing Ambari
    • Introducing Pig
    • Example: Find the oldest movie with a 5-star rating using Pig
    • [Activity] Find old 5-star movies with Pig
    • More Pig Latin
    • [Exercise] Find the most-rated one-star movie
    • Pig Challenge: Compare Your Results to Mine!
    • Why Spark?
    • The Resilient Distributed Dataset (RDD)
    • [Activity] Find the movie with the lowest average rating - with RDDs
    • Datasets and Spark 2.0
    • [Activity] Find the movie with the lowest average rating - with DataFrames
    • [Activity] Movie recommendations with MLlib
    • [Exercise] Filter the lowest-rated movies by number of ratings
    • [Activity] Check your results against mine!
    • What is Hive?
    • [Activity] Use Hive to find the most popular movie
    • How Hive works
    • [Exercise] Use Hive to find the movie with the highest average rating
    • Compare your solution to mine.
    • Integrating MySQL with Hadoop
    • [Activity] Install MySQL and import our movie data
    • [Activity] Use Sqoop to import data from MySQL to HDFS/Hive
    • [Activity] Use Sqoop to export data from Hadoop to MySQL
    • Why NoSQL?
    • What is HBase?
    • [Activity] Import movie ratings into HBase
    • [Activity] Use HBase with Pig to import data at scale.
    • Cassandra overview
    • [Activity] Installing Cassandra
    • [Activity] Write Spark output into Cassandra
    • MongoDB Overview
    • [Activity] Install MongoDB, and integrate Spark with MongoDB
    • [Activity] Using the MongoDB shell
    • Choosing a database technology
    • [Exercise] Choose a database for a given problem
    • Overview of Drill
    • [Activity] Setting Up Drill
    • [Activity] Querying across multiple databases with Drill
    • Overview of Phoenix
    • [Activity] Install Phoenix and query HBase with it
    • [Activity] Integrate Phoenix with Pig
    • Overview of Presto
    • [Activity] Install Presto, and query Hive with it.
    • [Activity] Query both Cassandra and Hive using Presto.
    • YARN explained
    • Tez explained
    • [Activity] Use Hive on Tez and measure the performance benefit
    • Mesos explained
    • ZooKeeper explained
    • [Activity] Simulating a failing master with ZooKeeper
    • Oozie explained
    • [Activity] Set up a simple Oozie workflow
    • Zeppelin overview
    • [Activity] Use Zeppelin to analyze movie ratings, part 1
    • [Activity] Use Zeppelin to analyze movie ratings, part 2
    • Hue overview
    • Other technologies worth mentioning
    • Kafka explained
    • [Activity] Setting up Kafka, and publishing some data.
    • [Activity] Publishing web logs with Kafka
    • Flume explained
    • [Activity] Set Up Flume and publish logs with Spark
    • [Activity] Set up Flume to monitor a directory and store its data in HDFS
    • Spark Streaming: Introduction
    • [Activity] Analyze web logs published with Flume using Spark Streaming
    • [Exercise] Monitor Flume-published logs for errors in real time
    • Exercise solution: Aggregating HTTP access codes with Spark Streaming
    • Apache Storm: Introduction
    • [Activity] Count words with Storm
    • Flink: An Overview
    • [Activity] Counting words with Flink
    • The Best of the Rest
    • Review: How the pieces fit together
    • Understanding your requirements
    • Sample application: consume webserver logs and keep track of top-sellers
    • Sample application: serving movie recommendations to a website
    • [Exercise] Design a system to report web sessions per day
    • Exercise solution: Design a system to count daily sessions
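The syllabus entry "MapReduce example: Break down movie ratings by rating score" can be illustrated with a small single-machine simulation of the map, shuffle-and-sort, and reduce phases. This is a hedged sketch, not the course's own code: the course uses MRJob on a Hadoop cluster, whereas this version uses only the Python standard library so it runs anywhere. It assumes the tab-separated MovieLens 100K `u.data` layout (user id, movie id, rating, timestamp); the sample lines and the helper names `mapper`, `shuffle`, `reducer`, and `ratings_histogram` are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Illustrative input lines, assuming the MovieLens 100K u.data layout:
# user_id <TAB> movie_id <TAB> rating <TAB> timestamp
SAMPLE_LINES = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t377\t1\t878887116",
    "244\t51\t2\t880606923",
    "166\t346\t1\t886397596",
]


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (rating, 1) for every input record."""
    _user_id, _movie_id, rating, _timestamp = line.rstrip("\n").split("\t")
    yield rating, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle-and-sort phase: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    # Hadoop hands keys to each reducer in sorted order; mimic that here.
    return dict(sorted(grouped.items()))


def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the counts for one rating value."""
    return key, sum(values)


def ratings_histogram(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map, shuffle, and reduce phases in sequence on one machine."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for rating, count in ratings_histogram(SAMPLE_LINES).items():
        print(f"{rating}\t{count}")
```

On the five sample lines this prints one row per rating value: rating 1 appears twice, rating 2 once, and rating 3 twice. On a real cluster the mapper and reducer would run as many parallel tasks, with Hadoop performing the shuffle between them.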