Real Time Spark Project for Beginners: Hadoop, Spark, Docker

Real Time Spark Project for Beginners: Hadoop, Spark, Docker, provided by Udemy, is a comprehensive online course comprising 7 hours of material. It is taught by PARI MARGU, and upon completion you can receive an e-certificate from Udemy. The course is taught in English and is a paid course; visit the course page at Udemy for detailed price information.

Overview
  • Building Real Time Data Pipeline Using Apache Kafka, Apache Spark, Hadoop, PostgreSQL, Django and Flexmonster on Docker

    What you'll learn:

    • Complete Development of Real Time Streaming Data Pipeline using Hadoop and Spark Cluster on Docker
    • Setting up Single Node Hadoop and Spark Cluster on Docker
    • Features of Spark Structured Streaming using Spark with Scala
    • Features of Spark Structured Streaming using Spark with Python (PySpark), as sketched in the example after this list
    • How to use PostgreSQL with Spark Structured Streaming
    • Basic understanding of Apache Kafka
    • How to build Data Visualization using Django Web Framework and Flexmonster
    • Fundamentals of Docker and Containerization
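
For a flavor of the Spark with Python material, here is a minimal sketch of a Structured Streaming job that reads server-status events from Kafka. The broker address, topic name, and event schema are assumptions for illustration, and the job needs the spark-sql-kafka connector package available to Spark.

```python
# Minimal sketch: read server-status events from Kafka with Structured Streaming.
# The broker address, topic name, and schema below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("ServerStatusStream").getOrCreate()

# Assumed structure of one server-status event.
schema = StructType([
    StructField("server_id", StringType()),
    StructField("status", StringType()),
    StructField("event_time", TimestampType()),
])

# Subscribe to the Kafka topic and parse each message's JSON payload.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "server-status")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)

# Print parsed events to the console as a quick sanity check.
query = events.writeStream.format("console").outputMode("append").start()
query.awaitTermination()
```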

    • In many data centers, many different types of servers generate large amounts of data in real time; each event here is a status report from a server in the data center (a simulated producer is sketched below).
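
As an illustration of such an event stream, the sketch below simulates server-status events and publishes them to Kafka. The kafka-python client, the broker address, and the topic and field names are assumptions, not the course's exact code.

```python
# Minimal sketch: simulate server-status events and publish them to Kafka.
# Assumes the kafka-python client and a broker at localhost:9092.
import json
import random
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Emit one simulated status event per second (topic name is illustrative).
while True:
    event = {
        "server_id": f"server-{random.randint(1, 10)}",
        "status": random.choice(["OK", "WARN", "DOWN"]),
        "event_time": time.strftime("%Y-%m-%d %H:%M:%S"),
    }
    producer.send("server-status", value=event)
    time.sleep(1)
```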

    • This data must be processed in real time to generate insights for the server and data center monitoring teams, who track server status regularly and find resolutions when issues occur, to keep the servers stable.

    • Since the data is large and arrives in real time, we need to choose the right architecture, with scalable storage and computation frameworks/technologies.

    • Hence we build a Real Time Data Pipeline using Apache Kafka, Apache Spark, Hadoop, PostgreSQL, Django and Flexmonster on Docker to generate insights from this data.

    • The Spark project/data pipeline is built using Apache Spark with Scala and PySpark on an Apache Hadoop cluster running on top of Docker (a sketch of the PostgreSQL sink follows).
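
Persisting streaming results to PostgreSQL is commonly done with foreachBatch and Spark's JDBC writer; a minimal sketch follows, reusing the `events` DataFrame from the earlier Kafka-reading sketch. The connection URL, credentials, and table name are assumptions, and the PostgreSQL JDBC driver must be available to Spark.

```python
# Minimal sketch: write each micro-batch of a streaming aggregate to PostgreSQL.
# Connection details and the table name are illustrative assumptions.
def write_to_postgres(batch_df, batch_id):
    (
        batch_df.write
        .format("jdbc")
        .option("url", "jdbc:postgresql://localhost:5432/pipeline")
        .option("dbtable", "server_status_counts")
        .option("user", "postgres")
        .option("password", "postgres")
        .option("driver", "org.postgresql.Driver")
        .mode("append")
        .save()
    )

# Count events per server and status, flushing each micro-batch to the table.
counts = events.groupBy("server_id", "status").count()
query = (
    counts.writeStream
    .outputMode("update")
    .foreachBatch(write_to_postgres)
    .start()
)
query.awaitTermination()
```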

    • Data Visualization is built using Django Web Framework and Flexmonster.
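
On the visualization side, Flexmonster can load a plain JSON array of records, so a Django view only needs to expose the aggregated table as JSON. A minimal sketch, assuming a hypothetical Django model mapped to the PostgreSQL results table:

```python
# Minimal sketch: a Django view exposing aggregated results as JSON for
# Flexmonster. The model and field names are illustrative assumptions.
from django.http import JsonResponse

from .models import ServerStatusCount  # hypothetical model over the results table

def status_counts(request):
    rows = list(ServerStatusCount.objects.values("server_id", "status", "count"))
    # Flexmonster can consume a JSON array of flat records as its data source.
    return JsonResponse(rows, safe=False)
```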