Data Science on Google Cloud Platform: Building Data Pipelines

Data Science on Google Cloud Platform: Building Data Pipelines is a free online course from LinkedIn Learning, taught in English by Kumaran Ponnambalam. It contains 1-2 hours of material, and on completion you can receive an e-certificate from LinkedIn Learning.

Overview
  • Learn how to design and build big data pipelines on Google Cloud Platform.

Syllabus
  Introduction
    • What goes into a data pipeline?
    • Data science modules covered
  1. GCP Data Pipeline Products
    • GCP data pipeline options
    • Cloud Dataproc
    • Cloud Dataflow
    • Cloud Pub/Sub
  2. Apache Beam
    • What is Apache Beam?
    • Beam pipelines
    • PCollections
    • Transforms
    • Pipeline I/O
    • Runners
  3. Setting Up Dataflow
    • Setting up GCP for Dataflow
    • Setting up Python
    • Creating a simple pipeline
    • Executing in Dataflow
  4. Data Processing with Beam and Dataflow
    • Reading text files
    • ParDo
    • GroupBy
    • Map
    • Combine
    • Writing data to text files
    • Other capabilities
  5. Cloud Pub/Sub
    • What is Pub/Sub?
    • Topics and messages
    • Publishers
    • Subscribers
    • Create a topic
    • Create a subscription
    • Publish and receive
    • Python SDK
  6. Streaming with Dataflow
    • Streaming with Dataflow
    • Windowing with Dataflow
    • Streaming and windowing example
  Conclusion
    • Next steps
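
The Data Processing with Beam and Dataflow chapter covers ParDo, Map, GroupBy and Combine, which together form the classic word-count pipeline: read lines, split them into words, pair each word with a count, group by word, sum, and write the results. The sketch below mimics that data flow with the Python standard library only, so it runs without Apache Beam or a GCP project. It is an analogy, not Beam's API. In the Beam Python SDK the same steps would typically use `beam.io.ReadFromText`, `beam.FlatMap` (or `beam.ParDo` with a `DoFn`), `beam.Map`, then either `beam.GroupByKey` followed by a sum or `beam.CombinePerKey(sum)`, which does both, and finally `beam.io.WriteToText`. The function names and sample lines here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def split_words(line: str) -> Iterator[str]:
    """ParDo/FlatMap analogue: one input element may emit zero or more outputs."""
    for word in line.lower().split():
        yield word


def to_pair(word: str) -> Tuple[str, int]:
    """Map analogue: exactly one output per input element."""
    return (word, 1)


def group_by_key(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """GroupByKey analogue: collect every value that shares a key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def combine_sum(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine analogue: reduce each key's values to a single result."""
    return {key: sum(values) for key, values in groups.items()}


def run_pipeline(lines: Iterable[str]) -> List[str]:
    """Chain the steps: split -> pair -> group -> combine -> format."""
    words = (w for line in lines for w in split_words(line))
    pairs = (to_pair(w) for w in words)
    counts = combine_sum(group_by_key(pairs))
    # Format rows the way a text sink might write them, sorted for stable output.
    return [f"{word}: {count}" for word, count in sorted(counts.items())]


if __name__ == "__main__":
    sample = ["the quick fox", "the lazy dog"]  # hypothetical input lines
    for row in run_pipeline(sample):
        print(row)
```

A real Beam runner (local or Dataflow) would distribute each of these steps across workers. The stdlib version only shows what each transform does to the data.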
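
The Cloud Pub/Sub chapter introduces topics, subscriptions, publishers and subscribers. The key relationship is that a publisher sends a message to a topic, and every subscription attached to that topic receives its own copy, including subscriptions that did not exist when earlier messages were published, which do not receive those earlier messages. The toy in-memory model below illustrates only that fan-out. It is not the `google-cloud-pubsub` client, and it deliberately omits acknowledgement deadlines, redelivery and ordering. The `Topic` and `Subscription` classes, their methods, and the topic and subscription names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


class Subscription:
    """Hypothetical stand-in for a subscription: a private queue of messages."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._queue: Deque[bytes] = deque()

    def deliver(self, data: bytes) -> None:
        self._queue.append(data)

    def pull(self, max_messages: int = 10) -> List[bytes]:
        """Return up to max_messages, treated as acknowledged at once (a simplification)."""
        batch: List[bytes] = []
        while self._queue and len(batch) < max_messages:
            batch.append(self._queue.popleft())
        return batch


class Topic:
    """Hypothetical stand-in for a topic that fans messages out to subscriptions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._subscriptions: Dict[str, Subscription] = {}

    def create_subscription(self, name: str) -> Subscription:
        sub = Subscription(name)
        self._subscriptions[name] = sub
        return sub

    def publish(self, data: bytes) -> None:
        # Every subscription that exists at publish time gets its own copy.
        for sub in self._subscriptions.values():
            sub.deliver(data)


if __name__ == "__main__":
    topic = Topic("orders")  # hypothetical topic name
    billing = topic.create_subscription("billing")
    topic.publish(b"order-1")
    audit = topic.create_subscription("audit")  # created after order-1 was published
    topic.publish(b"order-2")
    print(billing.pull())  # [b'order-1', b'order-2']
    print(audit.pull())    # [b'order-2']
```

In the real service, multiple subscribers pulling from the same subscription share its messages between them, while separate subscriptions each see the full stream. The model above shows only the second half of that rule.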
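
The Streaming with Dataflow chapter pairs streaming with windowing. With fixed (tumbling) windows, each element's event timestamp determines which non-overlapping interval of a set length it falls into, and aggregations are computed per window. The stdlib sketch below shows that assignment and a per-window count. It is not Beam's `beam.WindowInto(window.FixedWindows(...))`, and it ignores watermarks, late data and triggers, which a real streaming pipeline must handle. Timestamps are assumed to be integer seconds, and the helper names and sample events are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def window_start(timestamp: int, size: int) -> int:
    """Start of the fixed window containing timestamp; windows are [start, start + size)."""
    return timestamp - (timestamp % size)


def count_per_window(
    events: Iterable[Tuple[int, str]], size: int
) -> Dict[Tuple[int, int], Dict[str, int]]:
    """Count each key within its fixed window; result keys are (start, end) bounds."""
    buckets: Dict[Tuple[int, int], Counter] = {}
    for ts, key in events:
        start = window_start(ts, size)
        bounds = (start, start + size)
        buckets.setdefault(bounds, Counter())[key] += 1
    # Return plain dicts, ordered by window start.
    return {bounds: dict(buckets[bounds]) for bounds in sorted(buckets)}


if __name__ == "__main__":
    # Hypothetical (timestamp_seconds, event_type) pairs from a stream.
    sample = [(5, "click"), (42, "view"), (59, "click"), (60, "click"), (125, "view")]
    for (start, end), counts in count_per_window(sample, size=60).items():
        print(f"[{start}, {end}): {counts}")
```

Note that the event at 59 s lands in the first window and the one at 60 s starts the second, because each window includes its start and excludes its end.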