Data Engineering, Serverless ETL & BI on Amazon Cloud


Data Engineering, Serverless ETL & BI on Amazon Cloud, provided by Udemy, is a comprehensive online course with 7 hours of material. It is taught by Siddharth Raghunath. Upon completion of the course, you can receive an e-certificate from Udemy. The course is taught in English and is a paid course. Visit the course page at Udemy for detailed price information.

Overview
  • Data warehousing & ETL on AWS Cloud

    What you'll learn:

    • Set up a data warehouse on Amazon Cloud using Redshift from scratch
    • Learn AWS Athena and when to use it
    • Store data in S3 data lakes using the Parquet columnar file format and optimize data scans with Athena
    • Automate ETL processes using serverless components such as AWS Glue, Data Pipeline and Lambda functions
    • Centralize data using Redshift Spectrum
    • Trigger and automate Glue jobs using Lambda functions
    • Pull data into QuickSight, the BI reporting and visualization service from AWS
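As a rough illustration of the Lambda-to-Glue trigger pattern listed above, here is a minimal Python sketch of a Lambda handler that starts a Glue job when a file lands in S3. It assumes an S3 event notification is wired to the function and that a Glue job already exists; the job name `daily-sales-etl` and the `--source_path` argument are hypothetical placeholders, not names from the course. The Glue client is injectable so the handler can be exercised locally without AWS credentials.

```python
import json
import urllib.parse

GLUE_JOB_NAME = "daily-sales-etl"  # hypothetical Glue job name; replace with your own


def lambda_handler(event, context, glue_client=None):
    """Start one Glue job run for each object reported in an S3 event.

    glue_client is injectable for local testing; inside Lambda it defaults
    to a boto3 Glue client (boto3 is bundled with the Python Lambda runtime).
    """
    if glue_client is None:
        import boto3  # imported lazily so the handler can be tested without boto3
        glue_client = boto3.client("glue")

    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (spaces become '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        response = glue_client.start_job_run(
            JobName=GLUE_JOB_NAME,
            # Glue job arguments are string key/value pairs with a '--' prefix.
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        run_ids.append(response["JobRunId"])

    return {"statusCode": 200, "body": json.dumps({"job_run_ids": run_ids})}
```

Inside the Glue script itself, the hypothetical argument could then be read with `getResolvedOptions(sys.argv, ["source_path"])` from `awsglue.utils`.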

    AWS Cloud can seem intimidating and overwhelming to many people because of its vast ecosystem, but this course makes it easier for anyone who wants hands-on expertise in setting up a data warehouse in Redshift or a BI infrastructure from scratch.

    Data scientists, analysts and business analysts will soon be expected (if they are not already) to become all-rounders who can handle the technical side of data ingestion, engineering and warehousing.

    Anyone with a basic understanding of how the cloud works can benefit from this course because:

    - It is designed around the end-to-end life cycle of a typical data engineering project

    - It provides practical solutions to real-world use cases

    This course covers:

    • Setting up a data warehouse in AWS Redshift from scratch

    • Basic data warehousing concepts

    • Writing serverless AWS Glue jobs (PySpark and Python shell) for ETL and batch processing

    • AWS Athena for ad-hoc analysis (and when to use it)

    • AWS Data Pipeline to sync incremental data

    • Lambda functions to trigger and automate ETL and data-syncing processes

    • QuickSight setup, analyses and dashboards
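The Athena item above can be illustrated with a small Python sketch that submits an ad-hoc SQL query and polls until it reaches a terminal state. It uses the boto3 Athena client's `start_query_execution` and `get_query_execution` calls; the helper name `run_athena_query`, the database name and the S3 result location are hypothetical. Fetching the actual rows (for example with `get_query_results`) is left out to keep the sketch short.

```python
import time

# Athena query states after which polling can stop.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(sql, database, output_location, athena_client=None,
                     poll_seconds=1.0, max_polls=300):
    """Submit a query to Athena and poll until it finishes.

    Returns (query_execution_id, final_state). athena_client is injectable
    for local testing; by default a boto3 Athena client is created.
    """
    if athena_client is None:
        import boto3  # imported lazily so the helper can be tested without boto3
        athena_client = boto3.client("athena")

    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes its result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    for _ in range(max_polls):
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)  # still QUEUED or RUNNING: wait, then poll again

    raise TimeoutError(f"Athena query {query_id} did not finish after {max_polls} polls")
```

A call might look like `run_athena_query("SELECT region, SUM(amount) FROM sales GROUP BY region", "sales_db", "s3://my-athena-results/")`, with all three names being placeholders. Because Athena bills by the amount of data scanned, pointing such queries at partitioned Parquet tables is one way to keep scans small.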

    Prerequisites for this course:

    • Python and SQL (an absolute must)

    • PySpark (you should be able to write basic PySpark scripts)

    • Willingness to explore, learn and put in the extra effort to succeed

    • An active AWS account

    Important note: This course uses the free tiers for Redshift and RDS, so you will not be billed for them unless you exceed the free-tier usage limits, which should be more than enough for the practice in this course.

    Also, this course uses the AWS UI in the browser to create clusters and set up jobs; no bash scripting is involved. You can use any operating system to complete the lab sessions.

    This course is not code-heavy: only 35% of it involves coding, and the rest is execution, understanding and chaining different components together. The goal is to make everyone aware of, and comfortable with, all the tools and features used in the course.

    Some tips:

    • Try watching the videos at 1.2x speed

    • Every time you work on a new component or feature, research other tools meant for the same purpose and see how they differ and in what respects, e.g. Redshift/Athena vs. Snowflake or BigQuery, and QuickSight vs. Power BI vs. MicroStrategy