Week 2: Workflow Orchestration

If you're looking for Airflow videos from the 2022 edition, check the 2022 cohort folder.

Python code from videos is linked below.

If the commands in Kalise's videos are too small to read, transcripts with code are available for the second Prefect video and the fifth Prefect video.

Data Lake (GCS)

  • What is a Data Lake
  • ELT vs. ETL
  • Alternatives to the GCP components (S3/HDFS, Redshift, Snowflake, etc.)
  • Video
  • Slides
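
To make the ELT vs. ETL distinction concrete, here is a minimal sketch using only Python's standard library. It is not taken from the videos: the in-memory SQLite database stands in for a warehouse, and the `RAW_TRIPS` records, table names, and cleaning rule are all hypothetical. ETL cleans the data before loading it, while ELT loads the raw data first and transforms it inside the store.

```python
import sqlite3

# Hypothetical raw records, standing in for files landed in a data lake.
RAW_TRIPS = [
    {"trip_id": 1, "distance_km": "3.2", "fare": "12.50"},
    {"trip_id": 2, "distance_km": "0", "fare": "0.00"},  # treated as a bad record
    {"trip_id": 3, "distance_km": "7.9", "fare": "25.00"},
]


def etl(conn):
    """ETL: transform in application code, then load only the cleaned rows."""
    conn.execute(
        "CREATE TABLE trips_etl (trip_id INTEGER, distance_km REAL, fare REAL)"
    )
    cleaned = [
        (r["trip_id"], float(r["distance_km"]), float(r["fare"]))
        for r in RAW_TRIPS
        if float(r["distance_km"]) > 0
    ]
    conn.executemany("INSERT INTO trips_etl VALUES (?, ?, ?)", cleaned)
    return conn.execute("SELECT COUNT(*) FROM trips_etl").fetchone()[0]


def elt(conn):
    """ELT: load the raw rows untouched, then transform with SQL in the store."""
    conn.execute(
        "CREATE TABLE trips_raw (trip_id INTEGER, distance_km TEXT, fare TEXT)"
    )
    conn.executemany(
        "INSERT INTO trips_raw VALUES (:trip_id, :distance_km, :fare)", RAW_TRIPS
    )
    conn.execute(
        """
        CREATE TABLE trips_elt AS
        SELECT trip_id,
               CAST(distance_km AS REAL) AS distance_km,
               CAST(fare AS REAL) AS fare
        FROM trips_raw
        WHERE CAST(distance_km AS REAL) > 0
        """
    )
    return conn.execute("SELECT COUNT(*) FROM trips_elt").fetchone()[0]
```

Both paths end with the same cleaned rows. The difference is that ELT keeps the raw table around, so the transformation can be changed and re-run later without re-extracting the data.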

1. Introduction to Workflow Orchestration

  • What is orchestration?
  • Workflow orchestrators vs. other types of orchestrators
  • Core features of a workflow orchestration tool
  • Different types of workflow orchestration tools that currently exist

🎥 Video
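
Two of the core features listed above, dependency ordering and retries, can be illustrated with a toy runner. This is a rough sketch built on the standard library's `graphlib`, not how Prefect or any other orchestrator is implemented; the `run_workflow` function and its arguments are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, max_retries=2):
    """Run callables in dependency order, retrying each failed task.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of names it depends on
    Returns a list of (task name, attempts used) in execution order.
    """
    log = []
    # static_order() yields every task after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_retries + 2):
            try:
                tasks[name]()
                log.append((name, attempt))
                break
            except Exception:
                # Give up and re-raise once the retry budget is exhausted.
                if attempt == max_retries + 1:
                    raise
    return log
```

A real orchestrator adds scheduling, state tracking, logging, and a UI on top of this basic loop, which is what the Prefect sections below cover.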

2. Introduction to Prefect concepts

  • What is Prefect?
  • Installing Prefect
  • Prefect flow
  • Creating an ETL
  • Prefect task
  • Blocks and collections
  • Orion UI

🎥 Video

3. ETL with GCP & Prefect

  • Flow 1: Putting data into Google Cloud Storage

🎥 Video

4. From Google Cloud Storage to BigQuery

  • Flow 2: From GCS to BigQuery

🎥 Video

5. Parametrizing Flow & Deployments

  • Parametrizing the script from your flow
  • Parameter validation with Pydantic
  • Creating a deployment locally
  • Setting up Prefect Agent
  • Running the flow
  • Notifications

🎥 Video
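
The idea behind parametrizing a flow can be sketched without Prefect itself. In the plain-Python example below, a parent function fans out one run per month and the parameters are checked by hand; in Prefect, flow parameters are validated with Pydantic from the type hints instead. The function names, the allowed colors, and the file-naming scheme are assumptions made for illustration, not code from the video.

```python
def build_dataset_name(color: str, year: int, month: int) -> str:
    """Build a dataset file stem from run parameters (hypothetical naming scheme)."""
    # Manual checks stand in for the Pydantic-based validation Prefect performs.
    if color not in {"yellow", "green"}:
        raise ValueError(f"unsupported color: {color!r}")
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    return f"{color}_tripdata_{year}-{month:02d}"


def parent_flow(color: str = "yellow", year: int = 2021, months=(1, 2)):
    """Fan out one sub-run per month, the way a parent flow would call a subflow."""
    return [build_dataset_name(color, year, month) for month in months]
```

Because the values arrive as arguments rather than being hard-coded, the same code can be deployed once and run with different parameters.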

6. Schedules & Docker Storage with Infrastructure

  • Scheduling a deployment
  • Flow code storage
  • Running tasks in Docker

🎥 Video

7. Prefect Cloud and Additional Resources

  • Using Prefect Cloud instead of local Prefect
  • Workspaces
  • Running flows on GCP

🎥 Video

Code repository

Code from videos (with a few minor enhancements)

Homework

Homework can be found here.

Community notes

Did you take notes? You can share them here.

2022 notes

Most of these notes are about Airflow, but you might find them useful.