- What is a Data Lake?
- ELT vs. ETL
- Alternatives to components (S3/HDFS, Redshift, Snowflake, etc.)
- Video
- What is an Orchestration Pipeline?
- What is a DAG?
- Video
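
To make the DAG idea concrete, here is a minimal sketch in plain Python (no Airflow required). A DAG is a set of tasks plus "must run before" dependencies with no cycles, and an orchestrator runs tasks in an order that respects those dependencies. The task names below are hypothetical and chosen only for illustration; Airflow's own scheduler is far more involved than this.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names (an assumption for illustration only).
# Each key maps to the set of tasks that must finish before it can start.
dependencies = {
    "download_dataset": set(),
    "convert_to_parquet": {"download_dataset"},
    "upload_to_gcs": {"convert_to_parquet"},
    "create_external_table": {"upload_to_gcs"},
}


def execution_order(deps):
    """Return one valid execution order for the given dependency graph.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


def is_acyclic(deps):
    """Return True if the dependency graph is a valid DAG (has no cycles)."""
    try:
        execution_order(deps)
        return True
    except CycleError:
        return False


order = execution_order(dependencies)
print(order)

# Two tasks that depend on each other form a cycle, so this is not a DAG.
print(is_acyclic({"a": {"b"}, "b": {"a"}}))
```

The "acyclic" part matters because a cycle would mean no task in the loop could ever start, which is why a graph like the second one is rejected.
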
- Setting up Airflow with Docker-Compose
- Video
- More information in the airflow folder
If you want to run a lighter version of Airflow with fewer services, check this video. It's optional.
- Extraction: Download and unpack the data
- Pre-processing: Convert the raw data to Parquet
- Upload the Parquet files to GCS
- Create an external table in BigQuery
- Video
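
The steps above can be sketched as a small planning function that derives the artifact names for one input file and the BigQuery DDL for the external table. This is only a rough illustration: the file name, bucket, dataset, table, and the `raw` prefix are all hypothetical, and the actual download, Parquet conversion, and upload (which would need a Parquet library and a GCS client) are deliberately left out.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class IngestionPlan:
    """Names produced by each step of the workflow for one input file."""
    raw_file: str
    parquet_file: str
    gcs_uri: str
    external_table_ddl: str


def plan_ingestion(raw_file, bucket, dataset, table, prefix="raw"):
    """Derive the Parquet file name, GCS URI, and external-table DDL.

    All arguments are placeholders supplied by the caller; nothing is
    downloaded, converted, or uploaded here.
    """
    # Pre-processing step: the unpacked CSV becomes a Parquet file.
    parquet_file = PurePosixPath(raw_file).with_suffix(".parquet").name
    # Upload step: where the Parquet file would live in the bucket.
    gcs_uri = f"gs://{bucket}/{prefix}/{parquet_file}"
    # Final step: an external table that points at the uploaded file.
    ddl = (
        f"CREATE OR REPLACE EXTERNAL TABLE `{dataset}.{table}`\n"
        "OPTIONS (\n"
        "  format = 'PARQUET',\n"
        f"  uris = ['{gcs_uri}']\n"
        ");"
    )
    return IngestionPlan(raw_file, parquet_file, gcs_uri, ddl)


# Hypothetical inputs, for illustration only.
plan = plan_ingestion(
    raw_file="yellow_tripdata_2021-01.csv",
    bucket="example-bucket",
    dataset="trips_data",
    table="external_table",
)
print(plan.gcs_uri)
print(plan.external_table_ddl)
```

In an Airflow DAG, each of these steps would typically be its own task, so a failure in one step can be retried without repeating the earlier ones.
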
- Converting the ingestion script that loads data into Postgres into an Airflow DAG
- Video
Moving files from AWS to GCP.
You will need an AWS account for this. This section is optional.
In the homework, you'll create a few DAGs for processing the NY Taxi data for 2019-2021.
More information here
Did you take notes? You can share them here.
- Notes from Alvaro Navas
- Notes from Aaron Wright
- Notes from Abd
- Blog post by Isaac Kargar
- Blog, notes, walkthroughs by Sandy Behrens
- Add your notes here (above this line)