This is a simple ETL pipeline using EventBridge, Lambda, and AWS Glue over S3, with CI via GitHub Actions and integration tests using LocalStack.
Extraction Lambda Responsibilities:
- Extracts JSON data from an API and dumps it into an S3 bucket (a sketch follows this list).
- Writes files based on size and time thresholds.
The Lambda function writes the data to the bucket with the following key structure:
- Expected file paths: `${year}/${month}/${day}/${hour}/${epoch_timestamp}.json`
- Default retention is 1 day.
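
A minimal sketch of what that extraction Lambda could look like, assuming a plain HTTP API. The API URL and bucket name are placeholders, and the size/time batching logic is omitted for brevity (each invocation writes one file here):

```python
import time
import urllib.request

import boto3

s3 = boto3.client("s3")

API_URL = "https://example.com/api/data"  # placeholder endpoint
BUCKET = "raw-events-bucket"              # placeholder bucket name

def handler(event, context):
    """Fetch JSON from the API and write it under the partitioned key layout."""
    with urllib.request.urlopen(API_URL) as resp:
        payload = resp.read()

    now = time.gmtime()
    epoch = int(time.time())
    key = f"{now.tm_year}/{now.tm_mon:02d}/{now.tm_mday:02d}/{now.tm_hour:02d}/{epoch}.json"

    s3.put_object(Bucket=BUCKET, Key=key, Body=payload, ContentType="application/json")
    return {"bucket": BUCKET, "key": key}
```

The 1-day retention would typically be an S3 lifecycle rule on the raw bucket rather than logic inside the function.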
Controlled by EventBridge: the Glue job is triggered when the Lambda function finishes.
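
One way that hand-off can work: the extraction Lambda emits a custom event on completion, and an EventBridge rule matches it and starts the Glue side. The source and detail-type strings below are illustrative, not taken from this repo:

```python
import json

import boto3

events = boto3.client("events")

def notify_extraction_complete(bucket: str, key: str) -> None:
    # Emit a custom completion event; an EventBridge rule with the pattern
    #   {"source": ["etl.extractor"], "detail-type": ["ExtractionComplete"]}
    # can then route it onward (both names are placeholders).
    events.put_events(
        Entries=[
            {
                "Source": "etl.extractor",
                "DetailType": "ExtractionComplete",
                "Detail": json.dumps({"bucket": bucket, "key": key}),
            }
        ]
    )
```

The rule's target is typically a Glue workflow (Glue supports EventBridge-started workflows) or a thin Lambda that calls `glue.start_job_run`.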
Glue Job Responsibilities:
- Processing: Filter and modify data according to business logic.
- Output: Create Parquet files and store them in a new bucket.
- State Management: Update the state file with the latest timestamp.
- Error Handling: Implement an at-least-once strategy; failed runs are retried, so downstream steps must tolerate duplicate records.
- Post-Processing: Trigger another Lambda function to process the Parquet files (see the sketch after this list).
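
A hedged sketch of a Glue script covering those responsibilities. The state file layout (a JSON object with a `last_timestamp` field), the `timestamp` column, the job parameter names, and the `parquet-loader` function name are all assumptions for illustration:

```python
import json
import sys

import boto3
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Job parameters; the names are illustrative, not taken from this repo.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "source_bucket", "target_bucket", "state_key"])

glue_context = GlueContext(SparkContext.getOrCreate())
spark = glue_context.spark_session
s3 = boto3.client("s3")

# State management: read the latest processed timestamp from the state file.
state = json.loads(
    s3.get_object(Bucket=args["target_bucket"], Key=args["state_key"])["Body"].read()
)
last_ts = state["last_timestamp"]

# Processing: read raw JSON and keep only records newer than the state
# timestamp; real business logic would replace this filter.
df = spark.read.json(f"s3://{args['source_bucket']}/")
df = df.filter(df["timestamp"] > last_ts)

# Output: write Parquet files into the processed bucket.
df.write.mode("append").parquet(f"s3://{args['target_bucket']}/processed/")

# Update the state file only after a successful write; a failed run is
# re-run from the old timestamp, which gives at-least-once semantics.
new_ts = df.agg({"timestamp": "max"}).collect()[0][0]
if new_ts is not None:
    s3.put_object(
        Bucket=args["target_bucket"],
        Key=args["state_key"],
        Body=json.dumps({"last_timestamp": new_ts}),
    )

# Post-processing: asynchronously trigger the loader Lambda (name is a placeholder).
boto3.client("lambda").invoke(
    FunctionName="parquet-loader",
    InvocationType="Event",
    Payload=json.dumps({"prefix": "processed/"}).encode(),
)
```

Because Parquet is written before the state file is updated, a crash between the two steps replays the batch on the next run, which is why the loader should be idempotent.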
Post-Processing Lambda:
- Triggered by the Glue job.
- Loads the Parquet files into an Amazon Redshift data warehouse (see the sketch below).
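
A minimal sketch of that loader, assuming the Redshift Data API and a `COPY` from the processed bucket; the cluster, database, user, table, role ARN, and bucket names are all placeholders:

```python
import boto3

redshift_data = boto3.client("redshift-data")

def handler(event, context):
    """COPY the Parquet files written by the Glue job into Redshift."""
    prefix = event.get("prefix", "processed/")
    redshift_data.execute_statement(
        ClusterIdentifier="etl-cluster",  # placeholder
        Database="analytics",             # placeholder
        DbUser="loader",                  # placeholder
        Sql=(
            "COPY events "
            f"FROM 's3://processed-bucket/{prefix}' "
            "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy' "
            "FORMAT AS PARQUET;"
        ),
    )
```

`COPY ... FORMAT AS PARQUET` reads every object under the prefix, so with the at-least-once Glue step the load should either deduplicate or scope the prefix per run.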
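
Finally, for the LocalStack integration tests mentioned at the top, a small pytest sketch showing the wiring; the edge endpoint is LocalStack's default, and the bucket name and key are illustrative:

```python
import boto3
import pytest

LOCALSTACK_URL = "http://localhost:4566"  # LocalStack's default edge endpoint

@pytest.fixture
def s3():
    # Dummy credentials are fine; LocalStack does not validate them.
    return boto3.client(
        "s3",
        endpoint_url=LOCALSTACK_URL,
        region_name="us-east-1",
        aws_access_key_id="test",
        aws_secret_access_key="test",
    )

def test_partitioned_key_roundtrip(s3):
    s3.create_bucket(Bucket="raw-events-bucket")
    key = "2024/01/15/09/1705309200.json"
    s3.put_object(Bucket="raw-events-bucket", Key=key, Body=b"{}")
    listed = s3.list_objects_v2(Bucket="raw-events-bucket", Prefix="2024/01/15/")
    assert listed["KeyCount"] == 1
```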