The ETL Pipeline project aims to provide an easily configurable, plug-and-play solution for running and integrating new datasets in an Extract, Transform, Load (ETL) process. It simplifies setting up a robust ETL pipeline through the run_etl.py script, which ensures that the base architecture and any required Docker containers are up and running (a sketch of this startup step follows the requirements below).
Requirements:

- docker
- docker-compose
- Python 3.6 or later
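For illustration, here is a minimal sketch of how a script like run_etl.py might bring up the base containers before an ETL run. It assumes a docker-compose.yml describing the base architecture exists in the working directory and that docker-compose is on the PATH; the function name is hypothetical, not the project's actual code.

```python
# Hedged sketch of run_etl.py's container startup step; the real script's
# internals may differ. Assumes a docker-compose.yml for the base
# architecture exists in the working directory.
import subprocess
import sys


def ensure_base_architecture(compose_file="docker-compose.yml"):
    """Start (or reuse) the base containers via docker-compose."""
    result = subprocess.run(
        ["docker-compose", "-f", compose_file, "up", "-d"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # keep output as text (Python 3.6 compatible)
    )
    if result.returncode != 0:
        sys.exit("Failed to start base containers:\n" + result.stderr)


if __name__ == "__main__":
    ensure_base_architecture()
```

Because `docker-compose up -d` is idempotent, rerunning the script reuses containers that are already running instead of recreating them.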
The main idea is to provide a base architecture consisting of the following containers:
- Grafana: For visualization
- Postgres: For persisting the data
- Kafka: For near-real-time event processing
These containers form the base architecture required by the datasets available in this repository. Kafka is an optional service: it is not strictly necessary at this small scale, but it is included because it features in many real-world use cases.
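To make that optionality concrete, a launcher like run_etl.py could start the Kafka service only when asked to. The sketch below is an assumption about how this might look; the --with-kafka flag and the compose service names (postgres, grafana, zookeeper, kafka) are hypothetical, not the project's documented interface.

```python
# Illustrative sketch only: the service names and the --with-kafka flag
# are assumptions, not the project's actual CLI.
import argparse
import subprocess

BASE_SERVICES = ["postgres", "grafana"]     # always required
OPTIONAL_SERVICES = ["zookeeper", "kafka"]  # near-real-time processing


def main():
    parser = argparse.ArgumentParser(description="Start the base architecture.")
    parser.add_argument(
        "--with-kafka",
        action="store_true",
        help="Also start the optional Kafka service.",
    )
    args = parser.parse_args()

    services = BASE_SERVICES + (OPTIONAL_SERVICES if args.with_kafka else [])
    # `docker-compose up -d <services>` starts only the listed services.
    subprocess.run(["docker-compose", "up", "-d", *services], check=True)


if __name__ == "__main__":
    main()
```

Keeping Kafka behind a flag like this means the default setup stays lightweight, while the event-processing path can be exercised on demand.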