
ETL Pipeline

Introduction

The ETL Pipeline project provides an easily configurable, plug-and-play solution for running and integrating new datasets into an Extract, Transform, Load (ETL) process. Setup is handled by the run_etl.py script, which brings up the base architecture and any Docker containers a dataset requires and keeps them running.
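
To make that concrete, here is a minimal, hypothetical sketch of what a script like run_etl.py could do: check that Docker is reachable, then start the base containers with Docker Compose. The actual run_etl.py in this repository may work differently; the compose file name and CLI calls below are assumptions for the sketch.

```python
# Hypothetical sketch only: the real run_etl.py may differ. The compose file
# name and service layout assumed here are not taken from this repository.
import subprocess
import sys

BASE_COMPOSE_FILE = "docker-compose.yml"  # assumed compose file for the base services


def docker_available() -> bool:
    """Return True if the Docker daemon responds to `docker info`."""
    try:
        subprocess.run(
            ["docker", "info"],
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return True
    except (OSError, subprocess.CalledProcessError):
        return False


def start_base_architecture() -> None:
    """Start (or reuse) the base containers in detached mode via docker-compose."""
    subprocess.run(
        ["docker-compose", "-f", BASE_COMPOSE_FILE, "up", "-d"],
        check=True,
    )


if __name__ == "__main__":
    if not docker_available():
        sys.exit("Docker does not appear to be running; please start it first.")
    start_base_architecture()
    print("Base architecture is up; dataset-specific containers can now be started.")
```

In this sketch, running python run_etl.py checks that Docker is reachable and brings up the base architecture before any dataset-specific work begins.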

Prerequisites

  • Docker
  • Docker Compose
  • Python 3.6 or later

Architecture

The main idea is to provide a base architecture consisting of the following containers:

  • Grafana: For visualization
  • Postgres: For persisting the data
  • Kafka: For near-real-time event processing

These containers form the base architecture required by the datasets available in this repository. Kafka is optional at this scope, but it is included because near-real-time event processing appears in many real-world use cases. The sketch below illustrates how a dataset job might use these services.

[Architecture diagram]
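
As an illustration of how the base services fit together, the following hypothetical loader writes transformed rows to Postgres and publishes one event per row to Kafka. The library choices (psycopg2, kafka-python), connection settings, table name, and topic name are assumptions made for this sketch, not taken from this repository's dataset jobs.

```python
# Illustrative only: a dataset loader might persist rows to Postgres and emit
# events to Kafka. Connection settings, table and topic names are assumptions.
import json

import psycopg2
from kafka import KafkaProducer

PG_DSN = "dbname=etl user=etl password=etl host=localhost port=5432"  # assumed
KAFKA_BOOTSTRAP = "localhost:9092"  # assumed
TOPIC = "etl_events"  # assumed topic name


def load_rows(rows):
    """Write transformed rows to Postgres and publish one event per row to Kafka."""
    producer = KafkaProducer(
        bootstrap_servers=KAFKA_BOOTSTRAP,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    with psycopg2.connect(PG_DSN) as conn:
        with conn.cursor() as cur:
            for row in rows:
                cur.execute(
                    "INSERT INTO measurements (name, value) VALUES (%s, %s)",
                    (row["name"], row["value"]),
                )
                producer.send(TOPIC, value=row)
    producer.flush()


if __name__ == "__main__":
    load_rows([{"name": "demo", "value": 42}])
```

Dropping the Kafka calls leaves a plain Postgres load, which matches the note above that Kafka is optional for this scope.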
