Data Engineering Pipeline: Schiphol Flights

About The Project

This data engineering case was created for Digital Power. It is an ETL pipeline that gives insight into the estimated number of passengers on board flights arriving at and departing from Schiphol Airport. It can be run both locally and in the cloud using AWS EC2 and Apache Airflow.

  • It extracts JSON data from the Schiphol API, as well as .csv files with information on aircraft capacity and passenger load factors.
  • It transforms the extracted data into a Pandas dataframe.
  • It loads the dataframe as a .csv file into AWS S3 (a sketch of these three steps follows below).
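
A rough sketch of what these three steps might look like in Python is shown below. The API endpoint, header names, credential fields, file paths, column names, and bucket name are illustrative assumptions, not the pipeline's actual code.

    import json

    import boto3
    import pandas as pd
    import requests

    # Read the Schiphol API credentials (field names are assumptions).
    with open(".secret/api_creds.json") as f:
        api = json.load(f)

    # 1. Extract: JSON flight data from the Schiphol API plus local .csv lookups.
    resp = requests.get(
        "https://api.schiphol.nl/public-flights/flights",           # assumed endpoint
        headers={"app_id": api["app_id"], "app_key": api["app_key"]},
    )
    flights = pd.json_normalize(resp.json().get("flights", []))
    capacity = pd.read_csv("data/aircraft_capacity.csv")            # assumed path
    load_factors = pd.read_csv("data/passenger_load_factors.csv")   # assumed path

    # 2. Transform: join the lookups onto the flights (assumed column names)
    #    and estimate the number of passengers on board.
    df = flights.merge(capacity, how="left", on="aircraft_type")
    df = df.merge(load_factors, how="left", on="aircraft_type")
    df["estimated_passengers"] = (df["seats"] * df["load_factor"]).round()

    # 3. Load: write the dataframe to a .csv file and upload it to S3.
    df.to_csv("flights.csv", index=False)
    boto3.client("s3").upload_file("flights.csv", "YOUR-BUCKET-NAME", "flights.csv")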

Getting Started

You can run the project either locally or in the cloud. Instructions for both are given below.

Local Installation

  1. Install Python 3.10.4 and create a virtual environment.

  2. Clone the repository.

    git clone https://github.com/has-ctrl/flights-etl-pipeline.git
  3. Install the dependencies using pip.

    pip install -r requirements.txt
  4. Get Schiphol Flight API keys and enter them in .secret/api_creds.json.

  5. Create an AWS account and enter your API keys and the S3 bucket_url in .secret/aws_creds.json (see the credentials sketch after this list).

  6. Run the test.py script and view the resulting .csv file in your S3 bucket.

    python test.py
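
The two .secret credential files from steps 4 and 5 are plain JSON. A minimal sketch of how they might be read and turned into an S3 client is shown below; the field names are assumptions, not necessarily the repository's actual schema.

    import json

    import boto3

    # Credentials entered in steps 4 and 5 (field names are assumptions).
    with open(".secret/api_creds.json") as f:
        api_creds = json.load(f)    # e.g. {"app_id": "...", "app_key": "..."}
    with open(".secret/aws_creds.json") as f:
        aws_creds = json.load(f)    # e.g. {"access_key": "...", "secret_key": "...", "bucket_url": "..."}

    # An S3 client built from those credentials, as the load step needs.
    s3 = boto3.client(
        "s3",
        aws_access_key_id=aws_creds["access_key"],
        aws_secret_access_key=aws_creds["secret_key"],
    )
    print("Loaded credentials for bucket:", aws_creds["bucket_url"])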

Cloud Installation

  1. Create an AWS account, launch an EC2 instance (Ubuntu, at least a t3.medium with 4 GB of RAM), and save the key-pair .pem file in your working directory.

  2. Connect to your instance using an SSH client and the key-pair .pem file, replacing the address below with your instance's Public IPv4 DNS:

    ssh -i "KEY-PAIR.pem" ubuntu@ec2-X-XX-XX-XX.eu-central-1.compute.amazonaws.com
  3. Install Python and Apache Airflow, then clone the repository.

    sudo apt-get update
    sudo apt install python3-pip
    sudo pip install apache-airflow
    cd airflow/
    git clone https://github.com/has-ctrl/flights-etl-pipeline.git
  4. Install the dependencies using pip.

    pip install -r requirements.txt
  5. Get Schiphol Flight API keys and enter them in .secret/api_creds.json.

  6. Enter your AWS API keys and the S3 bucket_url in .secret/aws_creds.json (the AWS account already exists from step 1).

  7. Edit the airflow.cfg config file by changing the dags_folder to /home/ubuntu/airflow/flights-etl-pipeline and setting enable_xcom_pickling = True.
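
That is, the two relevant lines in airflow.cfg then read:

    dags_folder = /home/ubuntu/airflow/flights-etl-pipeline
    enable_xcom_pickling = True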

  8. Run Airflow and copy the username and password from the logged output.

    airflow standalone
  9. Optionally, trigger the DAG manually from the Airflow console: open port 8080 on the instance, then browse to its Public IPv4 DNS (e.g. ec2-X-XX-XX-XX.eu-central-1.compute.amazonaws.com:8080/) and log in with the credentials from step 8. A sketch of the kind of DAG Airflow picks up from dags_folder is shown below.
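
A minimal sketch of how such a DAG could be wired up follows. The DAG id, schedule, and task callables are illustrative assumptions; the actual DAG is the one Airflow loads from the cloned repository via dags_folder.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    # Placeholder callables standing in for the repository's extract/transform/load code.
    def extract():
        ...

    def transform():
        ...

    def load():
        ...

    # DAG id and schedule are assumptions; Airflow reads this file from dags_folder.
    with DAG(
        dag_id="flights_etl_pipeline",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        transform_task = PythonOperator(task_id="transform", python_callable=transform)
        load_task = PythonOperator(task_id="load", python_callable=load)

        extract_task >> transform_task >> load_task

With enable_xcom_pickling = True from step 7, tasks like these can pass a Pandas dataframe to one another via XCom instead of being limited to JSON-serialisable values.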
