https://world-energy-stats.fly.dev
Global energy dynamics have shifted significantly over the past 50 years, driven by technological advances, emerging energy sources, and growing climate awareness. Understanding these shifts requires analyzing how energy consumption has changed.
Using Big Data tools, this project analyzes consumption trends for primary energy sources at the global and country level over the past few decades and generates insights from them.
Final Project for MDS @ TMU Course - DS8003.
Kartikey Chauhan 🔣 💻
Ruchi 🔣 💻
This project uses the awesome energy dataset from Our World in Data (OWID).
Hannah Ritchie, Max Roser and Pablo Rosado (2023) - "Energy". Published online at OurWorldInData.org. Retrieved from: 'https://ourworldindata.org/energy' [Online Resource]
docker compose up airflow-init
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
docker compose down --volumes --remove-orphans
docker exec spark-notebook jupyter server list
docker exec -it hive-server hive
hdfs dfs -ls -R /app-logs
App | Server | Link |
---|---|---|
Hadoop | ResourceManager UI | http://localhost:8088/cluster |
Hadoop | Namenode UI | http://localhost:9870/ |
Hadoop | NodeManager UI | http://localhost:8842 |
Jupyter | Notebook UI | http://localhost:8888/ |
Airflow | Web Server UI | http://localhost:8082/ |
Spark | Master | http://localhost:8080/ |
Spark | Worker 1 | http://localhost:8081/ |
Spark | Worker 2 | http://localhost:8083/ |
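
Once the containers are running, it can be handy to confirm that each UI in the table above actually responds. The snippet below is a minimal sketch using only the Python standard library; the `SERVICES` mapping copies the URLs from the table, while the helper names `is_up` and `report` and the 2-second timeout are assumptions made for illustration and are not part of this repository.

```python
import urllib.error
import urllib.request

# URLs copied from the table above; the dictionary itself is illustrative.
SERVICES = {
    "Hadoop ResourceManager UI": "http://localhost:8088/cluster",
    "Hadoop Namenode UI": "http://localhost:9870/",
    "Hadoop NodeManager UI": "http://localhost:8842",
    "Jupyter Notebook UI": "http://localhost:8888/",
    "Airflow Web Server UI": "http://localhost:8082/",
    "Spark Master": "http://localhost:8080/",
    "Spark Worker 1": "http://localhost:8081/",
    "Spark Worker 2": "http://localhost:8083/",
}


def is_up(url, timeout=2.0):
    """Return True if something answers HTTP at `url` (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 401/404), so it is running.
        return True
    except (OSError, ValueError):
        # Connection refused, timeout, DNS failure, or a malformed URL.
        return False


def report(services=SERVICES):
    """Map each service name to a reachability flag."""
    return {name: is_up(url) for name, url in services.items()}
```

Calling `report()` after `docker compose up` returns a dictionary such as `{"Spark Master": True, ...}`; a `False` entry usually means the container is still starting or its port is mapped differently on your machine.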
.
├── Procfile
├── README.md
├── airflow
│ ├── dags
│ │ ├── hadoop_setup_python.py
│ │ ├── hdfs_data_download.py
│ │ ├── hdfs_data_upload.py
│ │ ├── hive_create_database.py
│ │ ├── run_all.py
│ │ ├── run_data_categorization.py
│ │ ├── run_data_transformation.py
│ │ ├── run_hive_sql.py
│ │ └── run_mapred_genstats.py
├── app.py
├── assets
│ ├── big-players.png
│ ├── data
│ │ ├── 1_energy_overview.csv
│ │ ├── 2_energy_consumption_pct_rem.csv
│ │ ├── 2_energy_consumption_pct_top15.csv
│ │ ├── 2_energy_consumption_top15.csv
│ │ ├── 3_energy_breakdown_top15.csv
│ │ ├── 4_electricity_gen_top15.csv
│ │ ├── 4_electricity_share_top15.csv
│ │ ├── 5_population_correlation.csv
│ │ └── energy_share.csv
│ ├── electricity-mix.png
│ ├── energy-consumption.png
│ ├── energy-gdp-pop.png
│ ├── energy-mix.png
│ └── styles.css
├── components
│ ├── insight_1.py
│ ├── insight_2.py
│ ├── insight_3.py
│ ├── insight_4.py
│ └── insight_5.py
├── docker-compose.env
├── docker-compose.yml
├── energy-data
│ ├── README.md
│ ├── owid-energy-codebook.csv
│ └── owid-energy-data.csv
├── fly.toml
├── notebooks
│ ├── eda.ipynb
│ ├── hive_queries_ak.ipynb
│ ├── hive_queries_kc.ipynb
│ ├── output
│ ├── spark_etl_countries.ipynb
│ ├── spark_etl_world.ipynb
│ ├── sql
│ └── utils.py
├── process.png
├── requirements.txt
├── scripts
│ ├── hadoop
│ │ ├── eda_pandas_mapper.py
│ │ ├── eda_pandas_mapper_orig.py
│ │ ├── eda_pandas_reducer.py
│ │ ├── eda_pandas_reducer_orig.py
│ │ ├── null_percent_mapper.py
│ │ ├── null_percent_reducer.py
│ │ └── test
│ ├── install_python.sh
│ └── pyspark
│   ├── data_categorization.py
│   ├── data_transformation.py
│   ├── run_hive_query.py
│   └── utils.py
├── setup.sh
└── sql
  ├── 1_energy_overview.sql
  ├── 2_energy_consumption_pct_rem.sql
  ├── 2_energy_consumption_pct_top15.sql
  ├── 2_energy_consumption_top15.sql
  ├── 3_energy_breakdown_top15.sql
  ├── 4_electricity_gen_top15.sql
  ├── 4_electricity_share_top15.sql
  ├── 5_population_correlation.sql
  ├── combined_energy_data.sql
  └── energy_share.sql
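
Several of the files listed above carry a `top15` suffix, which suggests a ranking of countries by consumption. The project does this with Hive and Spark; the snippet below is only a rough plain-Python sketch of the same idea for quick local exploration. The column names (`country`, `iso_code`, `year`, `primary_energy_consumption`) and the rule for skipping aggregate rows are assumptions about `energy-data/owid-energy-data.csv` and should be checked against `owid-energy-codebook.csv`; the function names are hypothetical.

```python
import csv


def top_consumers(rows, year, n=15, column="primary_energy_consumption"):
    """Return the n (country, value) pairs with the highest `column` in `year`.

    `rows` is any iterable of dict-like records with string values,
    such as those produced by csv.DictReader.
    """
    ranked = []
    for row in rows:
        iso = (row.get("iso_code") or "").strip()
        # Assumption: aggregates such as "World" have no ISO code or an
        # "OWID_"-prefixed one. This is a simplification and may also drop
        # a few real territories that use OWID-specific codes.
        if not iso or iso.startswith("OWID_"):
            continue
        if str(row.get("year", "")).strip() != str(year):
            continue
        raw = (row.get(column) or "").strip()
        if not raw:
            continue  # skip missing values instead of treating them as zero
        ranked.append((row["country"], float(raw)))
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


def load_rows(path):
    """Stream records from a CSV file as dictionaries."""
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.DictReader(handle)
```

For example, `top_consumers(load_rows("energy-data/owid-energy-data.csv"), 2020)` would rank countries for a single year. Missing values are skipped rather than zero-filled so that gaps in the data do not push a country to the bottom of the ranking.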