Power Plant Machine Learning Pipeline Application - edX - Lab 1 - Big Data Analysis with Apache Spark
This notebook is an end-to-end exercise: it performs Extract-Transform-Load (ETL) and Exploratory Data Analysis (EDA) on a real-world dataset, then applies several machine learning algorithms to a supervised regression problem on that dataset.
**This notebook covers:**
- Part 1: Business Understanding
- Part 2: Load Your Data
- Part 3: Explore Your Data
- Part 4: Visualize Your Data
- Part 5: Data Preparation
- Part 6: Data Modeling
- Part 7: Tuning and Evaluation
Our goal is to accurately predict power output given a set of environmental readings from various sensors in a natural gas-fired power generation plant.
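To make "supervised regression" concrete before starting, here is a minimal sketch in plain Python (no Spark, so it runs anywhere). It fits a straight line from a single sensor reading to power output and measures the error of the fit. The readings, the variable names `temperature` and `power_output`, and the helper functions `fit_line` and `rmse` are all made up for illustration; they are not taken from the lab's dataset or from the Spark API, and the lab itself may use different features, models, and metrics.

```python
# Toy illustration only: these readings are invented, not the lab's dataset.

def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(ys, preds):
    """Root mean squared error between labels and predictions."""
    return (sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)) ** 0.5


# Hypothetical feature (one environmental reading) and label (power output).
temperature = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
power_output = [480.0, 470.0, 461.0, 449.0, 441.0, 430.0]

slope, intercept = fit_line(temperature, power_output)
predictions = [slope * t + intercept for t in temperature]
error = rmse(power_output, predictions)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, RMSE={error:.3f}")
```

The same shape carries over to the full problem: the features are the sensor readings, the label is the power output, and a model is judged by how far its predictions fall from the observed labels.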