This repository contains the code for a data analysis and machine learning project on smart grid inspection data provided by SGCC. The main highlights of the project include:
Data Acquisition: We obtained labeled data from SGCC, a trusted source. The data had been inspected and labeled by SGCC's experts.
Data Preprocessing: We applied data preprocessing techniques to clean the data and put it in the format required by the data-driven model.
Feature Engineering: We engineered new features from the dataset to improve the model's predictive power and robustness, and to give better insight into which features are useful.
Class Balancing: To address potential class imbalance, we balanced the dataset so that the model learned effectively from both classes and was less biased toward either one.
Model Implementation and Tuning: We implemented a Histogram Gradient Boosting Classifier to predict the target from the processed data, then tuned its hyperparameters to improve the results.
Performance Metrics: We computed a set of performance metrics to evaluate how well the model predicts the abnormal class.
This project serves as a practical example of data analysis and machine learning for improving grid operation and can be a valuable resource for similar applications.