Trimming outliers using trees: Winning solution of the Large-scale Energy Anomaly Detection (LEAD) competition
In this repository, you can find the script notebooks of the winning solution (in the `notebooks` folder), the presentation slides, and the paper detailing the modeling framework.
https://www.kaggle.com/competitions/energy-anomaly-detection
- Data preprocessing
  - No anomalies were removed, because the goal of this contest is anomaly detection
  - Missing values (NaN) were replaced with the median value of each time series
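The per-series median imputation can be sketched with pandas. This is a minimal illustration, not the repository's actual code; the column names `building_id` and `meter_reading` are assumptions based on the competition's data layout:

```python
import numpy as np
import pandas as pd

# Toy long-format table: two buildings, each with a missing reading.
df = pd.DataFrame({
    "building_id": [1, 1, 1, 2, 2, 2],
    "meter_reading": [10.0, np.nan, 30.0, 5.0, 5.0, np.nan],
})

# Replace NaNs with the median of each building's own time series.
df["meter_reading"] = df.groupby("building_id")["meter_reading"].transform(
    lambda s: s.fillna(s.median())
)
```

Using the median of each series (rather than a global statistic) keeps the fill value on the scale of that particular building's consumption.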
- Feature engineering
  - Building metadata and weather data
  - Temporal features (e.g., hour, weekday, and day of year)
  - Target encoding features (ref: preprocessing script from the 1st place team in GEPIII)
  - Value-change features: the change of a value relative to nearby values (e.g., X(t)-X(t-1) and X(t)/X(t-1)) with varying shift steps (from 1 hour to 168 hours)
  - Features from data smoothing and k-means clustering were also tried, but they did not significantly improve the score
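The temporal and value-change features above can be sketched as follows. This is a simplified, single-building example under assumed column names; the actual notebooks may compute these per building and use more shift steps than the three shown:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series for one building (8 days of data).
idx = pd.date_range("2016-01-01", periods=24 * 8, freq="h")
df = pd.DataFrame({
    "timestamp": idx,
    "meter_reading": np.arange(1, len(idx) + 1, dtype=float),
})

# Temporal features: hour, weekday, and day of year.
df["hour"] = df["timestamp"].dt.hour
df["weekday"] = df["timestamp"].dt.weekday
df["dayofyear"] = df["timestamp"].dt.dayofyear

# Value-change features X(t)-X(t-k) and X(t)/X(t-k) for a few shift
# steps k, from 1 hour up to 168 hours (one week).
for step in [1, 24, 168]:
    shifted = df["meter_reading"].shift(step)
    df[f"diff_{step}"] = df["meter_reading"] - shifted
    df[f"ratio_{step}"] = df["meter_reading"] / shifted
```

The 24-hour and 168-hour steps compare each reading against the same hour on the previous day and the previous week, which is a natural baseline for periodic energy consumption.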
- Modeling
  - Train/valid split by building_id to ensure the validation data were unseen during training
  - Downsampling of the training dataset to address class imbalance (anomalies are ~5% of rows)
  - Model ensembling via simple averaging: XGBoost, LightGBM, CatBoost, and HistGradientBoosting (weight of 0.25 for each)
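The downsampling and equal-weight averaging steps can be sketched with NumPy. Random stand-in arrays replace the four boosted models' predicted probabilities here, since the point is the index bookkeeping and the averaging, not the model training itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labeled training rows: anomalies are roughly 5% of rows.
n = 10_000
y = (rng.random(n) < 0.05).astype(int)

# Downsample the majority (normal) class so the training set is balanced:
# keep every anomaly and an equal-sized random subset of normal rows.
pos_idx = np.where(y == 1)[0]
neg_idx = np.where(y == 0)[0]
neg_keep = rng.choice(neg_idx, size=len(pos_idx), replace=False)
train_idx = np.concatenate([pos_idx, neg_keep])

# Simple-averaging ensemble: stand-ins for the four models' predicted
# probabilities (XGBoost, LightGBM, CatBoost, HistGradientBoosting),
# combined with a weight of 0.25 each.
preds = np.stack([rng.random(n) for _ in range(4)])
ensemble = np.average(preds, axis=0, weights=[0.25] * 4)
```

With equal weights, `np.average` reduces to the plain mean of the four prediction vectors.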
- Postprocessing
  - Predictions were set to zero (normal) for rows with a meter_reading of 1.0
  - Predictions were set to zero at the start and end points of each time series
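Both postprocessing rules can be sketched in a few lines of pandas. The frame below and its column names (`building_id`, `meter_reading`, `anomaly` for the predicted label) are illustrative assumptions:

```python
import pandas as pd

# Hypothetical prediction table for two buildings, in timestamp order.
df = pd.DataFrame({
    "building_id": [1, 1, 1, 1, 2, 2, 2],
    "meter_reading": [1.0, 50.0, 60.0, 55.0, 3.0, 1.0, 4.0],
    "anomaly": [1, 0, 1, 1, 1, 1, 1],
})

# Rule 1: force predictions to zero where meter_reading is exactly 1.0.
df.loc[df["meter_reading"] == 1.0, "anomaly"] = 0

# Rule 2: force predictions to zero at the first and last row of each
# building's time series.
grp = df.groupby("building_id")
first_last = pd.concat([grp.head(1), grp.tail(1)]).index
df.loc[first_last, "anomaly"] = 0
```

Rules like these encode the observation that certain positions are unreliable for the models (e.g., series boundaries, where the lag-based value-change features are undefined), so overriding them is safer than trusting the raw model output there.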