This project aims to compare different Explainable AI (XAI) methods to assess how well they capture and explain feature interactions in machine learning models. The goal is to better understand the influence of individual features and their interactions in model decision-making.
The project implements the Accumulated Local Effects (ALE), H-statistic, and SHAP interaction methods, with plans to include other methods in the future.
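To make the idea of an interaction score concrete, here is a minimal sketch of Friedman's pairwise H-statistic. It is written in plain Python and is not taken from this repository's `methods/h-statistic.py`. The `toy_model` function, the sample size, and all helper names are assumptions for illustration. The statistic compares the centered two-feature partial dependence with the sum of the two centered one-feature partial dependences, and the sketch returns the squared form (some implementations report the square root).

```python
import random


def toy_model(x):
    # Hypothetical stand-in for a trained model:
    # x[0] and x[1] interact, x[2] only has an additive effect.
    return x[0] * x[1] + x[2]


def partial_dependence(model, data, features, point):
    """Average prediction over `data` with `features` fixed to their values in `point`."""
    total = 0.0
    for row in data:
        mixed = list(row)
        for f in features:
            mixed[f] = point[f]
        total += model(mixed)
    return total / len(data)


def centered_pd(model, data, features):
    """Partial dependence evaluated at every data point, shifted to mean zero."""
    values = [partial_dependence(model, data, features, p) for p in data]
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def h_statistic(model, data, j, k):
    """Squared pairwise H-statistic for features j and k."""
    pd_j = centered_pd(model, data, [j])
    pd_k = centered_pd(model, data, [k])
    pd_jk = centered_pd(model, data, [j, k])
    numerator = sum((a - b - c) ** 2 for a, b, c in zip(pd_jk, pd_j, pd_k))
    denominator = sum(a ** 2 for a in pd_jk)
    return numerator / denominator if denominator > 0 else 0.0


rng = random.Random(0)
data = [[rng.random() for _ in range(3)] for _ in range(80)]

h_interacting = h_statistic(toy_model, data, 0, 1)  # features that interact
h_additive = h_statistic(toy_model, data, 0, 2)     # purely additive pair
print(h_interacting, h_additive)
```

For the additive pair the score is zero up to floating-point error, while the interacting pair gets a clearly positive score. The cost grows quadratically with the number of rows, so in practice the statistic is usually estimated on a subsample.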
Currently, the project includes:
- Friedman #1 dataset generation to simulate non-linear feature interactions, with plans to integrate another dataset that reflects a real-life scenario.
- Bike Sharing dataset processing. The data was made openly available by Capital Bikeshare, and Fanaee-T and Gama (2013) added weather and seasonal information. The aim is to predict how many bikes will be rented given the weather and the day.
- Training of two machine learning models: XGBoost and Random Forest.
- Generation of explanations for the different methods.
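As background for the Friedman #1 dataset listed above, the following sketch shows the standard Friedman #1 formula in plain Python. It is an illustration under assumptions, not the code in `generate-data/generate-friedman-1.py`: the function name `friedman1_sample` and its parameters are hypothetical. scikit-learn provides the same benchmark as `sklearn.datasets.make_friedman1`. Only the first five features affect the target, and the first two interact through the sine term, which is what makes the dataset useful for testing interaction methods.

```python
import math
import random


def friedman1_sample(n_samples=200, n_features=10, noise=1.0, seed=0):
    """Draw a Friedman #1 sample (hypothetical helper, for illustration only).

    Features are uniform on [0, 1]. The target depends on the first five:
    y = 10*sin(pi*x1*x2) + 20*(x3 - 0.5)^2 + 10*x4 + 5*x5 + Gaussian noise.
    Any remaining features are pure noise inputs.
    """
    if n_features < 5:
        raise ValueError("Friedman #1 needs at least 5 features")
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    y = [
        10 * math.sin(math.pi * x[0] * x[1])  # interaction between x1 and x2
        + 20 * (x[2] - 0.5) ** 2              # non-linear main effect of x3
        + 10 * x[3]                           # linear main effect of x4
        + 5 * x[4]                            # linear main effect of x5
        + rng.gauss(0.0, noise)               # additive noise
        for x in X
    ]
    return X, y


X, y = friedman1_sample(n_samples=50, noise=0.0, seed=1)
print(len(X), len(X[0]), min(y), max(y))
```

Because the true interaction structure is known, an explanation method can be checked against it: a strong score for the first two features is expected, and scores close to zero for pairs involving the noise features.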
- Clone repository:
git clone https://github.com/PabloSczn/feature-interaction-xai.git
cd feature-interaction-xai
- Create a Python virtual environment:
python -m venv venv
venv\Scripts\activate # On Unix systems: source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Generate the dataset:
  - Generate the Friedman #1 dataset and save it in data/:
    python generate-data/generate-friedman-1.py
  - Process the Bike Sharing dataset and save it in data/:
    python generate-data/generate-bike-sharing-data.py
    Data source for Bike Sharing: Fanaee-T, H. (2013). Bike Sharing Dataset. UCI Machine Learning Repository.
- Train models: Train the XGBoost and Random Forest models and save them in the models/ directory:
  python train-models.py
  This saves the trained models for both the Bike Sharing and Friedman #1 datasets.
- Generate explanations: Generate the explanations for the different methods:
  python methods/ale.py
  python methods/h-statistic.py
  python methods/shap-interaction.py
  - To generate the explanations for both datasets, set DATASET_NAME to either friedman1 or bike-sharing before each run.
  - The generated explanations are saved in explanations/ for later comparison.
- Compare explanations: After generating the explanations, run the script that normalises and compares them:
  python compare-methods.py
  - Set DATASET_NAME to either friedman1 or bike-sharing before running.
  - The generated comparisons are saved in comparison_results/.
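The comparison step has to put scores from different methods on a common scale, because ALE, the H-statistic, and SHAP interaction values are reported in different units. The sketch below shows one possible approach, assuming min-max normalisation followed by a Spearman rank correlation. It is not the logic of `compare-methods.py`, which may differ. The function names, feature names, and score values are all made up for illustration.

```python
def min_max_normalise(scores):
    """Rescale a {feature_pair: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {pair: (v - lo) / span if span > 0 else 0.0 for pair, v in scores.items()}


def ranks(scores):
    """Rank pairs from strongest (1) to weakest interaction. Ties are not handled."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {pair: i + 1 for i, pair in enumerate(ordered)}


def spearman(a, b):
    """Spearman rank correlation of two score mappings over the same pairs (no ties)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    d2 = sum((ra[p] - rb[p]) ** 2 for p in ra)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Made-up interaction scores from two hypothetical methods on different scales.
method_a = {("x1", "x2"): 0.40, ("x1", "x3"): 0.05, ("x2", "x3"): 0.10}
method_b = {("x1", "x2"): 12.0, ("x1", "x3"): 3.0, ("x2", "x3"): 1.0}

norm_a = min_max_normalise(method_a)
norm_b = min_max_normalise(method_b)
agreement = spearman(norm_a, norm_b)
print(norm_a, norm_b, agreement)
```

Min-max scaling keeps the relative spacing of scores within each method, while the rank correlation only measures whether the methods order the feature pairs in the same way. Reporting both gives a fuller picture than either alone.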
Pablo Sanchez Narro
Contact: [email protected]