This repository contains code to reproduce results in our NeurIPS 2022 publication "Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations".
Under the local function approximation (LFA) framework, explanations perform local function approximation of a complex model over a local neighbourhood using a simple model based on a loss function. The LFA framework unifies eight diverse popular post hoc explanation methods (i.e., LIME, C-LIME, KernelSHAP, Occlusion, Vanilla Gradients, SmoothGrad, Gradient x Input, and Integrated Gradients). Using the LFA framework, we show that no single explanation method can perform optimally over every local neighbourhood, calling for a principle approach to select among methods. To select among methods, we set forth a guiding principle, deeming a method to be effective if it performs faithful LFA. Using the LFA framework, we determine the conditions under which each existing explanation methods are effective. If, in a given situation, no existing method is effective, the LFA framework also provides a way to design novel methods (by specifying an appropriate model class, local neighborhood, and loss function) that are tailored to the given situation and that satisfy the guiding principle.
To reproduce results, navigate into the repository follow the steps below.
1. Generate explanations for individual model predictions
- Run
$ python experiments/generate_explanations.py
. - Explanations are generated to explain the individual model predictions of each model (four regression models and four classification models) using each explanation method (LIME, KernelSHAP, Occlusion, Vanilla Gradients, Gradient x Input, SmoothGrad, and Integrated Gradients). Each explanation is computed using two approaches: the existing approach (implemented by the Captum library) and the LFA framework (implemented in the
lfa
folder). - Explanations use 1000 perturbations per data point. If running on a local machine, use a smaller number of perturbations by changing line 165 (
n_perturbs_list = [1000]
) inexperiments/generate_explanations.py
- Explanations are saved in
experiments/results
.
2. Analyze explanations
- Run
$ python analysis/analyze_explanations.py
. - Figures are saved in
analysis/figures
. These are the figures that appear in the paper.
@inproceedings{lfa2022,
title={Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations},
author={Han, Tessa, and Srinivas, Suraj, and Lakkaraju, Himabindu},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2022}
}