
Using SHAP on unseen data to understand model's predictions #2571

Open

ETTAN93 opened this issue Oct 24, 2024 · 0 comments
Labels: question (Further information is requested), triage (Issue waiting for triaging)
Assume I have a model that is initialized as follows:

model_estimator = LightGBMModel(
    lags=None,
    lags_past_covariates=[-3,-2,-1],
    lags_future_covariates=[-3,-2,-1],
    output_chunk_length=3
)
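For reference, here is a minimal sketch (plain Python, not the Darts API; the helper name is hypothetical) of which timesteps a lag configuration like the one above consumes for a forecast at time t:

```python
# Hypothetical helper illustrating the lag window implied by the
# configuration above (lags_past_covariates=[-3, -2, -1]): a forecast
# at integer time t reads the three preceding covariate steps.
def lag_window(t, lags=(-3, -2, -1)):
    """Timesteps consumed by the given relative lags for a forecast at t."""
    return [t + lag for lag in lags]

print(lag_window(10))  # [7, 8, 9]
```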

Question 1:
I then create the ShapExplainer object by fitting it to the training set:

shap_explain = ShapExplainer(model_estimator)
explanations = shap_explain.summary_plot()

Assuming I now want to use the ShapExplainer to explain data in the unseen test set, what should be passed as the foreground series?
I tried not providing the foreground series, since the target lags are None, but that does not seem to be possible. In that case, should the foreground series, past covariates, and future covariates be the same series passed to the model.predict function, i.e. should the foreground_series end at the first prediction timestamp t=0?

shap_explain.explain(
    foreground_series = hf_data_dict['target_hf'][: test_set_start_date],
    foreground_past_covariates = hf_data_dict['past_cov_hf'][test_set_start_date: end_date],
    foreground_future_covariates = hf_data_dict['future_cov_hf'][test_set_start_date: end_date],
    horizons = [3]
)

Or should the foreground_series start from the test_set_start_date?

shap_explain.explain(
    foreground_series = hf_data_dict['target_hf'][test_set_start_date: ],
    foreground_past_covariates = hf_data_dict['past_cov_hf'][test_set_start_date: end_date],
    foreground_future_covariates = hf_data_dict['future_cov_hf'][test_set_start_date: end_date],
    horizons = [3]
)
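Independently of which option Darts expects, note that label-based slicing on a pandas DatetimeIndex is inclusive of both endpoints, so the two slicings above overlap at the boundary timestamp. A small sketch with toy data (the series and variable names are hypothetical stand-ins for hf_data_dict entries):

```python
import pandas as pd

# Toy daily series standing in for hf_data_dict['target_hf'].
idx = pd.date_range("2023-02-26", periods=6, freq="D")
target = pd.Series(range(6), index=idx)

test_set_start_date = pd.Timestamp("2023-03-01")

# Option A: series ends at the first prediction timestamp.
opt_a = target[:test_set_start_date]
# Option B: series starts at the test set start.
opt_b = target[test_set_start_date:]

# Label-based slicing on a DatetimeIndex includes both endpoints,
# so 2023-03-01 is the last row of opt_a and the first row of opt_b.
print(opt_a.index[-1], opt_b.index[0])
```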

Question 2:
I tried fitting the SHAP explainer on different background series (train vs. test set), but I get back exactly the same SHAP results. For example:

split_date = pd.to_datetime('2023-02-28 23:59:00')
test_set_start_date = pd.to_datetime('2023-03-01 00:00:00')

#shap explainer trained on train set but used to explain test set
shap_explainer = ShapExplainer(lgbm_model)

df1 = shap_explainer.explain(
    foreground_series = hf_data_dict['target_hf'][: split_date],
    foreground_past_covariates = hf_data_dict['past_cov_hf'][test_set_start_date: ],
    foreground_future_covariates = hf_data_dict['future_cov_hf'][test_set_start_date: ],
    horizons = [3]
)

#shap explainer trained on test set and used to explain test set
shap_explainer_test = ShapExplainer(
    lgbm_model,
    background_series = hf_data_dict['target_hf'][: split_date],
    background_past_covariates = hf_data_dict['past_cov_hf'][test_set_start_date: ],
    background_future_covariates = hf_data_dict['future_cov_hf'][test_set_start_date: ]
)

df2 = shap_explainer_test.explain(
    foreground_series = hf_data_dict['target_hf'][:split_date],
    foreground_past_covariates = hf_data_dict['past_cov_hf'][test_set_start_date: ],
    foreground_future_covariates = hf_data_dict['future_cov_hf'][test_set_start_date: ],
    horizons = [3]
)

#shap explainer trained on test set but changing foreground series to same as past and future cov
shap_explainer_test = ShapExplainer(
    lgbm_model,
    background_series = hf_data_dict['target_hf'][test_set_start_date: ],
    background_past_covariates = hf_data_dict['past_cov_hf'][test_set_start_date: ],
    background_future_covariates = hf_data_dict['future_cov_hf'][test_set_start_date: ]
)

df3 = shap_explainer_test.explain(
    foreground_series = hf_data_dict['target_hf'][:split_date],
    foreground_past_covariates = hf_data_dict['past_cov_hf'][test_set_start_date: ],
    foreground_future_covariates = hf_data_dict['future_cov_hf'][test_set_start_date: ],
    horizons = [3]
)

All three dataframes return exactly the same values, which seems very odd to me, since each ShapExplainer is built with a different background series. The base_values are also identical. Is there a bug somewhere in the implementation, or am I using the function incorrectly?
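For what it's worth, in SHAP the base value is the expected model output over the background sample, so two genuinely different backgrounds should normally shift the base_values. A toy sketch of that dependence (plain Python with a made-up model, nothing Darts-specific):

```python
# Sketch of why identical base_values across backgrounds are suspicious:
# SHAP's base value is E[f(X)] over the background sample, so a different
# background should normally give a different base value.
def f(x):
    """Toy model standing in for the fitted LightGBM estimator."""
    return 3.0 * x + 1.0

def base_value(background):
    """Mean model output over a background sample."""
    return sum(f(x) for x in background) / len(background)

train_bg = [0.0, 1.0, 2.0]
test_bg = [10.0, 11.0, 12.0]

print(base_value(train_bg))  # 4.0
print(base_value(test_bg))   # 34.0
```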
