Assuming I have a model that is initialized as such, I then create the ShapExplainer object by fitting it to the training set:
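Roughly along these lines (the exact lag settings and the `train_*` variable names below are placeholders, not my real configuration):

```python
from darts.models import LightGBMModel
from darts.explainability import ShapExplainer

# Placeholder configuration: no target lags, only past/future covariate lags.
# The actual lag values and output_chunk_length are assumptions.
lgbm_model = LightGBMModel(
    lags=None,
    lags_past_covariates=12,
    lags_future_covariates=(0, 3),
    output_chunk_length=3,
)

# `train_target`, `train_past_cov` and `train_future_cov` stand in for the
# training slices of the series used in the snippets further down.
lgbm_model.fit(
    series=train_target,
    past_covariates=train_past_cov,
    future_covariates=train_future_cov,
)

# ShapExplainer created without an explicit background, so it falls back to
# the series the model was trained on.
shap_explainer = ShapExplainer(lgbm_model)
```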
Question 1:
Assuming now I want to use the ShapExplainer to explain data in the unseen test set, what should be defined as the foreground series?
I tried not providing the foreground series, since the target lags are 0, but it seems that this is not possible. In that case, should the foreground_series, foreground_past_covariates and foreground_future_covariates be the same series passed to the model.predict function, i.e. should the foreground_series end at the first prediction timestamp t=0? Or should the foreground_series start from the test_set_start_date?
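For reference, the predict call I have in mind looks roughly like this (n and the slicing are illustrative; the variable and key names mirror the snippet under Question 2):

```python
# Illustrative predict call on the unseen test period; `n` is a placeholder,
# and `split_date` / `hf_data_dict` are defined as in the Question 2 snippet.
preds = lgbm_model.predict(
    n=3,
    series=hf_data_dict['target_hf'][:split_date],  # target history ending at t=0
    past_covariates=hf_data_dict['past_cov_hf'],
    future_covariates=hf_data_dict['future_cov_hf'],
)
```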
Question 2:
I tried training the SHAP explainer on different background series (train vs. test set), but I am getting back exactly the same SHAP results. For example:
```python
import pandas as pd

from darts.explainability import ShapExplainer

split_date = pd.to_datetime('2023-02-28 23:59:00')
test_set_start_date = pd.to_datetime('2023-03-01 00:00:00')

# shap explainer trained on train set but used to explain test set
shap_explainer = ShapExplainer(lgbm_model)
df1 = shap_explainer.explain(
    foreground_series=hf_data_dict['target_hf'][:split_date],
    foreground_past_covariates=hf_data_dict['past_cov_hf'][test_set_start_date:],
    foreground_future_covariates=hf_data_dict['future_cov_hf'][test_set_start_date:],
    horizons=[3],
)

# shap explainer trained on test set and used to explain test set
shap_explainer_test = ShapExplainer(
    lgbm_model,
    background_series=hf_data_dict['target_hf'][:split_date],
    background_past_covariates=hf_data_dict['past_cov_hf'][test_set_start_date:],
    background_future_covariates=local_lgbm_hf_output.hf_data_dict['future_cov_hf'][test_set_start_date:],
)
df2 = shap_explainer_test.explain(
    foreground_series=hf_data_dict['target_hf'][:split_date],
    foreground_past_covariates=hf_data_dict['past_cov_hf'][test_set_start_date:],
    foreground_future_covariates=hf_data_dict['future_cov_hf'][test_set_start_date:],
    horizons=[3],
)

# shap explainer trained on test set, but changing the background series to the
# same (test) range as the past and future covariates
shap_explainer_test = ShapExplainer(
    lgbm_model,
    background_series=hf_data_dict['target_hf'][test_set_start_date:],
    background_past_covariates=hf_data_dict['past_cov_hf'][test_set_start_date:],
    background_future_covariates=local_lgbm_hf_output.hf_data_dict['future_cov_hf'][test_set_start_date:],
)
df3 = shap_explainer_test.explain(
    foreground_series=hf_data_dict['target_hf'][:split_date],
    foreground_past_covariates=hf_data_dict['past_cov_hf'][test_set_start_date:],
    foreground_future_covariates=hf_data_dict['future_cov_hf'][test_set_start_date:],
    horizons=[3],
)
```
All three dataframes seem to return exactly the same values, which is very strange to me, since the ShapExplainers are trained with different background series. The base_values are also exactly the same. Is there a bug somewhere in the implementation, or am I using the function incorrectly?
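For completeness, this is roughly how I compared the three results (assuming explain() returns a ShapExplainabilityResult and that the target has a single component, so `component` can be omitted; the exact accessors may differ in other darts versions):

```python
import numpy as np

# Rough check that the three explanations coincide for horizon 3.
vals1 = df1.get_explanation(horizon=3).values()
vals2 = df2.get_explanation(horizon=3).values()
vals3 = df3.get_explanation(horizon=3).values()

# Both comparisons come out True, which is the surprising part.
print(np.allclose(vals1, vals2), np.allclose(vals1, vals3))
```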