Hi 🐍 Matt Harrison,
I'm thoroughly enjoying your book on XGBoost, but I noticed what might be data leakage during hyperparameter tuning. Specifically, on pages 46 through 49, it seems both the training and test data are used for model fitting.
If this approach is intentional, could you please clarify the rationale? I'd greatly appreciate your insights.
Thank you again for the excellent book.
Example from p. 47 with my inline comments:
# Reflecting on the below approach: Using GridSearchCV with both training and testing data combined
# can introduce data leakage. By tuning hyperparameters this way, the model might be indirectly influenced
# by the test data, leading to potentially biased selection of "best" hyperparameters. It's best practice
# to perform hyperparameter tuning only on the training dataset to ensure that the chosen parameters
# generalize well to unseen data.
from sklearn.model_selection import GridSearchCV

params = {
    'max_depth': [3, 5, 7, 8],
    'min_samples_leaf': [1, 3, 4, 5, 6],
    'min_samples_split': [2, 3, 4, 5, 6],
}
grid_search = GridSearchCV(estimator=tree.DecisionTreeClassifier(),
                           param_grid=params, cv=4, n_jobs=-1,
                           verbose=1, scoring='accuracy')
grid_search.fit(pd.concat([X_train, X_test]),
                pd.concat([kag_y_train, kag_y_test]))
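For comparison, here is a minimal sketch of what I would have expected: the same grid search fitted on the training split only, with the test split used once at the end for scoring. This is not code from the book. The synthetic data from `make_classification`, the split sizes, and the `random_state` values are my own assumptions so that the snippet runs on its own.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the book uses its own dataset).
X, y = make_classification(n_samples=400, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Same parameter grid as the example from p. 47.
params = {
    'max_depth': [3, 5, 7, 8],
    'min_samples_leaf': [1, 3, 4, 5, 6],
    'min_samples_split': [2, 3, 4, 5, 6],
}
grid_search = GridSearchCV(estimator=DecisionTreeClassifier(random_state=42),
                           param_grid=params, cv=4, scoring='accuracy')

# Tune on the training split only; cross-validation happens inside it.
grid_search.fit(X_train, y_train)

# The test split is touched once, after the parameters are chosen.
test_accuracy = grid_search.score(X_test, y_test)
print(grid_search.best_params_)
print(f"CV accuracy: {grid_search.best_score_:.3f}, "
      f"test accuracy: {test_accuracy:.3f}")
```

With this layout the test accuracy stays an estimate on data that played no part in choosing the hyperparameters, which is the property I think the combined fit gives up.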