Early stopping: overfit prevention #4996

Open
mayer79 opened this issue Feb 6, 2022 · 6 comments

Comments

@mayer79
Contributor

mayer79 commented Feb 6, 2022

Currently, "early stopping" monitors the validation loss and stops after a given number of unsuccessful rounds. This is often used together with GridSearchCV to select the best model. Sometimes the best-performing model shows considerable overfit, and one might prefer a model with slightly worse performance but less overfit, depending on the situation.

To actively control for overfit, I would love to see a modification of early stopping. It would stop the booster if after a couple rounds, the validation score is more than "overfit_tolerance" worse than the training score.

It could be used e.g. like this

callbacks=[lgb.early_stopping(20, overfit_tolerance=1.1)]

This would stop the boosting process if after 20 rounds, either the performance stopped improving or the ratio of validation to train performance became >1.1.
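To make the proposed rule concrete, here is a minimal pure-Python sketch that replays recorded train and validation losses. It is not LightGBM code: `stopping_iteration` and `overfit_ratio` are hypothetical helpers, a lower-is-better metric with positive values is assumed, and the patience counter is assumed to apply to both conditions.

```python
def overfit_ratio(train_loss, valid_loss):
    """Ratio of validation to training loss (lower-is-better metric assumed)."""
    return valid_loss / train_loss


def stopping_iteration(train_losses, valid_losses, stopping_rounds, overfit_tolerance):
    """Return the 1-based iteration at which boosting would stop, or None.

    Assumed rule: stop once, for `stopping_rounds` consecutive iterations,
    either the validation loss failed to improve or the valid/train ratio
    stayed above `overfit_tolerance`.
    """
    best_valid = float("inf")
    rounds_without_improvement = 0
    rounds_over_tolerance = 0
    for i, (train, valid) in enumerate(zip(train_losses, valid_losses), start=1):
        # Classic early stopping: track the best validation loss so far.
        if valid < best_valid:
            best_valid = valid
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
        # Proposed addition: track how long the ratio has been too large.
        if overfit_ratio(train, valid) > overfit_tolerance:
            rounds_over_tolerance += 1
        else:
            rounds_over_tolerance = 0
        if max(rounds_without_improvement, rounds_over_tolerance) >= stopping_rounds:
            return i
    return None


# Illustrative (made-up) loss curves: validation keeps improving,
# but the valid/train ratio grows past 1.1 from iteration 3 onwards.
train_losses = [1.00, 0.80, 0.60, 0.50, 0.40]
valid_losses = [1.00, 0.85, 0.70, 0.65, 0.62]
print(stopping_iteration(train_losses, valid_losses, 2, 1.1))
```

In this toy run the validation loss never stops improving, so only the ratio condition can end training.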

@StrikerRUS
Collaborator

@mayer79 Thanks a lot for your feature request!

Is this the same as recently implemented for Python-package #4580 ?

@mayer79
Contributor Author

mayer79 commented Feb 16, 2022

@StrikerRUS: It is not the same, but is indeed not completely unrelated. Both ideas fight overfit. #4580 seems easier to implement, but trickier to apply in practice (a good value for min_delta heavily depends on the choice of the metric, the learning rate, and other regularization parameters).
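A small illustration of why min_delta is harder to tune than a ratio. This is a pure-Python sketch, not the LightGBM implementation: `stop_with_min_delta` is a hypothetical helper, and it assumes a lower-is-better loss where a round counts as an improvement only if it beats the best loss so far by more than `min_delta`.

```python
def stop_with_min_delta(valid_losses, stopping_rounds, min_delta):
    """Return the 1-based iteration at which training stops, or None."""
    best = float("inf")
    rounds_without_improvement = 0
    for i, loss in enumerate(valid_losses, start=1):
        if loss < best - min_delta:
            # Improvement is large enough to reset the patience counter.
            best = loss
            rounds_without_improvement = 0
        else:
            rounds_without_improvement += 1
        if rounds_without_improvement >= stopping_rounds:
            return i
    return None


# Made-up validation losses, and the same curve on a 100x larger scale
# (as if a different metric or unit had been chosen).
losses = [1.00, 0.90, 0.85, 0.84, 0.835, 0.834]
scaled = [100 * loss for loss in losses]

print(stop_with_min_delta(losses, 2, 0.02))  # stops early
print(stop_with_min_delta(scaled, 2, 0.02))  # same min_delta never triggers
```

The same min_delta behaves differently once the metric is rescaled, whereas a ratio of validation to train performance is unaffected by such a rescaling.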

The idea in this post uses the ratio of train and valid performance to decide whether overfitting is getting too strong. I tried to draw it in the favourite data science tool "Excel" ;).

image

As a user, I can simply wish: "I don't want to have more than x% overfit on my chosen metric(s)".

We don't need to start with this one, but maybe a logical order could look like this:

  1. Make cb.xyz() functions user visible in the R package and switch to a nice callback interface as in Python.
  2. Add the min_delta option from #4580 ("[python-package] early stopping min_delta (fixes #2526)") to R, using the early stopping callback
  3. Add attribute train_score to the R6 Booster object (for lgb.train(), lgb.cv(), and lightgbm())
  4. Implement the overfit prevention idea of this thread in both R and Python

I will of course help with the changes, but I am not sure if bullet point 1 is on the roadmap or not?

@StrikerRUS
Collaborator

@mayer79 Ah, I got the difference now, thanks for the detailed explanation with example!

This idea looks good to me. Is there something similar in other Python/R packages we can check as a reference?

> but I am not sure if bullet point 1 is on the roadmap or not?

It is: #2479. I guess we can start from this point.

@jameslamb
Collaborator

I'd welcome a contribution for #2479 and would be happy to review it, @mayer79 , if you'd like to attempt it.

@mayer79
Contributor Author

mayer79 commented Feb 17, 2022

Sounds like a plan!

@jackattackyang

I'm working on this for Python, but I have a concern about the functionality. The planned behaviour is:

  1. Check for overfitting (the train-valid gap) at every boosting iteration, using an absolute and/or relative tolerance
  2. If valid is "better" than train, skip the overfit tolerance check
    • e.g. increasing metric, abs tolerance=0.1, train=0.6, valid=0.75 -> ignore
  3. If the train-valid gap > overfit_tolerance for early_stopping_rounds: trigger early stopping and defer to the current logic for the best iteration, so that the best model within tolerance is retained
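The three rules above can be sketched in plain Python as follows. This is only one reading of them, not the actual work-in-progress code: `gap_exceeds_tolerance` and `run_overfit_check` are hypothetical names, a higher-is-better metric with positive scores is assumed, and "for early_stopping_rounds" is interpreted as consecutive iterations.

```python
def gap_exceeds_tolerance(train, valid, abs_tolerance=None, rel_tolerance=None):
    """Rules 1 and 2 for a higher-is-better metric with positive scores."""
    if valid >= train:
        # Rule 2: validation is at least as good as train, so skip the check.
        return False
    if abs_tolerance is not None and train - valid > abs_tolerance:
        return True
    if rel_tolerance is not None and train / valid > rel_tolerance:
        return True
    return False


def run_overfit_check(train_scores, valid_scores, stopping_rounds,
                      abs_tolerance=None, rel_tolerance=None):
    """Return (stop_iteration, best_iteration), 1-based; either may be None."""
    best_iteration, best_valid = None, float("-inf")
    rounds_over = 0
    for i, (train, valid) in enumerate(zip(train_scores, valid_scores), start=1):
        if gap_exceeds_tolerance(train, valid, abs_tolerance, rel_tolerance):
            rounds_over += 1
        else:
            rounds_over = 0
            # Rule 3: only iterations within tolerance can become the best model.
            if valid > best_valid:
                best_iteration, best_valid = i, valid
        if rounds_over >= stopping_rounds:
            return i, best_iteration
    return None, best_iteration


# The example from rule 2: valid is better than train, so it is ignored.
print(gap_exceeds_tolerance(0.6, 0.75, abs_tolerance=0.1))

# Made-up scores where the absolute gap exceeds 0.1 from iteration 3 onwards.
train_scores = [0.70, 0.80, 0.90, 0.95]
valid_scores = [0.68, 0.75, 0.78, 0.79]
print(run_overfit_check(train_scores, valid_scores, 2, abs_tolerance=0.1))
```

Note that in this reading the retained model is the best one seen while the gap was still within tolerance, even if later iterations score higher on the validation set.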

A typical pair of train and validation curves might look like this: both train and validation improve with a big initial gap, then the gap narrows and later widens again as validation plateaus while train continues to improve.

My concern is that the overfit tolerance might be triggered unknowingly in regions where the model is actually underfitting,
e.g. if a small early_stopping_rounds is used with the data below:

callbacks=[lgb.early_stopping(1, overfit_rel_tolerance=1.02)]

Early stopping would be triggered on iteration 1 instead of 24.

Any ideas to mitigate this?

Image

iteration   train    valid    train-valid   train/valid
1           0.6424   0.6159   0.0265        1.0430
2           0.6602   0.6392   0.0210        1.0328
3           0.6725   0.6530   0.0195        1.0299
4           0.6766   0.6676   0.0090        1.0135
...
23          0.6884   0.6754   0.0130        1.0192
24          0.6887   0.6751   0.0135        1.0201
25          0.6890   0.6752   0.0138        1.0205
26          0.6894   0.6751   0.0143        1.0211
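The concern can be reproduced from the listed rows alone (iterations 5 to 22 are elided in the table and are not filled in here). The sketch below is plain Python with a hypothetical `first_trigger` helper. The `require_arming` option is one possible mitigation that was not proposed in this thread: the check only becomes active once the ratio has been within tolerance at least once.

```python
# (iteration, train, valid) rows copied from the table; higher is better.
rows = [
    (1, 0.6424, 0.6159),
    (2, 0.6602, 0.6392),
    (3, 0.6725, 0.6530),
    (4, 0.6766, 0.6676),
    (23, 0.6884, 0.6754),
    (24, 0.6887, 0.6751),
    (25, 0.6890, 0.6752),
    (26, 0.6894, 0.6751),
]


def first_trigger(rows, rel_tolerance, require_arming=False):
    """Return the first iteration whose train/valid ratio exceeds the tolerance.

    With require_arming=True (a hypothetical mitigation), iterations are
    ignored until the ratio has been within tolerance at least once.
    """
    armed = not require_arming
    for iteration, train, valid in rows:
        over = train / valid > rel_tolerance
        if not over:
            armed = True
        elif armed:
            return iteration
    return None


print(first_trigger(rows, 1.02))                       # 1
print(first_trigger(rows, 1.02, require_arming=True))  # 24
```

With early_stopping_rounds=1 the plain check fires during the initial wide-gap phase, while the armed variant waits until the gap has narrowed and then widened again. Whether arming is an acceptable rule for real training runs is an open question.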
