Cross-validation with ridge regression but not normal linear regression? #44

From my review, it seems everyone does cross-validation with ridge regression, but no one ever seems to do cross-validation with a simple linear regression. I understand ridge regression is just linear regression with an L2 penalty term added to the objective function, so what gives? Am I a lunatic if I just run ridge regression without cross-validation?
Any thoughts are appreciated!

Comments
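(For reference, the objective the question describes: ridge regression minimizes the ordinary least-squares loss plus an L2 penalty on the coefficients, with $\lambda \ge 0$ controlling the penalty strength; $\lambda = 0$ recovers plain linear regression.)

$$
\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
$$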
I would assert that most social scientists do not employ cross-validation for most regression models: linear regression, logistic regression, generalized linear models, etc. That doesn't make it right; it's just that most social scientists are not trained in cross-validation methods and don't know how or when to employ them. Is there a reason you do not want to use cross-validation with your ridge regression model?
I am really just looking for an R-squared measure for my model. If I were doing a plain linear regression, I'd just take the usual R-squared from the regression output. But I am doing ridge regression to avoid overfitting (I have more regressors than observations, yikes!), so I want the corresponding R-squared measure for the ridge regression model. The way I think about it, the cross-validation error is really an estimate of out-of-sample fit, while the linear regression R-squared measures only in-sample fit, so they are fundamentally different ways of assessing model fitness, right? For my purposes, since I'm interested in a predictive model and wary of overfitting, I think the cross-validation approach is best (and that is how I've implemented it). Thanks very much for the thoughts!
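A minimal sketch of the distinction described above, using scikit-learn (the data, alpha value, and fold count here are made up for illustration, not taken from the project): in-sample R-squared scores the model on the data it was fit to, while cross-validated R-squared averages scores on held-out folds and so estimates out-of-sample fit.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Synthetic data with more regressors than observations (p > n),
# the setting where unpenalized OLS overfits and ridge is useful.
rng = np.random.default_rng(0)
n, p = 50, 100
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0                      # only a few truly informative regressors
y = X @ beta + rng.standard_normal(n)

model = Ridge(alpha=10.0)           # illustrative penalty strength

# In-sample R^2: fit and score on the same data (optimistic).
in_sample_r2 = model.fit(X, y).score(X, y)

# Cross-validated R^2: mean held-out score (estimates out-of-sample fit).
cv_r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

print(f"in-sample R^2:       {in_sample_r2:.3f}")
print(f"cross-validated R^2: {cv_r2:.3f}")
```

With p > n, the in-sample number will typically look much better than the cross-validated one, which is exactly the gap between in-sample and out-of-sample fit being discussed.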
Yes, cross-validation is better. Even for inferential studies, cross-validation or out-of-sample test statistics really are better for model comparison and for assessing overall model fit. As we've discussed extensively in the perspectives sequence, relying on training-set statistics can lead you to biased models and incorrect inferences/predictions.
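To make the model-comparison point concrete, a hedged sketch (the candidate models and data are illustrative, not from the project): rank candidates by held-out score rather than by training fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Same illustrative p > n setup as the previous sketch.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 100))
y = X[:, :5].sum(axis=1) + rng.standard_normal(50)

# With p > n, OLS fits the training data perfectly (in-sample R^2 = 1)
# yet generalizes poorly, so comparing models on training fit would
# always pick it; held-out scores reveal the better model.
for name, est in [("OLS", LinearRegression()),
                  ("ridge(alpha=1)", Ridge(alpha=1.0)),
                  ("ridge(alpha=10)", Ridge(alpha=10.0))]:
    cv_r2 = cross_val_score(est, X, y, cv=5, scoring="r2").mean()
    print(f"{name:>15}: mean cross-validated R^2 = {cv_r2:.3f}")
```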