
Cross-validation with ridge regression but not normal linear regression? #44

CooperNederhood opened this issue May 25, 2018 · 3 comments

@CooperNederhood

From my reading, it seems everyone does cross-validation with ridge regression, but no one ever seems to do cross-validation with simple linear regression. As I understand it, ridge regression is just linear regression with an L2 penalty added to the objective function, so what gives? Am I a lunatic if I just run ridge regression without cross-validation?
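Just to spell out what I mean, my understanding is that ridge minimizes the usual least-squares objective plus an L2 penalty on the coefficient vector, so setting the penalty weight to zero gives back plain linear regression:

```latex
\hat{\beta}_{\text{ridge}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
```

And as far as I can tell, choosing that λ is exactly what cross-validation usually gets used for.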

Any thoughts are appreciated!

@bensoltoff
Contributor

I would assert that most social scientists do not employ cross-validation for most regression models: linear regression, logistic regression, generalized linear models, etc. That doesn't make it right; it's just that most social scientists are not trained in cross-validation methods and don't know how or when to employ them. Is there a reason you do not want to use cross-validation with your ridge regression model?

@CooperNederhood
Author

I am really just looking for an R-squared measure for my model. If I were doing a plain linear regression, I'd just take the usual R-squared from the regression output. But I am doing ridge regression to avoid overfitting (I have more regressors than observations. Yikes!), so I want the corresponding R-squared measure for the ridge regression model.

The way I think about it, the cross-validation error is really an estimate of the out-of-sample fit, while the linear regression R-squared is just the in-sample fit. So they're fundamentally different ways of assessing model fit, right?

For my purposes, given that I'm interested in a predictive model and wary of overfitting, I think the cross-validation approach is best (and that is how I've implemented it). Thanks very much for the thoughts!
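In case it's useful to anyone else who lands here, this is roughly what my setup looks like. A minimal sketch using scikit-learn with simulated stand-in data; the variable names, dimensions, and penalty grid are made up, not my actual data:

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 200))   # n = 50 observations, p = 200 regressors (p > n)
y = X[:, :5] @ np.ones(5) + rng.normal(size=50)

# Tune the L2 penalty by cross-validation, then estimate out-of-sample
# R-squared with a separate round of cross-validation.
ridge = RidgeCV(alphas=np.logspace(-3, 3, 25))
cv_r2 = cross_val_score(ridge, X, y, cv=5, scoring="r2")
print(cv_r2.mean())              # cross-validated R-squared
```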

@bensoltoff
Contributor

Yes, cross-validation is better. Even for inferential studies, cross-validation or out-of-sample test statistics are better for model comparison and for assessing overall model fit. As we've discussed extensively in the perspectives sequence, relying on training-set statistics can lead you to biased models and incorrect inferences/predictions.
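A quick simulated illustration of that gap (scikit-learn, made-up data): with more regressors than observations, the training-set R-squared looks great almost regardless of the model, while held-out R-squared tells you how it actually predicts.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 150))   # p > n, as in the setting above
y = X[:, 0] + rng.normal(size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
model = Ridge(alpha=1.0).fit(X_train, y_train)
print("train R^2:", model.score(X_train, y_train))  # in-sample: optimistic
print("test  R^2:", model.score(X_test, y_test))    # out-of-sample: the honest number
```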
