Updates to Featurizer #69
Conversation
Mostly looks good. I'm a bit hesitant about the FE approach but don't have any other ideas about how to solve the problem.
"a": [2.0, 2.0, 2.0, 4.0], | ||
"b": [1.0, 1.0, 1.0, 3.0], | ||
"c": [0.5, 0.5, 0.5, 2.5], | ||
"d": [np.inf, np.inf, np.inf, np.inf], |
I realize we don't expect non-FE features to ever be all the same, but do we want it to error out in this case?
I think so? Since it will be linearly dependent with the intercept, it will cause a matrix inversion error either way.
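For context, here is a minimal NumPy sketch (made-up values, not the repo's actual test data) of why a constant column is a problem once an intercept is added: the two columns are linearly dependent, so X'X is singular and cannot be inverted.

import numpy as np

# A constant feature column is a scalar multiple of the intercept column,
# so X'X is rank-deficient and the normal equations cannot be solved.
X = np.column_stack([
    np.ones(4),       # intercept
    np.full(4, 2.0),  # constant (non-FE) feature
])
print(np.linalg.matrix_rank(X))  # 1, not 2

try:
    np.linalg.inv(X.T @ X)
except np.linalg.LinAlgError as err:
    print("matrix inversion error:", err)  # Singular matrix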
This is my first time doing code reviews in GitHub and I'm not sure if I'm done, but I don't want to lose the comments I made so far 😬 If there's a way to "post comments but keep reviewing" I'd be very interested in knowing how (Google is no help here) 😅
Co-authored-by: Jen Haskell <[email protected]>
What part are you hesitant about?
# beta_0 + beta_u * indic{u}
# and the fixed effect estimate for the dropped value is beta_0, so the average is:
# beta_0 + (beta_r / 3) + (beta_u / 3)
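To make the averaging in the comment above concrete, here is a small sketch with made-up coefficient values (beta_0, beta_r, beta_u are illustrative, not estimates from the actual test): the unit with the dropped reference level is predicted by the intercept alone, so averaging over the three units gives beta_0 + beta_r / 3 + beta_u / 3.

# Illustrative values only: beta_0 is the intercept, beta_r and beta_u are the
# coefficients on the two retained fixed-effect dummies; the third level is the
# dropped reference category and contributes only beta_0.
beta_0, beta_r, beta_u = 1.0, 0.6, -0.3

preds = [
    beta_0 + beta_r,  # unit whose fixed effect is r
    beta_0 + beta_u,  # unit whose fixed effect is u
    beta_0,           # unit with the dropped reference level
]
assert abs(sum(preds) / 3 - (beta_0 + beta_r / 3 + beta_u / 3)) < 1e-12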
Sorry, I just saw this comment. And thanks for adding it; it helps me understand the math better 😄 👍🏻
But I have to admit I'm not familiar with this 😞 Do you have another example or some literature you can share?
lgtm!
Same! 🎉
Description
In order to get the upcoming bootstrap model working I had to make changes to how the Featurizer works, but those changes need to stay compatible with our nonparametric and Gaussian models. This PR contains only the necessary changes to the Featurizer, the changes to the BaseElectionModel needed to work with the new Featurizer, and the corresponding unit test changes.

Beyond centering and scaling the features and adding an intercept, the core problem the Featurizer needs to handle is generating fixed effects. Specifically, fixed effect values in the fitting data might not appear in the holdout data, and fixed effect values in the holdout data might not appear in the fitting data.

In the past we manually added and subtracted the columns. Now we generate the fixed effects for all units and differentiate between expanded fixed effects (all fixed effects that haven't been dropped to avoid multicollinearity) and active fixed effects (expanded fixed effects that appear in the fitting data).
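As a rough illustration of the expanded vs. active distinction (hypothetical column and level names, with pandas.get_dummies standing in for whatever the Featurizer actually does), one way to picture it:

import pandas as pd

# Fitting data contains fixed-effect levels {a, b}; the holdout data also
# contains level c, which never appears in the fitting rows.
df = pd.DataFrame({
    "county": ["a", "a", "b", "c"],
    "in_fit": [True, True, True, False],
})

# Expand dummies over ALL units, dropping one level to avoid
# multicollinearity with the intercept.
dummies = pd.get_dummies(df["county"], prefix="county", drop_first=True, dtype=float)

expanded_fixed_effects = list(dummies.columns)  # every non-dropped dummy
active_fixed_effects = [
    col for col in dummies.columns if dummies.loc[df["in_fit"], col].any()
]  # dummies that actually appear in the fitting data

print(expanded_fixed_effects)  # ['county_b', 'county_c']
print(active_fixed_effects)    # ['county_b'] -- 'county_c' only shows up in the holdout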
Jira Ticket
Test Steps
These commands should still work as expected:
Also run tox for unit tests.