Updates to Featurizer #69

lennybronner · 2023-08-25T17:10:37Z

Description

To get the upcoming bootstrap model working, I had to change how the Featurizer works, while keeping those changes compatible with our nonparametric and Gaussian models. This PR contains only the necessary changes to the Featurizer, the changes to BaseElectionModel needed to work with the new Featurizer, and the corresponding unit test changes.

Beyond centering and scaling the features and adding an intercept, the core problem the Featurizer needed to deal with was generating fixed effects. Specifically, fixed effect values in the fitting data might not appear in the holdout data, and fixed effect values in the holdout data might not appear in the fitting data.

In the past we manually added and subtracted the columns. We now generate the fixed effects for all units and differentiate between expanded fixed effects (all fixed effects that haven't been dropped to avoid multicollinearity) and active fixed effects (expanded fixed effects that appear in the fitting data).
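To make the expanded/active distinction concrete, here is a minimal sketch using pandas. It is not the Featurizer's actual code: the toy data, the `state` column, and the choice to drop the alphabetically first dummy column are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data; "state" plays the role of the fixed effect column.
fitting = pd.DataFrame({"unit": ["a", "b", "c"], "state": ["MD", "VA", "MD"]})
holdout = pd.DataFrame({"unit": ["d", "e"], "state": ["VA", "WV"]})

# Generate the dummies for all units at once so that fitting and holdout
# rows share exactly the same set of columns.
all_units = pd.concat([fitting, holdout], ignore_index=True)
dummies = pd.get_dummies(all_units["state"], prefix="state", dtype=float)

# Assumption: drop the first column to avoid multicollinearity with the intercept.
dropped_column = sorted(dummies.columns)[0]
expanded_fixed_effects = [c for c in sorted(dummies.columns) if c != dropped_column]

# Active fixed effects: expanded fixed effects that actually occur in the fitting rows.
fitting_dummies = dummies.iloc[: len(fitting)]
active_fixed_effects = [c for c in expanded_fixed_effects if fitting_dummies[c].sum() > 0]

print(expanded_fixed_effects)  # ['state_VA', 'state_WV']
print(active_fixed_effects)    # ['state_VA']
```

In this toy case `state_WV` is expanded but not active, because WV only shows up in the holdout rows.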

Jira Ticket

Test Steps

These commands should still work as expected:

elexmodel 2020-11-03_USA_G --office_id=P --estimands=dem --geographic_unit_type=county --pi_method=nonparametric --percent_reporting=20 --aggregates=postal_code --fixed_effects=postal_code
elexmodel 2020-11-03_USA_G --office_id=P --estimands=dem --geographic_unit_type=county --pi_method=nonparametric --percent_reporting=20 --aggregates=postal_code --fixed_effects=postal_code --fixed_effects=county_classification

Also run tox for the unit tests.

jchaskell

Mostly looks good. I'm a bit hesitant about the FE approach but don't have any other ideas about how to solve the problem.

src/elexmodel/handlers/data/CombinedData.py

src/elexmodel/handlers/data/Featurizer.py

src/elexmodel/models/BaseElectionModel.py

jchaskell · 2023-08-29T19:47:45Z

tests/handlers/test_featurizer.py

+            "a": [2.0, 2.0, 2.0, 4.0],
+            "b": [1.0, 1.0, 1.0, 3.0],
+            "c": [0.5, 0.5, 0.5, 2.5],
+            "d": [np.inf, np.inf, np.inf, np.inf],


I realize we don't expect non-FE features to ever be all the same, but do we want it to error out in this case?

I think so? It will be linearly dependent with the intercept and cause a matrix inversion error either way.
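The linear dependence mentioned above can be checked with a small NumPy sketch. The design matrix here is hypothetical and only illustrates the rank argument; it is not taken from the test file.

```python
import numpy as np

# Hypothetical design matrix: an intercept, a varying feature, and a feature
# that takes the same value for every unit.
intercept = np.ones(4)
varying = np.array([2.0, 2.0, 2.0, 4.0])
constant = np.array([3.0, 3.0, 3.0, 3.0])
x = np.column_stack([intercept, varying, constant])

# The constant column is 3 * intercept, so the matrix has three columns but
# only rank 2. X^T X is therefore singular and cannot be reliably inverted.
rank = np.linalg.matrix_rank(x)
print(x.shape[1], rank)
```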

tests/handlers/test_featurizer.py

dmnapolitano

This is my first time doing code reviews in GitHub and I'm not sure if I'm done, but I don't want to lose the comments I made so far 😬 If there's a way to "post comments but keep reviewing" I'd be very interested in knowing how (Google is no help here) 😅

src/elexmodel/models/BaseElectionModel.py

src/elexmodel/handlers/data/CombinedData.py

lennybronner · 2023-08-30T21:32:43Z

> Mostly looks good. I'm a bit hesitant about the FE approach but don't have any other ideas about how to solve the problem.

What part are you hesitant about?

dmnapolitano · 2023-09-01T23:25:17Z

src/elexmodel/handlers/data/Featurizer.py

+            # beta_0 + beta_u * indic{u}
+            # and the fixed effect estimate for the dropped value is beta_0, so the average is:
+            # beta_0 + (beta_r / 3) + (beta_u / 3)
+


Sorry, I just saw this comment. And thanks for adding it; it helps me understand the math better 😄 👍🏻

But I have to admit I'm not familiar with this 😞 Do you have another example or some literature you can share?
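A small numeric sketch of the averaging described in the code comment above may help. It assumes a fixed effect with three values (one dropped, plus `r` and `u`) and made-up coefficients. Whether the Featurizer encodes an unseen value exactly this way is an assumption based on the comment, not something confirmed in this thread.

```python
import numpy as np

# Hypothetical coefficients: intercept, then the fixed effects "r" and "u".
# The third (dropped) value is absorbed into the intercept.
beta_0, beta_r, beta_u = 1.0, 0.6, -0.3
beta = np.array([beta_0, beta_r, beta_u])

# Design rows [intercept, indic{r}, indic{u}] for each known value.
rows = np.array([
    [1.0, 0.0, 0.0],  # dropped value: beta_0
    [1.0, 1.0, 0.0],  # r: beta_0 + beta_r
    [1.0, 0.0, 1.0],  # u: beta_0 + beta_u
])
average_prediction = (rows @ beta).mean()

# Setting each dummy to 1/3 for an unseen value gives
# beta_0 + (beta_r / 3) + (beta_u / 3), i.e. the same average.
unseen_row = np.array([1.0, 1 / 3, 1 / 3])
unseen_prediction = unseen_row @ beta

print(average_prediction, unseen_prediction)
```

Both numbers agree because averaging the three design rows gives exactly the row `[1, 1/3, 1/3]`, and the prediction is linear in the row.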

jchaskell

lgtm!

dmnapolitano

Same! 🎉

ran linter

39a1fa2

lennybronner requested a review from a team as a code owner August 25, 2023 17:10

small bug fix

740aca2

jchaskell reviewed Aug 29, 2023

dmnapolitano reviewed Aug 29, 2023

lennybronner and others added 5 commits August 30, 2023 18:41

updated comments

c811e9c

linter

9345e37

Update src/elexmodel/models/BaseElectionModel.py

8387d41

Co-authored-by: Jen Haskell <[email protected]>

Update tests/handlers/test_featurizer.py

c80156d

Co-authored-by: Jen Haskell <[email protected]>

Update tests/handlers/test_featurizer.py

e86c613

Co-authored-by: Jen Haskell <[email protected]>

lennybronner added 2 commits September 1, 2023 20:04

updated comment

8c85b7a

updated linter

20d8ba9

dmnapolitano reviewed Sep 1, 2023

lennybronner added 2 commits September 13, 2023 13:53

fixed merge conflict

02c4225

Merge branch 'develop' into updates-to-featurizer

704d8d1

jchaskell approved these changes Sep 14, 2023

dmnapolitano approved these changes Sep 14, 2023

lennybronner merged commit 8bff4bc into develop Sep 15, 2023
3 checks passed

lennybronner deleted the updates-to-featurizer branch September 15, 2023 13:18
