
ELEX-2763-estimandizer #59

Merged (48 commits) on Sep 13, 2023

Conversation

@rishasurana (Contributor) commented Jun 29, 2023

Description

At the moment we always estimate something that is passed directly into get_estimates (and that is already present in the preprocessed_data); this includes dem_votes, gop_votes and total_votes. We want to generalize this process by letting us generate estimands on the fly from the live data or preprocessed data.

Likely this means creating a class or a function (similar to the Featurizer) that, given a data handler (i.e. PreprocessedDataHandler, LiveDataHandler or CombinedDataHandler) and a list of estimands, will generate those features in the dataframe.

Since dem_votes, gop_votes and total_votes are already included in the data, this estimand generator should do nothing for them (or overwrite those fields). But it should let us create more complex estimands if we want (e.g. margin).
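The idea above can be sketched as a small helper (hypothetical names throughout; the actual Estimandizer in this PR may be organized differently):

```python
import pandas as pd

# Hypothetical transformation map: each derived estimand name -> function
# that computes the column from columns already in the dataframe.
TRANSFORMATION_MAP = {
    "margin": lambda df: df["dem_votes"] - df["gop_votes"],
    "party_vote_share_dem": lambda df: df["dem_votes"] / df["total_votes"],
}

def estimandize(df, estimands):
    """Add any missing estimand columns; columns already in the data
    (dem_votes, gop_votes, total_votes) are left untouched."""
    for estimand in estimands:
        if estimand in df.columns:
            continue  # already present, do nothing
        if estimand not in TRANSFORMATION_MAP:
            raise KeyError(f"no transformation defined for {estimand!r}")
        df[estimand] = TRANSFORMATION_MAP[estimand](df)
    return df
```

So `estimandize(df, ["dem_votes", "margin"])` would leave `dem_votes` alone and add a new `margin` column.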

Jira Ticket

https://arcpublishing.atlassian.net/browse/ELEX-2763

Test Steps

  1. test_estimandizer.py
  2. test_client.py
  3. Test to run client directly with a pre-written function: elexmodel 2017-11-07_VA_G --estimands=dem --office_id=G --geographic_unit_type=county --percent_reporting 50 --estimand_fns="{'party_vote_share':None}"
  4. Test to run client directly with a given function: elexmodel 2017-11-07_VA_G --estimands=dem --office_id=G --geographic_unit_type=county --percent_reporting 50 --estimand_fns="{None:'Test.hello'}"

@rishasurana rishasurana self-assigned this Jun 29, 2023
@rishasurana rishasurana changed the title Generate estimands explicitly estimandizer Jun 30, 2023
@rishasurana rishasurana changed the title estimandizer ELEX-2763-estimandizer Jun 30, 2023
@rishasurana rishasurana changed the title ELEX-2763-estimandizer ELEX-2763-estimandizer-old Aug 2, 2023
@rishasurana rishasurana changed the title ELEX-2763-estimandizer-old ELEX-2763-estimandizer Aug 3, 2023
@rishasurana rishasurana marked this pull request as ready for review August 3, 2023 00:27
@rishasurana rishasurana requested a review from a team as a code owner August 3, 2023 00:27
@jchaskell (Contributor) left a comment

In general, I really like this approach! Two comments/questions:

  1. Maybe this goes against what Lenny said, but I think it would be nice to be able to pass in any function as an estimand. We should still keep some defaults, like those in the transformation map, but also be able to pass a new one in. Does that make sense?

  2. Did he say anything about integrating this into the model code? Or does he just want it created so he can use it when he re-writes things?

(Four review comments on src/elexmodel/handlers/data/Estimandizer.py; all outdated and resolved.)
@rishasurana (Contributor, Author)

> (quoting @jchaskell's two questions above)
  1. Added the capability to pass in a dict of {estimand: function}
  2. He didn't mention this and I don't see it on the Jira ticket, but I will ask him when he comes back!
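The {estimand: function} capability could look roughly like this (a sketch with hypothetical names, not the PR's exact signature): a None value falls back to a pre-written default, mirroring the --estimand_fns CLI examples in the test steps above.

```python
import pandas as pd

# Hypothetical defaults map; a None value in the user's dict falls back
# to the pre-written function of the same name.
DEFAULT_ESTIMAND_FNS = {
    "party_vote_share": lambda df: df["dem_votes"] / df["total_votes"],
}

def add_estimands(df, estimand_fns):
    """estimand_fns maps estimand name -> callable (or None for a default)."""
    for name, fn in estimand_fns.items():
        if fn is None:
            fn = DEFAULT_ESTIMAND_FNS[name]
        df[name] = fn(df)
    return df
```

Under these assumptions, the pre-written-function test step corresponds to `add_estimands(df, {"party_vote_share": None})`, while a user-supplied callable is passed as the dict value.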

@rishasurana rishasurana requested a review from jchaskell August 4, 2023 15:01
@jchaskell (Contributor) left a comment

A couple of small additional comments.

Lenny is going to be out for more of next week than I thought, but he said he can do some code reviews, so we can have him take a look.

(Two review comments on src/elexmodel/handlers/data/Estimandizer.py; both outdated and resolved.)
@lennybronner (Collaborator) left a comment

I really like how this is set up! A few things:

  1. We'll want to keep this to the outcome variables only (so really only treating dem/gop/turnout/candidates); the features (like race, income, etc.) would be treated in the featurizer.

  2. Also, for now let's keep this to dem/gop/turnout/candidate votes and dem/gop/candidate shares (we can add margin and turnout factor later as part of this ticket).

  3. Do you mind adding how this is going to get called in client.py also?

@dmnapolitano (Contributor)
Hang on, sorry, I missed a spot where estimands are being checked for validity 🤔

Alright, this is ready now (I think) 😅 🙌🏻

@dmnapolitano dmnapolitano marked this pull request as ready for review September 7, 2023 21:15
@dmnapolitano dmnapolitano removed their request for review September 7, 2023 21:16
@lennybronner (Collaborator) left a comment

This looks great! I'm curious whether this would work from the testbed. If I understand correctly, the results estimand is only added from the LiveDataHandler, right? We might want to add another function that does the same as add_estimand_baselines but just for the results, so that it can be called in isolation?

Also, with a historical election I am getting this error:

KeyError: "['results_party_vote_share_dem'] not in index"

Here is an example:

elexmodel 2021-11-02_VA_G --estimands party_vote_share_dem --office_id=G --geographic_unit_type=county --percent_reporting 50 --historical

(Review comments on tests/test_client.py and src/elexmodel/client.py; since resolved.)
@dmnapolitano (Contributor)

> (quoted review comment above omitted)

Thanks! Ok, so I just pushed changes that should get the --historical to work properly. Sorry about that and thanks for providing that example 😅 🎉

As for your other questions:

  1. Yes, this should work from the testbed 🤞🏻 However, I'm trying to determine what we should do with unexpected units and a "new_estimand". How should we populate results_new_estimand for unexpected units? We still want to generate predictions for "new_estimand" for unexpected units, right?
  2. The new estimands' results_ columns should be added from both LiveDataHandler and PreprocessedDataHandler objects (assuming I did it right 😅).
  3. "We might want to add another function that does the same as add_estimand_baselines but just for the results so that it can be called in isolation?" Sorry, I'm not sure I understand this. What's a use-case for this? 🤔

@lennybronner (Collaborator)

> (quoted thread above omitted)

Thank you for fixing the historical run!

It's not working from the testbed for me, though. I think the issue is that the new estimand is never added to the current results (outside of the MockLiveDataHandler, but that handler is only used together with the CLI to test).

Re: unexpected units. Yes, we will want to add the new estimand to unexpected units also. We don't make predictions for unexpected units, but we do include their current results in the sums.

@dmnapolitano (Contributor)

> (quoted thread above omitted)

Ok, I think I got this working now with the testbed, without needing to make any changes to the testbed itself 🎉 Please try it out and let me know what you think. As a result of the changes I made, there are one or two additional columns in the CombinedDataHandler.data data frames, depending on the number of estimands, but nothing else has changed with the unit tests, so I think it's alright 🤔 Open to suggestions, of course.

@lennybronner (Collaborator)

> (quoted thread above omitted)

This works now. Adding columns to the data is totally fine (we really do want to add a new results column), but I don't think we want to add a new baseline column, since when merging the preprocessed data and the current data we now run into a name collision.
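The collision described here comes from how pandas handles duplicate column names on merge (a minimal illustration with made-up values, not the project's actual merge code):

```python
import pandas as pd

# If both sides of the merge carry the same derived baseline column,
# pandas disambiguates with _x/_y suffixes, so downstream lookups of the
# original column name fail.
preprocessed = pd.DataFrame({"geographic_unit_fips": ["01001"],
                             "baseline_party_vote_share_dem": [0.24]})
current = pd.DataFrame({"geographic_unit_fips": ["01001"],
                        "baseline_party_vote_share_dem": [0.25]})

merged = preprocessed.merge(current, on="geographic_unit_fips")
print(merged.columns.tolist())
# -> ['geographic_unit_fips', 'baseline_party_vote_share_dem_x',
#     'baseline_party_vote_share_dem_y']
```

Any later code indexing `merged["baseline_party_vote_share_dem"]` then raises a KeyError, which is why the baseline column should only be created on one side.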

@dmnapolitano (Contributor) commented Sep 11, 2023

> (quoted thread above omitted)

Got it, I see, thanks. Ok, I (a) rolled back the changes I had made to those unit tests, and (b) made it so that if a baseline_ column was added during the CombinedDataHandler estimandization process, it's now deleted. Let me know what you think 😄

@lennybronner (Collaborator)

> (quoted thread above omitted)

Hmm, I am not sure this solved the problem. When I run the testbed with party_vote_share_dem as an estimand, this is what the current results look like:

     geographic_unit_fips    State  ... baseline_party_vote_share_dem  results_party_vote_share_dem
0                   01001  Alabama  ...                      0.239569                      0.239569
1                   01003  Alabama  ...                      0.194804                      0.194804
2                   01005  Alabama  ...                      0.466603                      0.466603
3                   01007  Alabama  ...                      0.214220                      0.214220
4                   01009  Alabama  ...                      0.084699                      0.084699

@dmnapolitano (Contributor)

> (quoted thread above omitted)

Oh interesting! How did you produce this data frame? (Code, commands, etc.) 🤔

@lennybronner (Collaborator)

> (quoted thread above omitted)

I ran the model from the testbed (with estimands = ["dem", "party_vote_share_dem"]) and put a breakpoint in the CombinedDataHandler to inspect current_data.

@dmnapolitano (Contributor)

> (quoted thread above omitted)

Ok thanks, I think I got it. Either I misunderstood what Risha was doing, or she was only creating a column baseline_foo for a new estimand "foo", no matter what. So now I (believe I) have adjusted the logic to create results_foo or baseline_foo as necessary. Let me know what you think 😄 🤞🏻
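The fix described here could be sketched as follows (a hypothetical helper, not the PR's actual code): the role of the data handler decides which prefix gets created, so the current-data side never grows a baseline_ column.

```python
import pandas as pd

def add_estimand_column(df, estimand, fn, kind):
    """kind is "results" for live/current data or "baseline" for
    preprocessed data; only the matching prefixed column is created."""
    if kind not in ("results", "baseline"):
        raise ValueError(f"unknown kind: {kind}")
    df[f"{kind}_{estimand}"] = fn(df)
    return df
```

A LiveDataHandler-style caller would pass `kind="results"`, a PreprocessedDataHandler-style caller `kind="baseline"`, avoiding the merge collision noted earlier.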

@lennybronner (Collaborator) left a comment

LGTM!

@dmnapolitano dmnapolitano merged commit 60f1e1f into develop Sep 13, 2023
@dmnapolitano dmnapolitano deleted the estimandizer branch September 13, 2023 17:47
@jchaskell jchaskell restored the estimandizer branch September 15, 2023 20:58