-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ELEX-2763-estimandizer #59
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I really like this approach! Two comments/questions:
-
Maybe this goes against what Lenny said but I think it would be nice to be able to pass in any function as an estimand. We should still keep some defaults like what are in the transformation map but also be able to pass a new one in. Does that make sense?
-
Did he say anything about integrating this into the model code? Or does he just want it created so he can use it when he re-writes things?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of small additional comments.
Lenny is going to be out for more of next week than I thought but he said he can do some code reviews so we can have him take a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like how this is set up! A few things:
- We'll want to keep this to the outcome variables only (so really only treating dem/gop/turnout/candidates), the features (like race, income etc.) would be treated in the featurizer.
- Also for now, let's keep this to dem/gop/turnout/candidate votes and dem/gop/candidate shares (we can add margin and turnout factor later as part of this ticket.
- Do you mind adding how this is going to get called in
client.py
also?
… in the config file or specified elsewhere to a logging message
Alright, this is ready now (I think) 😅 🙌🏻 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I'm curious whether this would work from the testbed? If I understand correctly, then the results estimand is only added from the LiveDataHandler, right? We might want to add another function that does the same as add_estimand_baselines but just for the results so that it can be called in isolation?
Also, with a historical election I am getting this error:
KeyError: "['results_party_vote_share_dem'] not in index"
here is an example
elexmodel 2021-11-02_VA_G --estimands party_vote_share_dem --office_id=G --geographic_unit_type=county --percent_reporting 50 --historical
Co-authored-by: Leonard Bronner <[email protected]>
Thanks! Ok, so I just pushed changes that should get the As for your other questions:
|
Thank you for fixing the historical run! It's not working from the testbed for me though. I think the issue is that the new estimand is never added to the current results (outside of the MockLiveDataHandler -- but that handler is only used together with the cli to test). re: unexpected units. Yes we will want to add the new estimand to unexpected units also. We don't make predictions for unexpected units, but we do include their current results in the sums |
Ok, I think I got this working now with the testbed without needing to make any changes to the testbed 🎉 Please try it out and let me know what you think. As a result of the changes I made, I think there are 1-2 additional columns in the |
This works now. Adding columns to the data is totally fine (we really do want to add a new results column), but I don't think we want to add a new baseline column -- since when merging the preprocessed data and the current we now run into a name collision. |
…ta handlers) remove that column so that future merges don't break downstream
Got it, I see, thanks. Ok, I (a) rolled back the changes I had made to those unit tests, and (b) now if a |
Hmm, I am not sure this solved the problem. When I run the testbed with
|
Oh interesting! How did you produce this data frame? (Code, commands, etc.) 🤔 |
I ran the model from the testbed (with |
Ok thanks, I think I got it. Either I misunderstood what Risha was doing, or she was only creating a column |
…imand and/or baseline_estimand
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Description
At the moment we are always estimating something that is passed in directly to get_estimates (and something that is directly in the preprocessed_data), this includes dem_votes, gop_votes and total_votes. We want to generalize this process by letting us generate estimands on the fly given the live data or preprocessed data.
Likely, this means creating a class or a function (similar to the Featurizer) that given a data handler (ie. PreprocessedDataHandler, LiveDataHandler or CombinedDataHandler )and a list of estimands, will generate those features in the dataframe.
Since dem_votes, gop_votes and total_votes are already included in the data, this estimand generator should do nothing (or over-write those fields). But it should let us create more complex estimands if we want (ie. margin)
Jira Ticket
https://arcpublishing.atlassian.net/browse/ELEX-2763
Test Steps
elexmodel 2017-11-07_VA_G --estimands=dem --office_id=G --geographic_unit_type=county --percent_reporting 50 --estimand_fns="{'party_vote_share':None}"
elexmodel 2017-11-07_VA_G --estimands=dem --office_id=G --geographic_unit_type=county --percent_reporting 50 --estimand_fns="{None:'Test.hello'}"