
ELEX-2763-estimandizer #59

Merged (48 commits) on Sep 13, 2023

Conversation

@rishasurana (Contributor) commented Jun 29, 2023

Description

At the moment we always estimate something that is passed directly into get_estimates (and that is already present in the preprocessed_data); this includes dem_votes, gop_votes and total_votes. We want to generalize this process by letting us generate estimands on the fly from the live data or preprocessed data.

Likely this means creating a class or a function (similar to the Featurizer) that, given a data handler (i.e. PreprocessedDataHandler, LiveDataHandler or CombinedDataHandler) and a list of estimands, will generate those features in the dataframe.

Since dem_votes, gop_votes and total_votes are already included in the data, this estimand generator should do nothing for them (or overwrite those fields). But it should let us create more complex estimands if we want (e.g. margin).
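The idea above can be sketched as a small helper (hypothetical names throughout; the actual Estimandizer in this PR may be organized differently):

```python
import pandas as pd

# Hypothetical transformation map: each derived estimand name -> function
# that computes the column from columns already in the dataframe.
TRANSFORMATION_MAP = {
    "margin": lambda df: df["dem_votes"] - df["gop_votes"],
    "party_vote_share_dem": lambda df: df["dem_votes"] / df["total_votes"],
}

def estimandize(df, estimands):
    """Add any missing estimand columns; columns already in the data
    (dem_votes, gop_votes, total_votes) are left untouched."""
    for estimand in estimands:
        if estimand in df.columns:
            continue  # already present, do nothing
        if estimand not in TRANSFORMATION_MAP:
            raise KeyError(f"no transformation defined for {estimand!r}")
        df[estimand] = TRANSFORMATION_MAP[estimand](df)
    return df
```

So `estimandize(df, ["dem_votes", "margin"])` would leave `dem_votes` alone and add a new `margin` column.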

Jira Ticket

https://arcpublishing.atlassian.net/browse/ELEX-2763

Test Steps

  1. test_estimandizer.py
  2. test_client.py
  3. Test to run client directly with a pre-written function: elexmodel 2017-11-07_VA_G --estimands=dem --office_id=G --geographic_unit_type=county --percent_reporting 50 --estimand_fns="{'party_vote_share':None}"
  4. Test to run client directly with a given function: elexmodel 2017-11-07_VA_G --estimands=dem --office_id=G --geographic_unit_type=county --percent_reporting 50 --estimand_fns="{None:'Test.hello'}"

@rishasurana rishasurana self-assigned this Jun 29, 2023
@rishasurana rishasurana changed the title Generate estimands explicitly estimandizer Jun 30, 2023
@rishasurana rishasurana changed the title estimandizer ELEX-2763-estimandizer Jun 30, 2023
@rishasurana rishasurana changed the title ELEX-2763-estimandizer ELEX-2763-estimandizer-old Aug 2, 2023
@rishasurana rishasurana changed the title ELEX-2763-estimandizer-old ELEX-2763-estimandizer Aug 3, 2023
@rishasurana rishasurana marked this pull request as ready for review August 3, 2023 00:27
@rishasurana rishasurana requested a review from a team as a code owner August 3, 2023 00:27
@jchaskell (Contributor) left a comment

In general, I really like this approach! Two comments/questions:

  1. Maybe this goes against what Lenny said, but I think it would be nice to be able to pass in any function as an estimand. We should still keep some defaults, like those in the transformation map, but also be able to pass a new one in. Does that make sense?

  2. Did he say anything about integrating this into the model code? Or does he just want it created so he can use it when he re-writes things?

(Four review comments on src/elexmodel/handlers/data/Estimandizer.py; all outdated and resolved.)
@rishasurana (Contributor, Author)

> (quoting @jchaskell's two questions above)
  1. Added the capability to pass in a dict of {estimand: function}
  2. He didn't mention this and I don't see it on the Jira ticket, but I will ask him when he comes back!
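The {estimand: function} capability could look roughly like this (a sketch with hypothetical names, not the PR's exact signature): a None value falls back to a pre-written default, mirroring the --estimand_fns CLI examples in the test steps above.

```python
import pandas as pd

# Hypothetical defaults map; a None value in the user's dict falls back
# to the pre-written function of the same name.
DEFAULT_ESTIMAND_FNS = {
    "party_vote_share": lambda df: df["dem_votes"] / df["total_votes"],
}

def add_estimands(df, estimand_fns):
    """estimand_fns maps estimand name -> callable (or None for a default)."""
    for name, fn in estimand_fns.items():
        if fn is None:
            fn = DEFAULT_ESTIMAND_FNS[name]
        df[name] = fn(df)
    return df
```

Under these assumptions, the pre-written-function test step corresponds to `add_estimands(df, {"party_vote_share": None})`, while a user-supplied callable is passed as the dict value.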

@rishasurana rishasurana requested a review from jchaskell August 4, 2023 15:01
@jchaskell (Contributor) left a comment

A couple of small additional comments.

Lenny is going to be out for more of next week than I thought, but he said he can do some code reviews, so we can have him take a look.

(Two review comments on src/elexmodel/handlers/data/Estimandizer.py; both outdated and resolved.)
@lennybronner (Collaborator) left a comment

I really like how this is set up! A few things:

  1. We'll want to keep this to the outcome variables only (so really only treating dem/gop/turnout/candidates); the features (like race, income, etc.) would be treated in the featurizer.

  2. Also, for now let's keep this to dem/gop/turnout/candidate votes and dem/gop/candidate shares (we can add margin and turnout factor later as part of this ticket).

  3. Do you mind adding how this is going to get called in client.py also?

@dmnapolitano (Contributor)
Hang on, sorry, I missed a spot where estimands are being checked for validity 🤔

Alright, this is ready now (I think) 😅 🙌🏻

@dmnapolitano dmnapolitano marked this pull request as ready for review September 7, 2023 21:15
@dmnapolitano dmnapolitano removed their request for review September 7, 2023 21:16
@lennybronner (Collaborator) left a comment

This looks great! I'm curious whether this would work from the testbed. If I understand correctly, the results estimand is only added from the LiveDataHandler, right? We might want to add another function that does the same as add_estimand_baselines but just for the results, so that it can be called in isolation?

Also, with a historical election I am getting this error:

KeyError: "['results_party_vote_share_dem'] not in index"

Here is an example:

elexmodel 2021-11-02_VA_G --estimands party_vote_share_dem --office_id=G --geographic_unit_type=county --percent_reporting 50 --historical

(Review comments on tests/test_client.py and src/elexmodel/client.py; since resolved.)
@dmnapolitano (Contributor)

> (quoted review comment above omitted)

Thanks! Ok, so I just pushed changes that should get the --historical to work properly. Sorry about that and thanks for providing that example 😅 🎉

As for your other questions:

  1. Yes, this should work from the testbed 🤞🏻 However, I'm trying to determine what we should do with unexpected units and a "new_estimand". How should we populate results_new_estimand for unexpected units? We still want to generate predictions for "new_estimand" for unexpected units, right?
  2. The new estimands' results_ columns should be added from both LiveDataHandler and PreprocessedDataHandler objects (assuming I did it right 😅).
  3. "We might want to add another function that does the same as add_estimand_baselines but just for the results so that it can be called in isolation?" Sorry, I'm not sure I understand this. What's a use-case for this? 🤔

@lennybronner (Collaborator)

> (quoted thread above omitted)

Thank you for fixing the historical run!

It's not working from the testbed for me, though. I think the issue is that the new estimand is never added to the current results (outside of the MockLiveDataHandler, but that handler is only used together with the CLI to test).

Re: unexpected units. Yes, we will want to add the new estimand to unexpected units also. We don't make predictions for unexpected units, but we do include their current results in the sums.

@dmnapolitano (Contributor)

> (quoted thread above omitted)

Ok, I think I got this working now with the testbed, without needing to make any changes to the testbed itself 🎉 Please try it out and let me know what you think. As a result of the changes I made, there are one or two additional columns in the CombinedDataHandler.data data frames, depending on the number of estimands, but nothing else has changed with the unit tests, so I think it's alright 🤔 Open to suggestions, of course.

@lennybronner (Collaborator)

> (quoted thread above omitted)

This works now. Adding columns to the data is totally fine (we really do want to add a new results column), but I don't think we want to add a new baseline column, since when merging the preprocessed data and the current data we now run into a name collision.
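The collision described here comes from how pandas handles duplicate column names on merge (a minimal illustration with made-up values, not the project's actual merge code):

```python
import pandas as pd

# If both sides of the merge carry the same derived baseline column,
# pandas disambiguates with _x/_y suffixes, so downstream lookups of the
# original column name fail.
preprocessed = pd.DataFrame({"geographic_unit_fips": ["01001"],
                             "baseline_party_vote_share_dem": [0.24]})
current = pd.DataFrame({"geographic_unit_fips": ["01001"],
                        "baseline_party_vote_share_dem": [0.25]})

merged = preprocessed.merge(current, on="geographic_unit_fips")
print(merged.columns.tolist())
# -> ['geographic_unit_fips', 'baseline_party_vote_share_dem_x',
#     'baseline_party_vote_share_dem_y']
```

Any later code indexing `merged["baseline_party_vote_share_dem"]` then raises a KeyError, which is why the baseline column should only be created on one side.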

@dmnapolitano (Contributor) commented Sep 11, 2023

> (quoted thread above omitted)

Got it, I see, thanks. Ok, I (a) rolled back the changes I had made to those unit tests, and (b) made it so that if a baseline_ column was added during the CombinedDataHandler estimandization process, it's now deleted. Let me know what you think 😄

@lennybronner (Collaborator)

> (quoted thread above omitted)

Hmm, I am not sure this solved the problem. When I run the testbed with party_vote_share_dem as an estimand, this is what the current results look like:

     geographic_unit_fips    State  ... baseline_party_vote_share_dem  results_party_vote_share_dem
0                   01001  Alabama  ...                      0.239569                      0.239569
1                   01003  Alabama  ...                      0.194804                      0.194804
2                   01005  Alabama  ...                      0.466603                      0.466603
3                   01007  Alabama  ...                      0.214220                      0.214220
4                   01009  Alabama  ...                      0.084699                      0.084699

@dmnapolitano (Contributor)

> (quoted thread above omitted)

Oh interesting! How did you produce this data frame? (Code, commands, etc.) 🤔

@lennybronner (Collaborator)

> (quoted thread above omitted)

I ran the model from the testbed (with estimands = ["dem", "party_vote_share_dem"]) and put a breakpoint in the CombinedDataHandler to inspect current_data.

@dmnapolitano (Contributor)

> (quoted thread above omitted)

Ok thanks, I think I got it. Either I misunderstood what Risha was doing, or she was only creating a column baseline_foo for a new estimand "foo", no matter what. So now I (believe I) have adjusted the logic to create results_foo or baseline_foo as necessary. Let me know what you think 😄 🤞🏻
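The fix described here could be sketched as follows (a hypothetical helper, not the PR's actual code): the role of the data handler decides which prefix gets created, so the current-data side never grows a baseline_ column.

```python
import pandas as pd

def add_estimand_column(df, estimand, fn, kind):
    """kind is "results" for live/current data or "baseline" for
    preprocessed data; only the matching prefixed column is created."""
    if kind not in ("results", "baseline"):
        raise ValueError(f"unknown kind: {kind}")
    df[f"{kind}_{estimand}"] = fn(df)
    return df
```

A LiveDataHandler-style caller would pass `kind="results"`, a PreprocessedDataHandler-style caller `kind="baseline"`, avoiding the merge collision noted earlier.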

@lennybronner (Collaborator) left a comment

LGTM!

@dmnapolitano dmnapolitano merged commit 60f1e1f into develop Sep 13, 2023
@dmnapolitano dmnapolitano deleted the estimandizer branch September 13, 2023 17:47
@jchaskell jchaskell restored the estimandizer branch September 15, 2023 20:58