Elex 2771 new ols qr #77

Merged
merged 14 commits into from
Sep 25, 2023
Conversation

lennybronner
Collaborator

@lennybronner lennybronner commented Sep 22, 2023

Description

This PR moves over parts of the changes we are making for the bootstrap election model PR in order to make reviewing that one easier. It makes the changes the old ConformalElectionModel needs to work with the updates made to elex-solver in this PR, and it makes small tweaks to the estimandizer to prepare it for generating multiple estimands at once. It also updates the unit tests accordingly.

Jira Ticket

https://arcpublishing.atlassian.net/browse/ELEX-2771

Test Steps

tox

also

elexmodel 2020-11-03_USA_G --office_id=P --estimands=dem --geographic_unit_type=county --pi_method=nonparametric --percent_reporting=30 --aggregates=postal_code --historical

@lennybronner lennybronner requested a review from a team as a code owner September 22, 2023 14:11
@dmnapolitano
Contributor

@lennybronner thank you so much for all of this! 🎉 🙌🏻

Can you say a bit about what the turnout factor is? 😅

Or actually, sorry, I think I see what you're doing. Even though the end result we're sharing with the world is the predicted margin, we still need to predict turnout in order to predict margin. But not all of our data sets include results_turnout, so you're checking to make sure it exists and if not, sum across the (applicable) results_ columns we do have. Do I have that right-ish? 😅
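A minimal sketch of the fallback described above, not the actual Estimandizer code. The column names `results_dem` and `results_gop` and the function `turnout_for_unit` are assumptions made for illustration; only `results_turnout` and the `results_` prefix come from the comment.

```python
def turnout_for_unit(row, party_columns=("results_dem", "results_gop")):
    """Return turnout for one unit (a dict of column name -> value).

    Uses results_turnout when the data set provides it; otherwise sums the
    applicable results_ columns that are present. The default party column
    names are hypothetical.
    """
    if row.get("results_turnout") is not None:
        return row["results_turnout"]
    # Fallback: add up whichever party-level result columns exist in this row.
    return sum(row[col] for col in party_columns if row.get(col) is not None)


with_turnout = {"results_dem": 60, "results_gop": 40, "results_turnout": 120}
without_turnout = {"results_dem": 60, "results_gop": 40}
print(turnout_for_unit(with_turnout), turnout_for_unit(without_turnout))
```

Passing the party columns explicitly, rather than summing every column that starts with `results_`, avoids accidentally adding derived columns into the total.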

@lennybronner
Collaborator Author

Yeah, we need to predict turnout in order to get the normalization constant for normalized margin, since we need to go back and forth between unnormalized and normalized margin to move from county predictions to state predictions.
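To illustrate why turnout is needed here, the sketch below assumes normalized margin is the unnormalized (raw vote) margin divided by turnout. Under that assumption, county predictions are converted back to raw margins, summed, and renormalized by state turnout. The function name and input shape are hypothetical, not taken from elexmodel.

```python
def state_normalized_margin(county_predictions):
    """Aggregate county-level (normalized_margin, turnout) pairs to one state value.

    Assumes normalized margin = unnormalized margin / turnout, so each county's
    raw margin is recovered as normalized_margin * turnout.
    """
    total_margin = 0.0
    total_turnout = 0.0
    for normalized_margin, turnout in county_predictions:
        total_margin += normalized_margin * turnout  # back to unnormalized margin
        total_turnout += turnout
    if total_turnout == 0:
        raise ValueError("cannot normalize: total predicted turnout is zero")
    return total_margin / total_turnout


# Two counties: +0.5 margin on 100 votes, -0.25 margin on 300 votes.
print(state_normalized_margin([(0.5, 100), (-0.25, 300)]))
```

A plain average of the county normalized margins would weight small and large counties equally, which is why the predicted turnout has to enter as the weight.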

Turnout factor is basically just the ratio of turnout in this election to turnout in the last election. In the margin model it's part of what we're estimating. But we also drop units whose turnout factor is above or below some constant. We're basically assuming that if turnout in some county is only 20% of its last election's turnout (or greater than 200% of it), then either our results provider made a mistake or we accidentally mismatched precincts, so we drop that county from our model. We can adjust the constants (20%/200%) through parameters in the model so that in a super low/high turnout election we don't accidentally drop too many units.
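As a rough sketch of that filter: the 20% and 200% bounds come from the explanation above, but the names `lower`, `upper`, and `filter_units` are hypothetical, and treating a unit with zero previous turnout as dropped is an assumption.

```python
def turnout_factor(current_turnout, previous_turnout):
    """Ratio of turnout in this election to turnout in the last election."""
    return current_turnout / previous_turnout


def filter_units(units, lower=0.2, upper=2.0):
    """Split units into (kept, dropped) based on their turnout factor.

    units maps a unit id to (current_turnout, previous_turnout).
    lower and upper are the adjustable constants (hypothetical names).
    """
    kept, dropped = [], []
    for unit_id, (current, previous) in units.items():
        # Assumption: with no previous turnout the ratio is undefined, so drop.
        if previous == 0:
            dropped.append(unit_id)
            continue
        factor = turnout_factor(current, previous)
        if lower <= factor <= upper:
            kept.append(unit_id)
        else:
            dropped.append(unit_id)
    return kept, dropped


units = {"county_a": (950, 1000), "county_b": (150, 1000), "county_c": (2500, 1000)}
print(filter_units(units))
```

Widening the bounds, for example `filter_units(units, lower=0.1, upper=3.0)`, keeps more units in an unusually low or high turnout election.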

@dmnapolitano
Contributor

Got it!! That's awesome 🎉

What about dropping units whose turnout factors are outliers against the other units? That way, on the off chance the entire state doesn't vote (or does vote), there's no risk of dropping almost every unit in the state. If you've done some evaluation to come up with these constants, that's fine, and I know for now we're primarily interested in big (top-of-the-) ticket races anyway where this is less likely to occur. Just a thought 🤷🏻‍♀️ 😄
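One possible way to sketch this idea, purely as an illustration: flag units whose turnout factor is far from the median of all units, using the median absolute deviation (MAD). The choice of MAD, the 3.5 threshold, and the function name are assumptions, not something proposed in this thread.

```python
import statistics


def relative_outliers(factors, threshold=3.5):
    """Return the set of unit ids whose turnout factor is an outlier
    relative to the other units, using a MAD-based modified z-score.

    factors maps a unit id to its turnout factor.
    """
    median = statistics.median(factors.values())
    deviations = {unit: abs(f - median) for unit, f in factors.items()}
    mad = statistics.median(deviations.values())
    if mad == 0:
        # At least half the units share the median value; flag nothing.
        return set()
    # 0.6745 rescales the MAD so scores are comparable to z-scores
    # when the data are roughly normal.
    return {u for u, d in deviations.items() if 0.6745 * d / mad > threshold}


# A uniformly low-turnout state: every factor is under 0.2 except one.
factors = {"a": 0.15, "b": 0.16, "c": 0.14, "d": 0.15, "e": 0.17, "f": 1.9}
print(relative_outliers(factors))
```

With fixed 20%/200% bounds, units "a" through "e" would all be dropped; here only "f" is flagged because it is measured against the other units.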

Contributor

@dmnapolitano dmnapolitano left a comment

Just some questions and questions-that-may-become-suggestions lol, otherwise looks good! 😄 🎉

src/elexmodel/handlers/data/Estimandizer.py (resolved)
src/elexmodel/handlers/data/Estimandizer.py (resolved)
tests/handlers/test_combined_data.py (resolved)
tests/handlers/test_combined_data.py (outdated, resolved)

@lennybronner
Collaborator Author

That's a really good idea! Though I guess it would necessitate a bit more computation? Do you mind adding a future ticket to implement it?

@dmnapolitano
Contributor

Sure! Thanks! 😄 🎉 The ticket is here: https://arcpublishing.atlassian.net/browse/ELEX-3298

Contributor

@dmnapolitano dmnapolitano left a comment

I think this looks good! 🎉 🎉

@lennybronner lennybronner merged commit 2043a30 into develop Sep 25, 2023
@lennybronner lennybronner deleted the elex-2771-new-ols-qr branch September 25, 2023 19:37