Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preprocessed data handler superclass #147

Merged
merged 3 commits into from
Dec 19, 2024

Conversation

dmnapolitano
Copy link
Contributor

Description

Hi! The changes in this PR create a BasePreprocessedDataHander superclass that data handlers that work with preprocessed model input data now inherit from. This should help reduce some code redundancy. Thanks! 🎉

This code was previously part of #145 👍🏻

Jira Ticket

Test Steps

tox, elexmodel commands, test bed 🎉

@dmnapolitano dmnapolitano requested a review from a team as a code owner December 13, 2024 16:55
@lennybronner
Copy link
Collaborator

can you please include a few elexmodel commands you tried?

@dmnapolitano
Copy link
Contributor Author

can you please include a few elexmodel commands you tried?

Sure! I just ran this:

elexmodel 2022-11-08_USA_G --estimands=margin --features=baseline_normalized_margin --office_id=S_county --geographic_unit_type=county --pi_method bootstrap --national_summary

I did it on develop and this branch (using pip install -e --force-reinstall to be sure) and the results look identical 🎉 But is there anything else I can check? 🤔

@lennybronner
Copy link
Collaborator

can you please include a few elexmodel commands you tried?

Sure! I just ran this:

elexmodel 2022-11-08_USA_G --estimands=margin --features=baseline_normalized_margin --office_id=S_county --geographic_unit_type=county --pi_method bootstrap --national_summary

I did it on develop and this branch (using pip install -e --force-reinstall to be sure) and the results look identical 🎉 But is there anything else I can check? 🤔

ok, thanks. I'm curious, does the historical flag still work at all?

@dmnapolitano
Copy link
Contributor Author

can you please include a few elexmodel commands you tried?

Sure! I just ran this:

elexmodel 2022-11-08_USA_G --estimands=margin --features=baseline_normalized_margin --office_id=S_county --geographic_unit_type=county --pi_method bootstrap --national_summary

I did it on develop and this branch (using pip install -e --force-reinstall to be sure) and the results look identical 🎉 But is there anything else I can check? 🤔

ok, thanks. I'm curious, does the historical flag still work at all?

Yup, so, our integration tests work, and also this works:

elexmodel 2022-11-08_USA_G --estimands=dem --office_id=S_county --geographic_unit_type=county --pi_method gaussian --historical --aggregates=county_classification

And this works:

elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --historical --aggregates=county_classification

For using --historical with the bootstrap model, I think we need to modify our historical data sets since it's expecting results_normalized_margin, but I'm pretty sure that's outside the scope of this PR 🤔 🎉

@lennybronner
Copy link
Collaborator

can you please include a few elexmodel commands you tried?

Sure! I just ran this:

elexmodel 2022-11-08_USA_G --estimands=margin --features=baseline_normalized_margin --office_id=S_county --geographic_unit_type=county --pi_method bootstrap --national_summary

I did it on develop and this branch (using pip install -e --force-reinstall to be sure) and the results look identical 🎉 But is there anything else I can check? 🤔

ok, thanks. I'm curious, does the historical flag still work at all?

Yup, so, our integration tests work, and also this works:

elexmodel 2022-11-08_USA_G --estimands=dem --office_id=S_county --geographic_unit_type=county --pi_method gaussian --historical --aggregates=county_classification

And this works:

elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --historical --aggregates=county_classification

For using --historical with the bootstrap model, I think we need to modify our historical data sets since it's expecting results_normalized_margin, but I'm pretty sure that's outside the scope of this PR 🤔 🎉

The historical flag works in the example you sent and shows the same output as dev?

@dmnapolitano
Copy link
Contributor Author

can you please include a few elexmodel commands you tried?

Sure! I just ran this:

elexmodel 2022-11-08_USA_G --estimands=margin --features=baseline_normalized_margin --office_id=S_county --geographic_unit_type=county --pi_method bootstrap --national_summary

I did it on develop and this branch (using pip install -e --force-reinstall to be sure) and the results look identical 🎉 But is there anything else I can check? 🤔

ok, thanks. I'm curious, does the historical flag still work at all?

Yup, so, our integration tests work, and also this works:

elexmodel 2022-11-08_USA_G --estimands=dem --office_id=S_county --geographic_unit_type=county --pi_method gaussian --historical --aggregates=county_classification

And this works:

elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --historical --aggregates=county_classification

For using --historical with the bootstrap model, I think we need to modify our historical data sets since it's expecting results_normalized_margin, but I'm pretty sure that's outside the scope of this PR 🤔 🎉

The historical flag works in the example you sent and shows the same output as dev?

Actually this is really interesting. If I run this command multiple times on develop alone:

elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --historical --aggregates=county_classification

I get different results every time. And somewhat dramatically so. For the county_classification aggregates, sometimes I'll see this:

{('VA', 'central'): {'mae_dem': 5382.0, ...

Sometimes it'll be closer to 2k:

{('VA', 'central'): {'mae_dem': 1910.0 ...

Or

{('VA', 'central'): {'mae_dem': 2026.0 ...

Is this to be expected...? 🤔 🤔

@lennybronner
Copy link
Collaborator

can you please include a few elexmodel commands you tried?

Sure! I just ran this:

elexmodel 2022-11-08_USA_G --estimands=margin --features=baseline_normalized_margin --office_id=S_county --geographic_unit_type=county --pi_method bootstrap --national_summary

I did it on develop and this branch (using pip install -e --force-reinstall to be sure) and the results look identical 🎉 But is there anything else I can check? 🤔

ok, thanks. I'm curious, does the historical flag still work at all?

Yup, so, our integration tests work, and also this works:

elexmodel 2022-11-08_USA_G --estimands=dem --office_id=S_county --geographic_unit_type=county --pi_method gaussian --historical --aggregates=county_classification

And this works:

elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --historical --aggregates=county_classification

For using --historical with the bootstrap model, I think we need to modify our historical data sets since it's expecting results_normalized_margin, but I'm pretty sure that's outside the scope of this PR 🤔 🎉

The historical flag works in the example you sent and shows the same output as dev?

Actually this is really interesting. If I run this command multiple times on develop alone:

elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --historical --aggregates=county_classification

I get different results every time. And somewhat dramatically so. For the county_classification aggregates, sometimes I'll see this:

{('VA', 'central'): {'mae_dem': 5382.0, ...

Sometimes it'll be closer to 2k:

{('VA', 'central'): {'mae_dem': 1910.0 ...

Or

{('VA', 'central'): {'mae_dem': 2026.0 ...

Is this to be expected...? 🤔 🤔

yes, the reporting counties are randomly selected every time. you need to set --percent_reporting 100

@dmnapolitano
Copy link
Contributor Author

can you please include a few elexmodel commands you tried?

Sure! I just ran this:

elexmodel 2022-11-08_USA_G --estimands=margin --features=baseline_normalized_margin --office_id=S_county --geographic_unit_type=county --pi_method bootstrap --national_summary

I did it on develop and this branch (using pip install -e --force-reinstall to be sure) and the results look identical 🎉 But is there anything else I can check? 🤔

ok, thanks. I'm curious, does the historical flag still work at all?

Yup, so, our integration tests work, and also this works:

elexmodel 2022-11-08_USA_G --estimands=dem --office_id=S_county --geographic_unit_type=county --pi_method gaussian --historical --aggregates=county_classification

And this works:

elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --historical --aggregates=county_classification

For using --historical with the bootstrap model, I think we need to modify our historical data sets since it's expecting results_normalized_margin, but I'm pretty sure that's outside the scope of this PR 🤔 🎉

The historical flag works in the example you sent and shows the same output as dev?

Actually this is really interesting. If I run this command multiple times on develop alone:

elexmodel 2021-11-02_VA_G --estimands=dem --office_id=G --geographic_unit_type=precinct --percent_reporting 60 --historical --aggregates=county_classification

I get different results every time. And somewhat dramatically so. For the county_classification aggregates, sometimes I'll see this:

{('VA', 'central'): {'mae_dem': 5382.0, ...

Sometimes it'll be closer to 2k:

{('VA', 'central'): {'mae_dem': 1910.0 ...

Or

{('VA', 'central'): {'mae_dem': 2026.0 ...

Is this to be expected...? 🤔 🤔

yes, the reporting counties are randomly selected every time. you need to set --percent_reporting 100

OMG right, I'm so sorry, I totally forgot about that 🙈 Ok, if I change that command to --percent_reporting 100, the output from develop and this branch for that command is identical 🎉

@dmnapolitano dmnapolitano merged commit f36517b into develop Dec 19, 2024
3 checks passed
@dmnapolitano dmnapolitano deleted the preprocessed-data-handler-superclass branch December 19, 2024 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants