Removing duplicate code in Estimandizer class #96

Merged 2 commits on Aug 12, 2024

@dmnapolitano (Contributor) commented Aug 6, 2024

Description

Hi! The changes in this PR remove duplicate methods from the Estimandizer class. I genuinely don't know how that happened or what tox warning I was ignoring this entire time 😓 🤔
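For context on why duplicates like these can go unnoticed: Python does not raise an error when a class body defines the same method name twice. The later definition silently replaces the earlier one. Here is a minimal sketch of that behavior; `EstimandizerSketch` and its return values are hypothetical stand-ins, not the project's code.

```python
class EstimandizerSketch:
    """Hypothetical stand-in for a class that defines the same method twice."""

    def add_weights(self, data, col_prefix):
        # First definition: replaced by the second one below.
        return "first"

    def add_weights(self, data, col_prefix):  # noqa: F811
        # Second definition: this is the one that stays bound to the class.
        return "second"


result = EstimandizerSketch().add_weights({}, "results_")
print(result)  # second
```

Linters built on pyflakes (flake8, for example) usually report this pattern as a redefinition warning, F811. Whether that is the tox warning mentioned above is only a guess.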

Also, I forgot to update the GitHub Actions to use Python 3.11 in PR #95. I hope it's OK to do so here; if not, I can easily separate that out 😄

Test Steps

tox

@dmnapolitano requested a review from a team as a code owner on August 6, 2024 15:13
@lennybronner (Collaborator) commented

Weird weird. Could you do a git blame or something and check how they ended up here?

@dmnapolitano (Contributor Author) commented

> Weird weird. Could you do a git blame or something and check how they ended up here?

Good idea! I've never done that before. Here's what I see:

$ git blame -L 84 src/elexmodel/handlers/data/Estimandizer.py 
cc6a5f0f (lbvienna         2023-09-21 17:29:24 -0400  84)     def add_weights(self, data_df, col_prefix):
cc6a5f0f (lbvienna         2023-09-21 17:29:24 -0400  85)         data_df[f"{col_prefix}weights"] = data_df[f"{col_prefix}turnout"]
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400  86)         return data_df
795d0cde (lbvienna         2023-09-21 17:38:10 -0400  87) 
44e6b909 (lbvienna         2023-09-21 18:36:20 -0400  88)     def add_turnout_factor(self, data_df):
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  89)         # posinf and neginf are also set to zero because dividing by zero can lead to nan/posinf/neginf depending
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  90)         # on the type of the numeric in the numpy array. Assume that if baseline_weights is zero then turnout
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  91)         # would be incredibly low in this election too (ie. this is effectively an empty precinct) and so setting
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  92)         # the turnout factor to zero is fine
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  93)         data_df["turnout_factor"] = np.nan_to_num(
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  94)             data_df.results_weights / data_df.baseline_weights, nan=0, posinf=0, neginf=0
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  95)         )
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400  96)         return data_df
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400  97) 
422c9974 (lbvienna         2023-09-25 15:40:44 -0400  98)     def add_weights(self, data_df, col_prefix):
422c9974 (lbvienna         2023-09-25 15:40:44 -0400  99)         data_df[f"{col_prefix}weights"] = data_df[f"{col_prefix}turnout"]
422c9974 (lbvienna         2023-09-25 15:40:44 -0400 100)         return data_df
422c9974 (lbvienna         2023-09-25 15:40:44 -0400 101) 
33e04f70 (lbvienna         2023-09-21 12:57:13 -0400 102)     def add_turnout_factor(self, data_df):
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 103)         # posinf and neginf are also set to zero because dividing by zero can lead to nan/posinf/neginf depending
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 104)         # on the type of the numeric in the numpy array. Assume that if baseline_weights is zero then turnout
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 105)         # would be incredibly low in this election too (ie. this is effectively an empty precinct) and so setting
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 106)         # the turnout factor to zero is fine
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 107)         data_df["turnout_factor"] = np.nan_to_num(
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 108)             data_df.results_weights / data_df.baseline_weights, nan=0, posinf=0, neginf=0
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 109)         )
33e04f70 (lbvienna         2023-09-21 12:57:13 -0400 110)         return data_df
a9a2354a (Diane Napolitano 2023-09-07 14:24:50 -0400 111) 
795d0cde (lbvienna         2023-09-21 17:38:10 -0400 112) 
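As an aside on the duplicated `add_turnout_factor` body shown in the blame output, here is a small sketch of what its `np.nan_to_num` call does when `baseline_weights` is zero. The arrays are made-up illustrative values, and plain NumPy arrays stand in for the DataFrame columns.

```python
import numpy as np

# Made-up illustrative values; the last three units have a zero baseline.
results_weights = np.array([50.0, 0.0, 10.0, -10.0])
baseline_weights = np.array([100.0, 0.0, 0.0, 0.0])

# Silence the divide-by-zero and invalid-value warnings for this demo.
with np.errstate(divide="ignore", invalid="ignore"):
    raw_ratio = results_weights / baseline_weights  # 0.5, nan, inf, -inf

# Same keyword arguments as in the blamed code: nan, +inf and -inf all become 0.
turnout_factor = np.nan_to_num(raw_ratio, nan=0, posinf=0, neginf=0)
print(turnout_factor.tolist())  # [0.5, 0.0, 0.0, 0.0]
```

This matches the code comment's reasoning: a unit with a zero baseline is treated as effectively empty, so its turnout factor is set to zero.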

I honestly have no idea. My guess is that the code was re-added across multiple PRs after having been removed, and we somehow didn't notice 🤔
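One way to catch this kind of duplication mechanically is to parse the source and count method names per class. The following is a rough sketch using only the standard library. The `find_duplicate_methods` helper and the `SAMPLE` source are hypothetical and not part of the repository.

```python
import ast
from collections import Counter


def find_duplicate_methods(source: str) -> dict[str, list[str]]:
    """Map each class name to the method names it defines more than once."""
    duplicates = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # Count only functions defined directly in the class body.
            counts = Counter(
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            )
            repeated = sorted(name for name, n in counts.items() if n > 1)
            if repeated:
                duplicates[node.name] = repeated
    return duplicates


# Hypothetical source mirroring the shape of the duplication in the blame output.
SAMPLE = '''
class Estimandizer:
    def add_weights(self, data_df, col_prefix):
        return data_df

    def add_turnout_factor(self, data_df):
        return data_df

    def add_weights(self, data_df, col_prefix):
        return data_df

    def add_turnout_factor(self, data_df):
        return data_df
'''

print(find_duplicate_methods(SAMPLE))
# {'Estimandizer': ['add_turnout_factor', 'add_weights']}
```

Note that this sketch would also flag legitimate name reuse, such as a property and its `@name.setter`, so its output is a list of candidates to review rather than definite errors.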

@lennybronner (Collaborator) commented

weird weird

@dmnapolitano merged commit 4c8aa99 into develop on Aug 12, 2024
3 checks passed
@dmnapolitano deleted the remove-duplicate-code branch on August 12, 2024 15:05