Removing duplicate code in `Estimandizer` class #96

dmnapolitano · 2024-08-06T15:13:52Z

Description

Hi! The changes in this PR remove duplicate methods from the Estimandizer class. I genuinely don't know how that happened or what tox warning I was ignoring this entire time 😓 🤔
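
For context on why this went unnoticed: Python does not raise an error when a class body defines the same method twice; the later definition silently replaces the earlier one. Below is a minimal sketch of how such duplicates could be detected with the standard-library `ast` module. The class `Demo`, the `SOURCE` string, and the helper `find_duplicate_methods` are hypothetical names for illustration, not code from this repository. Linters such as flake8 typically report this pattern as F811 (redefinition of an unused name), though whether this repo's tox setup surfaces that check is an assumption.

```python
import ast
from collections import Counter

# Hypothetical source with a method defined twice, mirroring the shape of the problem.
SOURCE = '''
class Demo:
    def add_weights(self):
        return "first"

    def add_weights(self):
        return "second"
'''


def find_duplicate_methods(source):
    """Return {class_name: [names of methods defined more than once]}."""
    duplicates = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # Count only functions defined directly in the class body.
            counts = Counter(
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            )
            repeated = sorted(name for name, n in counts.items() if n > 1)
            if repeated:
                duplicates[node.name] = repeated
    return duplicates


# Executing the source shows that the second definition is the one that survives.
namespace = {}
exec(SOURCE, namespace)

print(find_duplicate_methods(SOURCE))
print(namespace["Demo"]().add_weights())
```

Because the second definition wins, identical duplicates like the ones removed here do not change behavior, which is likely why the tests kept passing.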

Also, I forgot to update the GitHub Actions to use Python 3.11 in PR #95. Hope it's ok to do so here; if not I can easily separate that out 😄

Test Steps

tox

lennybronner · 2024-08-06T15:16:12Z

Weird weird. could you do a git blame or something and check how they ended up here?

dmnapolitano · 2024-08-06T15:22:17Z

> Weird weird. could you do a git blame or something and check how they ended up here?

Good idea! I've never done that before. Here's what I see:

$ git blame -L 84 src/elexmodel/handlers/data/Estimandizer.py 
cc6a5f0f (lbvienna         2023-09-21 17:29:24 -0400  84)     def add_weights(self, data_df, col_prefix):
cc6a5f0f (lbvienna         2023-09-21 17:29:24 -0400  85)         data_df[f"{col_prefix}weights"] = data_df[f"{col_prefix}turnout"]
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400  86)         return data_df
795d0cde (lbvienna         2023-09-21 17:38:10 -0400  87) 
44e6b909 (lbvienna         2023-09-21 18:36:20 -0400  88)     def add_turnout_factor(self, data_df):
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  89)         # posinf and neginf are also set to zero because dividing by zero can lead to nan/posinf/neginf depending
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  90)         # on the type of the numeric in the numpy array. Assume that if baseline_weights is zero then turnout
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  91)         # would be incredibly low in this election too (ie. this is effectively an empty precinct) and so setting
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  92)         # the turnout factor to zero is fine
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  93)         data_df["turnout_factor"] = np.nan_to_num(
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  94)             data_df.results_weights / data_df.baseline_weights, nan=0, posinf=0, neginf=0
440a7e06 (lbvienna         2023-09-22 10:49:31 -0400  95)         )
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400  96)         return data_df
9d5c1019 (Diane Napolitano 2023-09-07 13:17:39 -0400  97) 
422c9974 (lbvienna         2023-09-25 15:40:44 -0400  98)     def add_weights(self, data_df, col_prefix):
422c9974 (lbvienna         2023-09-25 15:40:44 -0400  99)         data_df[f"{col_prefix}weights"] = data_df[f"{col_prefix}turnout"]
422c9974 (lbvienna         2023-09-25 15:40:44 -0400 100)         return data_df
422c9974 (lbvienna         2023-09-25 15:40:44 -0400 101) 
33e04f70 (lbvienna         2023-09-21 12:57:13 -0400 102)     def add_turnout_factor(self, data_df):
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 103)         # posinf and neginf are also set to zero because dividing by zero can lead to nan/posinf/neginf depending
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 104)         # on the type of the numeric in the numpy array. Assume that if baseline_weights is zero then turnout
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 105)         # would be incredibly low in this election too (ie. this is effectively an empty precinct) and so setting
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 106)         # the turnout factor to zero is fine
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 107)         data_df["turnout_factor"] = np.nan_to_num(
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 108)             data_df.results_weights / data_df.baseline_weights, nan=0, posinf=0, neginf=0
275f4cd8 (lbvienna         2023-09-25 16:21:07 -0400 109)         )
33e04f70 (lbvienna         2023-09-21 12:57:13 -0400 110)         return data_df
a9a2354a (Diane Napolitano 2023-09-07 14:24:50 -0400 111) 
795d0cde (lbvienna         2023-09-21 17:38:10 -0400 112)

I honestly have no idea. My guess is the code was added during multiple PRs after having been removed and we somehow didn't notice 🤔
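
As an aside, the two copies in the blame output are textually identical, so dropping one should not change behavior. The zero-division handling described in the `add_turnout_factor` comments can be sketched roughly as follows. This uses plain NumPy arrays with made-up values rather than the DataFrame columns from the real method, and the `np.errstate` context is only there to silence the runtime warnings in this standalone example.

```python
import numpy as np

# Made-up values: one normal unit and three with a zero baseline.
results_weights = np.array([10.0, 0.0, 5.0, -3.0])
baseline_weights = np.array([5.0, 0.0, 0.0, 0.0])

# Dividing by zero yields nan (0/0), inf (positive/0) or -inf (negative/0).
with np.errstate(divide="ignore", invalid="ignore"):
    raw = results_weights / baseline_weights

# Same call shape as in the blamed code: map nan, +inf and -inf to zero.
turnout_factor = np.nan_to_num(raw, nan=0, posinf=0, neginf=0)

print(raw)
print(turnout_factor)
```

Units with a zero baseline end up with a turnout factor of zero, matching the "effectively an empty precinct" assumption in the code comments.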

lennybronner · 2024-08-12T14:53:09Z

weird weird

Removing duplicate code in Estimandizer class

94865ff

dmnapolitano requested a review from a team as a code owner August 6, 2024 15:13

Forgot to update the Github Actions to use Python 3.11 in the previous PR

f867c43

lennybronner approved these changes Aug 12, 2024

dmnapolitano merged commit 4c8aa99 into develop Aug 12, 2024
3 checks passed

dmnapolitano deleted the remove-duplicate-code branch August 12, 2024 15:05
