Updates to Featurizer #69

Merged
merged 11 commits into from
Sep 15, 2023
11 changes: 10 additions & 1 deletion src/elexmodel/handlers/data/Featurizer.py
@@ -169,6 +169,15 @@ def generate_holdout_data(self, df: pd.DataFrame) -> pd.DataFrame:

# set the values for active fixed effect in rows that have inactive fixed effect to be 1 / (n + 1)
# rows that have an inactive fixed effect value need to receive the treatment of the average fixed effect
# NOTE: aren't we now applying 1 * the dropped fixed effect and 1 / (n + 1) times the other fixed effects?
df.loc[rows_w_inactive_fixed_effects, fe_active_fixed_effects] = 1 / (len(fe_active_fixed_effects) + 1)
# This is correct because even rows with active fixed effects have an intercept column, so the coefficient
# of a fixed effect value column is actually the *difference* between the dropped column (for which the
# intercept is the stand-in) and that fixed effect column.
# Another way to think about this: suppose there are three fixed effect values r, u and s, where s is dropped.
# The fixed effect estimate for a value that is present is:
# beta_0 + beta_r * indic{r}
# beta_0 + beta_u * indic{u}
# and the fixed effect estimate for the dropped value is beta_0, so the average of the three is:
# beta_0 + (beta_r / 3) + (beta_u / 3)

Contributor

Sorry, I just saw this comment. And thanks for adding it; it helps me understand the math better 😄 👍🏻

But I have to admit I'm not familiar with this 😞 Do you have another example or some literature you can share?
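
A small numerical sketch may help check the argument in the code comment. It is not taken from the elexmodel code base: the group means, labels, and variable names below are made up for illustration, and the fit is plain least squares via NumPy. It fits an intercept plus indicators for `r` and `u` (with `s` dropped), then compares the average of the three per-value estimates with the prediction for a row whose active fixed effect columns are all set to `1 / (n + 1)`.

```python
import numpy as np

# Hypothetical data: three fixed effect values, with "s" as the dropped one.
group_means = {"r": 2.0, "u": 5.0, "s": 11.0}
labels = ["r", "r", "u", "u", "s", "s"]
y = np.array([group_means[g] for g in labels])

# Design matrix columns: intercept, indicator for r, indicator for u.
# "s" has no column of its own, so the intercept stands in for it.
X = np.array([[1.0, g == "r", g == "u"] for g in labels], dtype=float)
beta_0, beta_r, beta_u = np.linalg.lstsq(X, y, rcond=None)[0]

# Fixed effect estimate for each value, including the dropped one.
per_value_estimates = [beta_0 + beta_r, beta_0 + beta_u, beta_0]
average_estimate = np.mean(per_value_estimates)

# A row with an inactive (unseen) fixed effect value: intercept stays 1,
# and every active fixed effect column is set to 1 / (n + 1).
n_active = 2
unseen_row = np.array([1.0, 1 / (n_active + 1), 1 / (n_active + 1)])
unseen_prediction = unseen_row @ np.array([beta_0, beta_r, beta_u])

print(average_estimate, unseen_prediction)
```

Under these assumptions both numbers agree: the intercept contributes the dropped value's estimate in full, and each active column contributes one third of its difference from the dropped value, which together give the simple average over all three values.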

return self.filter_to_active_features(df)