Conformal bootstrap model #76

Merged
merged 110 commits into from
Oct 12, 2023
Changes from 108 commits
Commits
d165986
started working on bootstrap model
lennybronner Jun 16, 2023
d191ad3
works again
lennybronner Jun 21, 2023
ab3a692
back to bivariate model, but univariate model works now and in comments
lennybronner Jun 21, 2023
6b01ef2
Merge branch 'develop' into conformal-bootstrap-model
lennybronner Jun 22, 2023
feb2a13
removed model settings from estimand
lennybronner Jun 22, 2023
ee6b8c3
now doing percentage margin
lennybronner Jun 22, 2023
b6cbef6
added some clipping
lennybronner Jun 22, 2023
47acd2b
added comments for todo
lennybronner Jun 27, 2023
8e41b7d
implemented random effects. doesn't work quite well enough yet
lennybronner Jun 27, 2023
b1ec7fa
Merge branch 'develop' into conformal-bootstrap-model
lennybronner Jul 2, 2023
c503292
epsilon sampling now done better. intervals look good. TODO: robust m…
lennybronner Jul 2, 2023
8731d74
removed units that are clear outliers
lennybronner Jul 6, 2023
05274b5
made progress on narrowing intervals
lennybronner Jul 6, 2023
9700055
still at the same stage. tried parametric bootstrap
lennybronner Jul 8, 2023
a6c4e54
started implementing regularization with OLS
lennybronner Jul 8, 2023
eb03e86
weights issue
lennybronner Jul 10, 2023
f595db1
fixed small bug
lennybronner Jul 11, 2023
af9df02
implemented aggregate model
lennybronner Jul 11, 2023
2fe896c
decomposed code
lennybronner Jul 11, 2023
823ba5f
fixed issues with strata
lennybronner Jul 13, 2023
8740be5
made progress on using reporting percentage
lennybronner Jul 14, 2023
5afb5bb
updated comments
lennybronner Jul 14, 2023
962f6d9
partial reporting integrated
lennybronner Jul 18, 2023
3fa88c0
changed some of the comments/todos
lennybronner Jul 18, 2023
f3a4542
first attempt at including partial reporting for prediction
lennybronner Jul 20, 2023
f4abf5e
forgot weights_test
lennybronner Jul 20, 2023
77d76bb
small change to preprocess data
lennybronner Jul 21, 2023
748dd2b
filter out bad partial reporting bounds
jjcherian Jul 22, 2023
0376aac
first attempt at improving random effect estimate stability
jjcherian Jul 25, 2023
aa23f05
run with cv
lennybronner Aug 14, 2023
23e67f6
small changes on generating estimands
lennybronner Aug 23, 2023
9d0d112
first pass at allowing arbitrary aggregation
lennybronner Aug 23, 2023
08950e7
fixed arbitrary aggregates
lennybronner Aug 24, 2023
738e1b1
finished up national summary statistics: called races etc.
lennybronner Aug 24, 2023
f7deedb
made some progress to run old models
lennybronner Aug 25, 2023
fab0a4a
normal model works again
lennybronner Aug 25, 2023
4ccf60f
old models now run
lennybronner Aug 25, 2023
abffed5
Merge branch 'develop' into conformal-bootstrap-model
lennybronner Aug 25, 2023
300bc2b
working through unit tests
lennybronner Aug 25, 2023
12068a4
update tests
lennybronner Aug 25, 2023
e8b3854
dropping units with zero reporting and also added alpha to national a…
lennybronner Aug 28, 2023
605c5f8
added throwing error if baseline_margin not included as feature
lennybronner Aug 29, 2023
6c2fa9b
changed baseline_margin to baseline_normalized_margin
lennybronner Aug 29, 2023
061038e
drop now works for handle unreporting
lennybronner Aug 30, 2023
d7daf30
dealing with uncontested contests
lennybronner Aug 30, 2023
8f48695
fixed small uncertainty bug in district case
lennybronner Aug 31, 2023
303ca5a
fixed merge conflict
lennybronner Aug 31, 2023
180fb6e
can now run with arbitrary parameters
lennybronner Aug 31, 2023
458d932
started commenting
lennybronner Sep 4, 2023
7c06b1e
added more comments
lennybronner Sep 4, 2023
07ce5d0
fixed merge conflicts
lennybronner Sep 5, 2023
e2cdac1
dealing with last parameters to pass into bootstrap model
lennybronner Sep 5, 2023
322311f
updated reaadme for variables
lennybronner Sep 5, 2023
43a36ff
small fixed
lennybronner Sep 5, 2023
9c85aba
added some comments
lennybronner Sep 5, 2023
64a4b78
added more comments
lennybronner Sep 7, 2023
6277a4c
added comments
lennybronner Sep 7, 2023
f12e3c6
added more comments
lennybronner Sep 7, 2023
2d7e23a
added more comments
lennybronner Sep 7, 2023
0b6f6c2
ran linter
lennybronner Sep 8, 2023
7b02970
ran some linting
lennybronner Sep 8, 2023
21540a7
finalized comments
lennybronner Sep 8, 2023
25f7eda
fixed merge conflicts
lennybronner Sep 13, 2023
5621cea
fixed merge conflicts
lennybronner Sep 15, 2023
26e77df
started working on unit tests
lennybronner Sep 16, 2023
85c1f9a
added comment
lennybronner Sep 18, 2023
ed73a71
updated bootstrapmodel with new elex-solver
lennybronner Sep 19, 2023
615a09d
updated conformal models to use new elex-solver
lennybronner Sep 19, 2023
184f7fe
ran pre-commits, also removed parametric bootstrap model
lennybronner Sep 19, 2023
19b9297
fixed merge conflicts
lennybronner Sep 21, 2023
33e04f7
made progress on getting estimandizer to work with margin model
lennybronner Sep 21, 2023
364c945
looks like estimandizer now works with bootstrap model. still need to…
lennybronner Sep 21, 2023
cc6a5f0
works for old models now
lennybronner Sep 21, 2023
57bf7ad
removed unncessary numpy
lennybronner Sep 21, 2023
4d4c71d
removed some spacing
lennybronner Sep 21, 2023
795d0cd
ran pre-commit
lennybronner Sep 21, 2023
b014498
small change
lennybronner Sep 21, 2023
3274948
fixed featurizer unit tests
lennybronner Sep 21, 2023
c4c8c78
added linter
lennybronner Sep 21, 2023
9b1220a
removed duplicate code
lennybronner Sep 21, 2023
2f64aa5
moved where weights are added
lennybronner Sep 24, 2023
33e8bf6
added more tests, fixed small bug
lennybronner Sep 25, 2023
7260d0f
started working on unit tests
lennybronner Sep 25, 2023
422c997
fixed merge conflict
lennybronner Sep 25, 2023
275f4cd
unit tests pass again
lennybronner Sep 25, 2023
0d52fce
linter
lennybronner Sep 25, 2023
862c3b3
removed breakpoint
lennybronner Sep 25, 2023
4ccbef3
added some more tests
lennybronner Sep 25, 2023
e94bf6a
temporary commit for treating uncontested races 2019 VA
lennybronner Sep 26, 2023
e62e508
removed uncontested. also added dropping wehre baseline turnout is zero
lennybronner Sep 26, 2023
8565c4d
finalized model
lennybronner Sep 26, 2023
fd34b26
updated unit tests
lennybronner Sep 27, 2023
c835dd5
fixed integration tests and added one more
lennybronner Sep 27, 2023
31ec12e
updated estimandizser unit test
lennybronner Sep 27, 2023
d913bbc
linger
lennybronner Sep 27, 2023
5b745b1
added another unit test
lennybronner Sep 28, 2023
d30a701
added another test
lennybronner Sep 28, 2023
69ec8c8
removed breakpoint
lennybronner Sep 29, 2023
ec12384
added more unit tests
lennybronner Sep 30, 2023
7739b7c
added more unit tests
lennybronner Oct 1, 2023
749d0f9
fixed small bug, working on new unit test
lennybronner Oct 2, 2023
024f890
Correcting some spelling mistakes in comments in BootstrapElectionMod…
dmnapolitano Oct 2, 2023
109f3d4
finished first pass at tests
lennybronner Oct 2, 2023
169b192
Merge branch 'conformal-bootstrap-model' of https://github.com/washin…
lennybronner Oct 3, 2023
39e62b8
removed pdb
lennybronner Oct 3, 2023
fb3d220
removed unncessary line in test
lennybronner Oct 4, 2023
8a5b000
added print statement
lennybronner Oct 6, 2023
7bc3e01
unit tests pass again
lennybronner Oct 9, 2023
9d5041f
test epsilons now use sample variance. turnout factor cutoffs increase
lennybronner Oct 12, 2023
e273c40
fixed sitatuon where there is only one contest, including unit tests
lennybronner Oct 12, 2023
27 changes: 21 additions & 6 deletions README.md
@@ -100,12 +100,27 @@ elexmodel 2017-11-07_VA_G --estimands=turnout --estimands my_estimand --estimand

Some model types have specific model parameters that can be included.

| Name | Type | Acceptable values | model |
|-----------|---------|-----------------------------|-----------------|
| lambda | numeric | 0-inf | all |
| robust | boolean | larger prediction intervals | `nonparametric` |
| beta | numeric | variance inflation | `gaussian` |
| winsorize | boolean | winsorize std estimate | `gaussian` |
| Name | Type | Acceptable values | model |
|-----------------------------------|---------|----------------------------------|-----------------|
| lambda | numeric | regularization constant | all |
| turnout_factor_lower | numeric | drop units with < turnout factor | all |
| turnout_factor_upper | numeric | drop units with > turnout factor | all |
| robust | boolean | larger prediction intervals | `nonparametric` |
| beta | numeric | variance inflation | `gaussian` |
| winsorize | boolean | winsorize std estimate | `gaussian` |
| B | numeric | bootstrap samples | `bootstrap` |
| T | numeric | temperature for aggregate | `bootstrap` |
| strata | list | groups to stratify bootstrap by | `bootstrap` |
| agg_model_hard_threshold | boolean | hard threshold aggregate model | `bootstrap` |
| y_LB | numeric | lower bound norm. margin dist | `bootstrap` |
| y_UB | numeric | upper bound norm. margin dist | `bootstrap` |
| z_LB | numeric | lower bound turnout fact dist | `bootstrap` |
| z_UB | numeric | upper bound turnout fact dist | `bootstrap` |
| y_unobserved_lower_bound | numeric | lower bound for norm. margin | `bootstrap` |
| y_unobserved_upper_bound | numeric | upper bound for norm. margin | `bootstrap` |
| percent_expected_vote_error_bound | numeric | error tolerance on expected vote | `bootstrap` |
| z_unobserved_lower_bound | numeric | lower bound for turnout factor | `bootstrap` |
| z_unobserved_upper_bound | numeric | upper bound for turnout factor | `bootstrap` |
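
As a rough illustration of how the parameters in the table fit together, the sketch below builds a `model_parameters` dictionary for the `bootstrap` model and checks that every key is a name from the table. The helper sets are hypothetical and not part of `elexmodel`. The turnout factor values match the defaults read in `client.py` in this PR; every other value, including the strata column name, is a placeholder rather than a package default. The regularization key is written `lambda_` because that is the key `_check_input_parameters` reads.

```python
# Parameter names copied from the README table. These sets are hypothetical
# helpers for this example and are not part of elexmodel.
SHARED_PARAMETERS = {"lambda_", "turnout_factor_lower", "turnout_factor_upper"}
BOOTSTRAP_PARAMETERS = {
    "B", "T", "strata", "agg_model_hard_threshold",
    "y_LB", "y_UB", "z_LB", "z_UB",
    "y_unobserved_lower_bound", "y_unobserved_upper_bound",
    "percent_expected_vote_error_bound",
    "z_unobserved_lower_bound", "z_unobserved_upper_bound",
}

# Placeholder values for illustration only (not the package defaults),
# except the two turnout factor cutoffs, which mirror client.py in this PR.
model_parameters = {
    "lambda_": 1.0,
    "turnout_factor_lower": 0.5,
    "turnout_factor_upper": 1.5,
    "B": 500,
    "T": 50.0,
    "strata": ["county_classification"],  # placeholder column name
    "agg_model_hard_threshold": True,
}

# Any key outside the table is probably a typo.
unknown = sorted(set(model_parameters) - SHARED_PARAMETERS - BOOTSTRAP_PARAMETERS)
print("unknown parameters:", unknown)
```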

This is the class and function that invokes the general function to generate estimates. You can install `elex-model` as a Python package and use this code snippet in other projects.

7 changes: 1 addition & 6 deletions src/elexmodel/cli.py
@@ -35,12 +35,7 @@ def type_cast_value(self, ctx, value):
"--pi_method",
"pi_method",
default="nonparametric",
type=click.Choice(
[
"gaussian",
"nonparametric",
]
),
type=click.Choice(["gaussian", "nonparametric", "bootstrap"]),
)
@click.option("--prediction_intervals", "prediction_intervals", default=[0.7, 0.9], multiple=True)
@click.option("--percent_reporting_threshold", "percent_reporting_threshold", default=100)
100 changes: 83 additions & 17 deletions src/elexmodel/client.py
@@ -10,6 +10,7 @@
from elexmodel.handlers.data.ModelResults import ModelResultsHandler
from elexmodel.handlers.data.PreprocessedData import PreprocessedDataHandler
from elexmodel.logging import initialize_logging
from elexmodel.models.BootstrapElectionModel import BootstrapElectionModel
from elexmodel.models.ConformalElectionModel import ConformalElectionModel
from elexmodel.models.GaussianElectionModel import GaussianElectionModel
from elexmodel.models.NonparametricElectionModel import NonparametricElectionModel
@@ -39,6 +40,7 @@ def __init__(self):
super().__init__()
self.all_conformalization_data_unit_dict = defaultdict(dict)
self.all_conformalization_data_agg_dict = defaultdict(dict)
self.model = None

def _check_input_parameters(
self,
@@ -88,8 +90,7 @@ def _check_input_parameters(
]
if len(invalid_fixed_effects) > 0:
raise ValueError(f"Fixed effect(s): {invalid_fixed_effects} not valid. Please check config")

if pi_method not in {"gaussian", "nonparametric"}:
if pi_method not in {"gaussian", "nonparametric", "bootstrap"}:
raise ValueError(
f"Prediction interval method: {pi_method} is not valid. \
pi_method has to be one of `gaussian`, `nonparametric` or `bootstrap`."
@@ -102,6 +103,14 @@ def _check_input_parameters(
not isinstance(model_parameters["lambda_"], (float, int)) or model_parameters["lambda_"] < 0
):
raise ValueError("lambda is not valid. It has to be numeric and non-negative.")
if "turnout_factor_lower" in model_parameters and not isinstance(
model_parameters["turnout_factor_lower"], float
):
raise ValueError("turnout_factor_lower is not valid. Has to be a float.")
if "turnout_factor_upper" in model_parameters and not isinstance(
model_parameters["turnout_factor_upper"], float
):
raise ValueError("turnout_factor_upper is not valid. Has to be a float.")
if pi_method == "gaussian":
if "beta" in model_parameters and not isinstance(model_parameters["beta"], (int, float)):
raise ValueError("beta is not valid. Has to be either an integer or a float.")
@@ -110,7 +119,45 @@ def _check_input_parameters(
elif pi_method == "nonparametric":
if "robust" in model_parameters and not isinstance(model_parameters["robust"], bool):
raise ValueError("robust is not valid. Has to be a boolean.")

elif pi_method == "bootstrap":
if "B" in model_parameters and not isinstance(model_parameters["B"], int):
raise ValueError("B is not valid. Has to be an integer.")
if "T" in model_parameters and not isinstance(model_parameters["T"], (int, float)):
raise ValueError("T is not valid. Has to be either an integer or a float.")
if "strata" in model_parameters and not isinstance(model_parameters["strata"], list):
raise ValueError("strata is not valid. Has to be a list.")
if "agg_model_hard_threshold" in model_parameters and not isinstance(
model_parameters["agg_model_hard_threshold"], bool
):
raise ValueError("agg_model_hard_threshold is not valid. Has to be a boolean.")
if "y_LB" in model_parameters and not isinstance(model_parameters["y_LB"], float):
raise ValueError("y_LB is not valid. Has to be a float.")
if "y_UB" in model_parameters and not isinstance(model_parameters["y_UB"], float):
raise ValueError("y_UB is not valid. Has to be a float.")
if "z_LB" in model_parameters and not isinstance(model_parameters["z_LB"], float):
raise ValueError("z_LB is not valid. Has to be a float.")
if "z_UB" in model_parameters and not isinstance(model_parameters["z_UB"], float):
raise ValueError("z_UB is not valid. Has to be a float.")
if "y_unobserved_upper_bound" in model_parameters and not isinstance(
model_parameters["y_unobserved_upper_bound"], float
):
raise ValueError("y_unobserved_upper_bound is not valid. Has to be a float.")
if "y_unobserved_lower_bound" in model_parameters and not isinstance(
model_parameters["y_unobserved_lower_bound"], float
):
raise ValueError("y_unobserved_lower_bound is not valid. Has to be a float.")
if "percent_expected_vote_error_bound" in model_parameters and not isinstance(
model_parameters["percent_expected_vote_error_bound"], float
):
raise ValueError("percent_expected_vote_error_bound is not valid. Has to be a float.")
if "z_unobserved_upper_bound" in model_parameters and not isinstance(
model_parameters["z_unobserved_upper_bound"], float
):
raise ValueError("z_unobserved_upper_bound is not valid. Has to be a float.")
if "z_unobserved_lower_bound" in model_parameters and not isinstance(
model_parameters["z_unobserved_lower_bound"], float
):
raise ValueError("z_unobserved_lower_bound is not valid. Has to be a float.")
if handle_unreporting not in {"drop", "zero"}:
raise ValueError("handle_unreporting must be either `drop` or `zero`")
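
The float-typed checks in this hunk all follow one pattern, so they could be driven from a list of names. The sketch below is one possible shape for that, not the code in this PR: `FLOAT_PARAMETERS` and `check_float_parameters` are hypothetical names. It also shows two consequences of using `isinstance` this way: a float-only check rejects an integer such as `2`, and because `bool` is a subclass of `int`, an integer check such as the one on `B` accepts `True`.

```python
# Hypothetical table-driven version of the float checks; not part of elexmodel.
FLOAT_PARAMETERS = [
    "turnout_factor_lower", "turnout_factor_upper",
    "y_LB", "y_UB", "z_LB", "z_UB",
    "y_unobserved_lower_bound", "y_unobserved_upper_bound",
    "percent_expected_vote_error_bound",
    "z_unobserved_lower_bound", "z_unobserved_upper_bound",
]


def check_float_parameters(model_parameters, names=FLOAT_PARAMETERS):
    """Raise ValueError for the first listed parameter that is present but not a float."""
    for name in names:
        if name in model_parameters and not isinstance(model_parameters[name], float):
            raise ValueError(f"{name} is not valid. Has to be a float.")


# A float passes; absent parameters are simply skipped.
check_float_parameters({"y_LB": -0.3})

# An int is not a float, so the float-only check rejects it.
try:
    check_float_parameters({"z_UB": 2})
    message = None
except ValueError as error:
    message = str(error)

# bool is a subclass of int, so an isinstance(..., int) check accepts True.
bool_passes_int_check = isinstance(True, int)
```

Building the message from the parameter name also avoids copy-paste slips where one parameter's error text names another parameter.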

@@ -122,6 +169,9 @@ def get_aggregate_list(self, office, aggregate):
raw_aggregate_list = base_aggregate + [aggregate]
return sorted(list(set(raw_aggregate_list)), key=lambda x: AGGREGATE_ORDER.index(x))

def get_national_summary_votes_estimates(self, nat_sum_data_dict=None, called_states={}, base_to_add=0, alpha=0.9):
return self.model.get_national_summary_estimates(nat_sum_data_dict, called_states, base_to_add, alpha=alpha)

def get_estimates(
self,
current_data, # list of lists
@@ -158,10 +208,15 @@ def get_estimates(
save_conformalization = "conformalization" in save_output
handle_unreporting = kwargs.get("handle_unreporting", "drop")

district_election = office in {"H", "Y", "Z"}

model_settings = {
"election_id": election_id,
"office": office,
"geographic_unit_type": geographic_unit_type,
"district_election": district_election,
"features": features,
"fixed_effects": fixed_effects,
"save_conformalization": save_conformalization,
@@ -172,6 +227,7 @@ def get_estimates(
config_handler = ConfigHandler(
election_id, config=raw_config, s3_client=s3.S3JsonUtil(TARGET_BUCKET), save=save_config
)

self._check_input_parameters(
config_handler,
office,
@@ -184,6 +240,7 @@ def get_estimates(
model_parameters,
handle_unreporting,
)

states_with_election = config_handler.get_states(office)
estimand_baselines = config_handler.get_estimand_baselines(office, estimands)

@@ -213,13 +270,18 @@ def get_estimates(
handle_unreporting=handle_unreporting,
)

turnout_factor_lower = model_parameters.get("turnout_factor_lower", 0.5)
turnout_factor_upper = model_parameters.get("turnout_factor_upper", 1.5)

reporting_units = data.get_reporting_units(
percent_reporting_threshold, features_to_normalize=features, add_intercept=True
percent_reporting_threshold, turnout_factor_lower, turnout_factor_upper
)
nonreporting_units = data.get_nonreporting_units(
percent_reporting_threshold, features_to_normalize=features, add_intercept=True
percent_reporting_threshold, turnout_factor_lower, turnout_factor_upper
)
unexpected_units = data.get_unexpected_units(
percent_reporting_threshold, aggregates, turnout_factor_lower, turnout_factor_upper
)
unexpected_units = data.get_unexpected_units(percent_reporting_threshold, aggregates)
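
The body of the turnout factor filter lives in the data handler and is not shown in this hunk. The sketch below is only a guess at its shape, under two stated assumptions: that the turnout factor is current turnout divided by baseline turnout, and that units outside the open interval between the two cutoffs are set aside. `split_by_turnout_factor` and the dictionary keys are hypothetical names; the defaults of 0.5 and 1.5 are the ones read from `model_parameters` above.

```python
def split_by_turnout_factor(units, lower=0.5, upper=1.5):
    """Split units into (kept, dropped) by turnout factor.

    Assumption: turnout factor = results_turnout / baseline_turnout.
    """
    kept, dropped = [], []
    for unit in units:
        baseline = unit["baseline_turnout"]
        # A zero baseline makes the ratio undefined, so such units are dropped.
        factor = unit["results_turnout"] / baseline if baseline > 0 else None
        if factor is not None and lower < factor < upper:
            kept.append(unit)
        else:
            dropped.append(unit)
    return kept, dropped


units = [
    {"id": "a", "results_turnout": 90, "baseline_turnout": 100},   # factor 0.9
    {"id": "b", "results_turnout": 300, "baseline_turnout": 100},  # factor 3.0
    {"id": "c", "results_turnout": 10, "baseline_turnout": 100},   # factor 0.1
    {"id": "d", "results_turnout": 50, "baseline_turnout": 0},     # undefined
]
kept, dropped = split_by_turnout_factor(units)
kept_ids = [unit["id"] for unit in kept]
dropped_ids = [unit["id"] for unit in dropped]
```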

LOG.info(
"Model parameters: \n prediction intervals: %s, percent reporting threshold: %s, \
@@ -232,13 +294,15 @@ def get_estimates(
)

if pi_method == "nonparametric":
model = NonparametricElectionModel(model_settings=model_settings)
self.model = NonparametricElectionModel(model_settings=model_settings)
elif pi_method == "gaussian":
model = GaussianElectionModel(model_settings=model_settings)
self.model = GaussianElectionModel(model_settings=model_settings)
elif pi_method == "bootstrap":
self.model = BootstrapElectionModel(model_settings=model_settings)

minimum_reporting_units_max = 0
for alpha in prediction_intervals:
minimum_reporting_units = model.get_minimum_reporting_units(alpha)
minimum_reporting_units = self.model.get_minimum_reporting_units(alpha)
if minimum_reporting_units > minimum_reporting_units_max:
minimum_reporting_units_max = minimum_reporting_units

@@ -270,24 +334,26 @@ def get_estimates(
)

for estimand in estimands:
unit_predictions = model.get_unit_predictions(reporting_units, nonreporting_units, estimand)
unit_predictions = self.model.get_unit_predictions(
reporting_units, nonreporting_units, estimand, unexpected_units=unexpected_units
)
results_handler.add_unit_predictions(estimand, unit_predictions)
# gets prediction intervals for each alpha
alpha_to_unit_prediction_intervals = {}
for alpha in prediction_intervals:
alpha_to_unit_prediction_intervals[alpha] = model.get_unit_prediction_intervals(
alpha_to_unit_prediction_intervals[alpha] = self.model.get_unit_prediction_intervals(
results_handler.reporting_units, results_handler.nonreporting_units, alpha, estimand
)
if isinstance(model, ConformalElectionModel):
if isinstance(self.model, ConformalElectionModel):
self.all_conformalization_data_unit_dict[alpha][
estimand
] = model.get_all_conformalization_data_unit()
] = self.model.get_all_conformalization_data_unit()

results_handler.add_unit_intervals(estimand, alpha_to_unit_prediction_intervals)

for aggregate in results_handler.aggregates:
aggregate_list = self.get_aggregate_list(office, aggregate)
estimates_df = model.get_aggregate_predictions(
estimates_df = self.model.get_aggregate_predictions(
results_handler.reporting_units,
results_handler.nonreporting_units,
results_handler.unexpected_units,
@@ -296,7 +362,7 @@ def get_estimates(
)
alpha_to_agg_prediction_intervals = {}
for alpha in prediction_intervals:
alpha_to_agg_prediction_intervals[alpha] = model.get_aggregate_prediction_intervals(
alpha_to_agg_prediction_intervals[alpha] = self.model.get_aggregate_prediction_intervals(
results_handler.reporting_units,
results_handler.nonreporting_units,
results_handler.unexpected_units,
@@ -305,10 +371,10 @@ def get_estimates(
alpha_to_unit_prediction_intervals[alpha],
estimand,
)
if isinstance(model, ConformalElectionModel):
if isinstance(self.model, ConformalElectionModel):
self.all_conformalization_data_agg_dict[alpha][
estimand
] = model.get_all_conformalization_data_agg()
] = self.model.get_all_conformalization_data_agg()

# get all of the prediction intervals here
results_handler.add_agg_predictions(
11 changes: 10 additions & 1 deletion src/elexmodel/handlers/config.py
@@ -69,11 +69,15 @@ def get_estimand_baselines(self, office, estimands):
Return dict of baseline pointers for requested estimands
"""
baseline_pointers = {estimand: self.get_baseline_pointer(office).get(estimand) for estimand in estimands}
if "margin" in estimands:
baseline_pointers["margin"] = "margin"
return baseline_pointers

def get_estimands(self, office):
baseline_pointer = self.get_baseline_pointer(office)
estimands = list(baseline_pointer.keys())
if self.election_id.endswith("G"):
estimands += ["margin"] # would otherwise need to add margin to every single config
return estimands

def get_states(self, office):
@@ -92,7 +96,12 @@ def get_geographic_unit_types(self, office):
return self._get_office_subconfig(office).get("geographic_unit_types")

def get_features(self, office):
return self._get_office_subconfig(office).get("features", [])
features = list(self._get_office_subconfig(office).get("features", []))  # copy so that += does not extend the stored config list
if self.election_id.endswith("G"):
features += [
"baseline_normalized_margin"
] # would otherwise need to add baseline_normalized_margin to every single config
return features
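
One thing to watch when appending to a list fetched from a config dictionary: `+=` on a list extends it in place, so if the list is the one stored in the config, repeated calls keep adding the same feature. The sketch below demonstrates the effect with a plain dictionary. It assumes the subconfig lookup returns the stored dictionary rather than a copy, which this diff does not show; the function names and the `age_le_30` feature are placeholders.

```python
def get_features_in_place(subconfig):
    features = subconfig.get("features", [])
    features += ["baseline_normalized_margin"]  # extends subconfig["features"] itself
    return features


def get_features_copied(subconfig):
    features = list(subconfig.get("features", []))  # copy before extending
    features += ["baseline_normalized_margin"]
    return features


# In-place version: the stored list grows on every call.
shared = {"features": ["age_le_30"]}
get_features_in_place(shared)
second_call = get_features_in_place(shared)

# Copying version: the stored list is left untouched.
untouched = {"features": ["age_le_30"]}
get_features_copied(untouched)
copied_result = get_features_copied(untouched)
```

When the `features` key is missing, the default `[]` is a fresh list on each call, so the in-place version only misbehaves for configs that define the key.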

def get_aggregates(self, office):
return self._get_office_subconfig(office).get("aggregates", [])