Improvements and Bug Fixes for Probabilistic Fairness #27
Merged
+286 −188
Commits (29)
76dade3 (mthielbar) Add test for get_all_scores
2c399b8 (mthielbar) Bug fix. EqualOpportunity should be included in get_all_scores.
6073318 (mthielbar) Small updates to test_utils_proba.py
7f8ecde (mthielbar) Rearrange simulation into its own class.
da13e4f (mthielbar) Simulator is its own class. Simulator unit tests running clean.
04d0c63 (mthielbar) Small edits to test_utils_proba.py
25fc7fd (mthielbar) Fix small bug that occurs in summarizer when mambership_df has a surr…
fa8f8bc (mthielbar) Add tests for summarizer.
5ee2d0b (mthielbar) Incorporate fixes to summarizer.
2ea6660 (mthielbar) Merge branch 'summarizer_bug' into prob_membership_updates
899c747 (mthielbar) Cleanup code after merging changes to fix summarizer bug.
6e0a826 (mthielbar) run_bootstrap was using incorrect class label function call.
4157bd4 (mthielbar) Merge branch 'prob_membership_updates' into update_simulation
15395ac (mthielbar) Clean up print statements in is_one_dimensional.
325d123 (mthielbar) Clean up deprecation warning caused by cvx.Variable returning a one-d…
9f195d6 (mthielbar) Turn off user warnings where possible in test_utils_proba.py. Warning…
c05ae6e (mthielbar) Update to utils_proba.py
c2401d9 (mthielbar) Edit comments in simulator.
5a18d04 (mthielbar) Merge code for simulator class with fixes.
721cd3e (mthielbar) Update minimum weight to 5 rows, according to results from simulation…
13baf31 (mthielbar) Make simulation dataframe large enough so values are not unstable and…
50ff34e (mthielbar) Add simulation scripts and readme.md for probabilistic fairness.
5971028 (mthielbar) Update comments and readme.md
83f1782 (mthielbar) Add descriptions and citations to readme
377756a (mthielbar) Add input data for simulations and supporting notebooks to create out…
1adfb48 (skadio) update
4830f9b (skadio) update
02f0497 (skadio) update
8897d66 (skadio) update
examples/probabilistic_fairness/input_data/sampled_surrogate_inputs.csv: 3,589 additions, 0 deletions (large diff not rendered)
examples/probabilistic_fairness/input_data/surrogate_inputs.csv: 33,185 additions, 0 deletions (large diff not rendered)
examples/probabilistic_fairness/notebooks/analyze_prob_vs_model.ipynb: 1,909 additions, 0 deletions (large diff not rendered)
examples/probabilistic_fairness/notebooks/analyze_sample_size_sim.ipynb: 5,209 additions, 0 deletions (large diff not rendered)
examples/probabilistic_fairness/python_scripts/simulation.py: 62 additions, 0 deletions
@@ -0,0 +1,62 @@
# Simulations
import pandas as pd
import numpy as np
import math
import sys
sys.path.append('../../jurity/tests')
sys.path.append('../../jurity/jurity')
from jurity.fairness import BinaryFairnessMetrics as bfm
from test_utils_proba import UtilsProbaSimulator

output_path='~/Documents/data/jurity_tests/simulations/'

testing_simulation=False
n_runs=30
avg_counts=[30,50]
fair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.2, 'fnr': 0.1, 'fpr': 0.2},'protected': {'pct_positive': 0.2, 'fnr': 0.1, 'fpr': 0.2}},surrogate_name="ZIP")
slightly_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.2, 'fnr': 0.1, 'fpr': 0.2}, 'protected': {'pct_positive': 0.1, 'fnr': 0.35, 'fpr': 0.1}},surrogate_name="ZIP")
moderately_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.3, 'fnr': 0.1, 'fpr': 0.3}, 'protected': {'pct_positive': 0.1, 'fnr': 0.45, 'fpr': 0.1}},surrogate_name="ZIP")
very_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.4, 'fnr': 0.1, 'fpr': 0.3}, 'protected': {'pct_positive': 0.10, 'fnr': 0.65, 'fpr': 0.1}},surrogate_name="ZIP")
extremely_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.5, 'fnr': 0.1, 'fpr': 0.2}, 'protected': {'pct_positive': 0.10, 'fnr': 0.65, 'fpr': 0.05}},surrogate_name="ZIP")

scenarios={"fair":fair_sim,
           "slightly_unfair":slightly_unfair_sim,
           "moderately_unfair":moderately_unfair_sim,
           "very_unfair":very_unfair_sim,
           "extremely_unfair":extremely_unfair_sim}
surrogates=pd.read_csv('../input_data/surrogate_inputs.csv')
if testing_simulation:
    output_string = output_path+'{0}_simulation_count_{1}_surrogates_{2}_test.csv'
else:
    output_string = output_path+'{0}_simulation_count_{1}_surrogates_{2}.csv'


def run_one_sim(simulator, membership_df, count_mean, rng=np.random.default_rng()):
    membership_df["count"]=pd.Series(rng.poisson(lam=count_mean,size=membership_df.shape[0]))
    test_data=simulator.explode_dataframe(membership_df)
    oracle_metrics=bfm.get_all_scores(test_data["label"].values,test_data["prediction"].values,
                                      (test_data["class"]=="protected").astype(int).values).rename(columns={"Value":"oracle_value"})
    prob_metrics=bfm.get_all_scores(test_data["label"],test_data["prediction"],
                                    membership_df.set_index("ZIP")[["not_protected","protected"]],
                                    test_data["ZIP"],[1]).rename(columns={"Value":"probabilistic_estimate"})
    predicted_class=test_data[["not_protected","protected"]].values.tolist()
    argmax_metrics=bfm.get_all_scores(test_data["label"].values,test_data["prediction"].values,
                                      predicted_class).rename(columns={"Value":"argmax_estimate"})
    return pd.concat([oracle_metrics["oracle_value"],prob_metrics["probabilistic_estimate"],argmax_metrics["argmax_estimate"]], axis=1)


if __name__=="__main__":
    n_surrogates=surrogates.shape[0]
    for sim_label,simulator in scenarios.items():
        for c in avg_counts:
            all_results=[]
            for i in range(0, n_runs):
                if testing_simulation:
                    output_df = run_one_sim(simulator, surrogates.head(10), c)
                else:
                    output_df = run_one_sim(simulator, surrogates, c)
                output_df["run_id"] = i
                all_results.append(output_df)
            all_output=pd.concat(all_results)
            all_output["average_count"] = c
            all_output["simulation"] = sim_label
            all_output["n_surrogates"] = n_surrogates
            all_output[~(all_output["probabilistic_estimate"].apply(np.isnan))].to_csv(output_string.format(sim_label, c, n_surrogates))
examples/probabilistic_fairness/python_scripts/simulation_compare_to_model.py: 144 additions, 0 deletions
@@ -0,0 +1,144 @@
# Simulated data: model-based assignment to protected class vs. probabilistic fairness.
# One of the claims in the paper is that model-based fairness metrics are biased,
# and that the degree of bias is a function of the PPV (positive predictive value/precision)
# and NPV (negative predictive value) of the model that predicts protected status.
# This simulation demonstrates the difference between probabilistic estimates and
# model-based estimates for a given input data file (located in ../input_data/sampled_surrogate_inputs.csv).
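# A toy illustration of that bias (illustrative numbers only, and assuming the class model's
# errors are independent of the outcome): if the true positive rates are 0.10 for the protected
# class and 0.30 for the non-protected class, and protected status is assigned by a model with
# PPV = 0.8 and NPV = 0.9, then the rate measured in the "predicted protected" group is
# 0.8*0.10 + 0.2*0.30 = 0.14 and in the "predicted not protected" group 0.9*0.30 + 0.1*0.10 = 0.28,
# so the estimated statistical parity gap shrinks from the true 0.20 to 0.14.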

import pandas as pd
import numpy as np
import math
import sys
sys.path.append('../../tests')
sys.path.append('../../jurity')
from jurity.fairness import BinaryFairnessMetrics as bfm
from constants import Constants
from sklearn.metrics import confusion_matrix
from test_utils_proba import UtilsProbaSimulator


def performance_measures(ground_truth: np.ndarray,
                         predictions: np.ndarray) -> dict:
    """Compute various performance measures from ground-truth labels and predictions.
    Assume that the positive label is encoded as 1 and the negative label as 0.

    Parameters
    ----------
    ground_truth: np.ndarray
        Ground truth labels (1/0).
    predictions: np.ndarray
        Predicted values.

    Returns
    -------
    Dictionary with performance measure identifiers as keys and their corresponding values.
    """
    tn, fp, fn, tp = confusion_matrix(ground_truth, predictions).ravel()

    p = np.sum(ground_truth == 1)
    n = np.sum(ground_truth == 0)

    return {Constants.TPR: tp / p,
            Constants.TNR: tn / n,
            Constants.FPR: fp / n,
            Constants.FNR: fn / p,
            Constants.PPV: tp / (tp + fp) if (tp + fp) > 0.0 else Constants.float_null,
            Constants.NPV: tn / (tn + fn) if (tn + fn) > 0.0 else Constants.float_null,
            Constants.FDR: fp / (fp + tp) if (fp + tp) > 0.0 else Constants.float_null,
            Constants.FOR: fn / (fn + tn) if (fn + tn) > 0.0 else Constants.float_null,
            Constants.ACC: (tp + tn) / (p + n) if (p + n) > 0.0 else Constants.float_null}

# If true, only simulate a small dataframe. Used to test simulation syntax.
testing_simulation=False
n_runs=30

# The test_utils_proba.py test file in jurity/tests contains a class called
# UtilsProbaSimulator, which can simulate the confusion matrix from an unfair model for different classes.
# Simulation is explained in :
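# In the parameter dictionaries below, 'pct_positive' presumably gives the share of positive
# outcomes for each class, while 'fnr' and 'fpr' give the false negative and false positive rates
# of the simulated decision model for that class; see UtilsProbaSimulator in tests/test_utils_proba.py
# for the exact definitions.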
fair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.2, 'fnr': 0.1, 'fpr': 0.2},'protected': {'pct_positive': 0.2, 'fnr': 0.1, 'fpr': 0.2}},surrogate_name="ZIP")
slightly_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.2, 'fnr': 0.1, 'fpr': 0.2}, 'protected': {'pct_positive': 0.1, 'fnr': 0.35, 'fpr': 0.1}},surrogate_name="ZIP")
moderately_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.3, 'fnr': 0.1, 'fpr': 0.3}, 'protected': {'pct_positive': 0.1, 'fnr': 0.45, 'fpr': 0.1}},surrogate_name="ZIP")
very_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.4, 'fnr': 0.1, 'fpr': 0.3}, 'protected': {'pct_positive': 0.10, 'fnr': 0.65, 'fpr': 0.1}},surrogate_name="ZIP")
extremely_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.5, 'fnr': 0.1, 'fpr': 0.2}, 'protected': {'pct_positive': 0.10, 'fnr': 0.65, 'fpr': 0.05}},surrogate_name="ZIP")
if testing_simulation:
    scenarios = {"moderately_unfair": moderately_unfair_sim,
                 "very_unfair": very_unfair_sim}
else:
    scenarios={"fair":fair_sim,
               "slightly_unfair":slightly_unfair_sim,
               "moderately_unfair":moderately_unfair_sim,
               "very_unfair":very_unfair_sim,
               "extremely_unfair":extremely_unfair_sim}

# Location of input and output files
surrogates=pd.read_csv('../input_data/sampled_surrogate_inputs.csv')
if testing_simulation:
    prob_output_string = '~/Documents/data/jurity_tests/simulations/model_v_prob/{0}_prob_simulation_{1}_surrogates_{2}_count_test.csv'
    model_output_string = '~/Documents/data/jurity_tests/simulations/model_v_prob/{0}_model_simulation_{1}_surrogates_{2}_count_test.csv'
else:
    prob_output_string = '~/Documents/data/jurity_tests/simulations/model_v_prob/{0}_prob_simulation_{1}_surrogates_{2}_count.csv'
    model_output_string = '~/Documents/data/jurity_tests/simulations/model_v_prob/{0}_model_simulation_{1}_surrogates_{2}_count.csv'


def generate_test_data(simulator, membership_df, count_mean, rng=np.random.default_rng()):
    membership_df["count"]=pd.Series(rng.poisson(lam=count_mean,size=membership_df.shape[0]))
    return simulator.explode_dataframe(membership_df)


def calc_prob_estimate(test_data,membership_df):
    oracle_metrics=bfm.get_all_scores(test_data["label"].values,test_data["prediction"].values,
                                      (test_data["class"]=="protected").astype(int).values).rename(columns={"Value":"oracle_value"})
    prob_metrics=bfm.get_all_scores(test_data["label"],test_data["prediction"],
                                    membership_df.set_index("ZIP")[["not_protected","protected"]],
                                    test_data["ZIP"],[1]).rename(columns={"Value":"probabilistic_estimate"})
    return pd.concat([oracle_metrics["oracle_value"],prob_metrics["probabilistic_estimate"]], axis=1)


def calc_model_estimate(df,rng=np.random.default_rng()):
    out_dfs=[]
    for s in [[0.99, 0.99], [0.9, 0.99], [0.8, 0.9], [0.7, 0.8]]:
        p_given_p = s[0]
        np_given_np = s[1]
        prediction_p=rng.choice([0,1],p=[1-p_given_p,p_given_p],size=df.shape[0])
        prediction_np=rng.choice([0,1],p=[np_given_np,1-np_given_np],size=df.shape[0])
        class_vec_p=(df["class"]=="protected").astype(int).values
        class_vec_np=(df["class"]=="not_protected").astype(int).values
        class_pred=np.multiply(class_vec_p,prediction_p)+np.multiply(class_vec_np,prediction_np)
        scores=bfm.get_all_scores(df["label"].values,df["prediction"].values,class_pred).rename(columns={"Value":"model_estimate"})
        scores["p_given_p"]=p_given_p
        scores["np_given_np"]=np_given_np
        class_model_performance=performance_measures(class_vec_p,class_pred)
        scores["class_PPV"]=class_model_performance[Constants.PPV]
        scores["class_NPV"]=class_model_performance[Constants.NPV]
        scores["class_TPR"]=class_model_performance[Constants.TPR]
        scores["class_BR"]=np.sum(class_vec_p)
        out_dfs.append(scores.reset_index()[["Metric","model_estimate","class_PPV","class_NPV","class_TPR","p_given_p","np_given_np"]])
    return pd.concat(out_dfs,axis=0)


if __name__=="__main__":
    n_surrogates=surrogates.shape[0]
    generator=np.random.default_rng()
    for sim_label,simulator in scenarios.items():
        prob_results=[]
        model_results=[]
        for i in range(0, n_runs):
            if testing_simulation:
                test_df = generate_test_data(simulator, surrogates.head(10), 50, generator)
            else:
                test_df = generate_test_data(simulator, surrogates, 50, generator)
            p=calc_prob_estimate(test_df,surrogates)
            p["run_id"]=i
            prob_results.append(p)
            m=calc_model_estimate(test_df, generator)
            m["run_id"]=i
            model_results.append(m)
        all_prob_results=pd.concat(prob_results,axis=0)
        all_prob_results["simulation"]=sim_label
        all_model_results=pd.concat(model_results,axis=0)
        all_model_results["simulation"]=sim_label
        all_prob_results.to_csv(prob_output_string.format(sim_label,50,surrogates.shape[0]))
        all_model_results.to_csv(model_output_string.format(sim_label,50,surrogates.shape[0]),index=False)
examples/probabilistic_fairness/python_scripts/simulation_counts.py: 98 additions, 0 deletions
@@ -0,0 +1,98 @@
# Simulation inspecting probabilistic fairness performance for different sample sizes.
import pandas as pd
import numpy as np
import math
import sys
sys.path.append('../../jurity/tests')
sys.path.append('../../jurity/jurity')
from jurity.fairness import BinaryFairnessMetrics as bfm
from test_utils_proba import UtilsProbaSimulator

output_path='~/Documents/data/jurity_tests/simulations/sample_size/min_weight_0/'
testing_simulation=False
n_runs=30
avg_counts=[5,10,20,30,40]
num_surrogates=[50,100,300,400,500,1000]
fair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.2, 'fnr': 0.1, 'fpr': 0.2},'protected': {'pct_positive': 0.2, 'fnr': 0.1, 'fpr': 0.2}},surrogate_name="ZIP")
slightly_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.2, 'fnr': 0.1, 'fpr': 0.2}, 'protected': {'pct_positive': 0.1, 'fnr': 0.35, 'fpr': 0.1}},surrogate_name="ZIP")
moderately_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.3, 'fnr': 0.1, 'fpr': 0.3}, 'protected': {'pct_positive': 0.1, 'fnr': 0.45, 'fpr': 0.1}},surrogate_name="ZIP")
very_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.4, 'fnr': 0.1, 'fpr': 0.3}, 'protected': {'pct_positive': 0.10, 'fnr': 0.65, 'fpr': 0.1}},surrogate_name="ZIP")
extremely_unfair_sim=UtilsProbaSimulator({'not_protected': {'pct_positive': 0.5, 'fnr': 0.1, 'fpr': 0.2}, 'protected': {'pct_positive': 0.10, 'fnr': 0.65, 'fpr': 0.05}},surrogate_name="ZIP")

scenarios={"fair":fair_sim,
           "slightly_unfair":slightly_unfair_sim,
           "moderately_unfair":moderately_unfair_sim,
           "very_unfair":very_unfair_sim,
           "extremely_unfair":extremely_unfair_sim}
surrogates=pd.read_csv('./supporting_data/surrogate_inputs.csv')
surrogates["ZIP"]=surrogates["ZIP"].astype(int)
if testing_simulation:
    output_string = output_path+'{0}_simulation_count_{1}_test_surrogates_{2}.csv'
else:
    output_string = output_path+'{0}_simulation_count_{1}_surrogates_{2}.csv'

def run_one_sim(test_data,membership_df):
    # Sometimes the sub-sampling leads to data errors.
    # Return a dataframe that is all nans in this case.
    # Keep track--if there are too many of these, stop the simulation.
    global n_errors
    try:
        oracle_metrics=bfm.get_all_scores(test_data["label"].values,test_data["prediction"].values,
                                          (test_data["class"]=="protected").astype(int).values).rename(columns={"Value":"oracle_value"})
    except:
        oracle_metrics=pd.DataFrame({"oracle_value":[np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan]},
                                    index=['Average Odds', 'Disparate Impact', 'Equal Opportunity',
                                           'FNR difference', 'FOR difference', 'Generalized Entropy Index',
                                           'Predictive Equality', 'Statistical Parity', 'Theil Index']
                                    )
        n_errors=n_errors+1
    try:
        prob_metrics=bfm.get_all_scores(test_data["label"],test_data["prediction"],
                                        membership_df.set_index("ZIP")[["not_protected","protected"]],
                                        test_data["ZIP"],[1]).rename(columns={"Value":"probabilistic_estimate"})
    except:
        prob_metrics=pd.DataFrame({"probabilistic_estimate": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan]},
                                  index=['Average Odds', 'Disparate Impact', 'Equal Opportunity',
                                         'FNR difference', 'FOR difference', 'Generalized Entropy Index',
                                         'Predictive Equality', 'Statistical Parity', 'Theil Index']
                                  )
        n_errors=n_errors+1
    return pd.concat([oracle_metrics["oracle_value"],prob_metrics["probabilistic_estimate"]], axis=1)

if __name__=="__main__":
    n_errors=0
    rng=np.random.default_rng()
    for sim_label,simulator in scenarios.items():
        for c in avg_counts:
            surrogates["count"]=pd.Series(rng.poisson(lam=c,size=surrogates.shape[0]))
            if testing_simulation:
                test_data=simulator.explode_dataframe(surrogates.head(10))
            else:
                test_data=simulator.explode_dataframe(surrogates)
            print("The number of rows in the test data is: ",test_data.shape)
            for n_surrogates in num_surrogates:
                all_results = []
                for i in range(0, n_runs):
                    # Sample surrogate classes from the dataframe.
                    # Take a sample stratified by p(protected) to get a spread
                    # along the x axis for the regression.
                    if testing_simulation:
                        sampled_surrogates=surrogates.head(10)["ZIP"].values
                    else:
                        sampled_surrogates=surrogates.groupby("bin").sample(frac=(n_surrogates/surrogates.shape[0]),
                                                                            replace=True)["ZIP"].values
                    # Only feed sampled surrogate classes into the simulation.
                    a=test_data["ZIP"].apply(lambda x: x in sampled_surrogates).values
                    b=surrogates["ZIP"].apply(lambda x: x in sampled_surrogates).values
                    input_data=test_data.iloc[a].copy(deep=True)
                    input_surrogates=surrogates.iloc[b].copy(deep=True)
                    output_df=run_one_sim(input_data,input_surrogates)
                    if n_errors>30:
                        print("Errors limit reached. Stopping simulation.")
                        break
                    output_df["run_id"] = i
                    all_results.append(output_df)
                all_output=pd.concat(all_results)
                all_output["average_count"] = c
                all_output["n_surrogates"] = n_surrogates
                all_output["simulation"] = sim_label
                all_output[~(all_output["probabilistic_estimate"].apply(np.isnan))].to_csv(output_string.format(sim_label, c, n_surrogates))
Quick question on the "data": where is this data coming from? Is there an original version that we borrow from somewhere else (hence, copyright?), or is this data generated by us?