This issue will be used to plan updates for disclosure risk metrics in syntheval
Confidential data baseline assessments
Methods for identifying existing confidential records with high disclosure risk (edit: now in disc_baseline.R)
Methods for identifying arbitrary records worth evaluating in holdouts (edit: deferred to 0.0.5)
disc_baseline_lra(conf_tables): linear reconstruction attack from a collection of count tables (link)
disc_baseline_make_canaries(conf_data): create artificial high-risk records ("canaries") for holdout data (link)
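To make the reconstruction idea concrete, here is a rough sketch of how a collection of count tables can pin down record-level values. It is written in Python purely for illustration and is not syntheval's R API: `reconstruct_from_counts`, the toy `secret` vector, and the `subsets` are all hypothetical, and exhaustive search stands in for the linear program or least-squares solve a real linear reconstruction attack would use.

```python
from itertools import product


def reconstruct_from_counts(n_records, queries):
    """Return every 0/1 secret vector consistent with all published counts.

    Each query is (indices, count): the number of records in `indices`
    whose secret bit equals 1. Exhaustive search is only feasible for
    tiny examples; it is an assumption made here for clarity.
    """
    solutions = []
    for bits in product((0, 1), repeat=n_records):
        if all(sum(bits[i] for i in idx) == count for idx, count in queries):
            solutions.append(bits)
    return solutions


# Hypothetical confidential bits for 4 records (unknown to the attacker).
secret = (1, 0, 0, 1)

# Each published count table cell is a linear constraint on the secret bits.
subsets = [(0, 1), (1, 2), (2, 3), (0, 1, 2, 3), (0, 2)]
queries = [(idx, sum(secret[i] for i in idx)) for idx in subsets]

candidates = reconstruct_from_counts(len(secret), queries)

# A record is fully disclosed if its bit is identical in every candidate.
disclosed = [
    i for i in range(len(secret))
    if len({c[i] for c in candidates}) == 1
]
print(candidates, disclosed)
```

When only one candidate survives, the tables alone reveal every record, which is the baseline risk this section is meant to surface before any synthetic data is released.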
Membership inferences from synthetic data
Quasi-identifier probabilistic membership inference (edit: added in disc_qid_mi.R)
Partition selection probabilities from multiple replicates
Membership empirical intervals from multiple replicates
Membership inference updates for arbitrary holdouts (link)
disc_mit(...) updates for multiple synthetic data replicates
disc_mit(...) updates for disaggregated records
disc_mit(...) updates for mechanism adaptivity (edit: deferred to 0.0.5)
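One possible reading of the replicate-based items above, sketched in Python for illustration only (not syntheval's R code). The function names, the toy replicates, and the choice of a nearest-rank quantile are all assumptions: the selection probability is taken as the share of replicates in which a target's quasi-identifier combination appears at all, and the empirical interval is taken over the per-replicate share of matching records.

```python
import math


def qid_key(row, qid_cols):
    """Quasi-identifier combination of a record, as a hashable tuple."""
    return tuple(row[c] for c in qid_cols)


def partition_selection_prob(replicates, qid_cols, target):
    """Share of replicates containing at least one match on the QID columns."""
    key = qid_key(target, qid_cols)
    hits = [any(qid_key(r, qid_cols) == key for r in rep) for rep in replicates]
    return sum(hits) / len(replicates)


def match_shares(replicates, qid_cols, target):
    """Per-replicate share of synthetic records matching the target's QIDs."""
    key = qid_key(target, qid_cols)
    return [
        sum(qid_key(r, qid_cols) == key for r in rep) / len(rep)
        for rep in replicates
    ]


def empirical_interval(values, lower=0.05, upper=0.95):
    """Nearest-rank empirical quantiles of the values across replicates."""
    s = sorted(values)

    def q(p):
        return s[max(1, math.ceil(p * len(s))) - 1]

    return q(lower), q(upper)


# Hypothetical synthetic replicates and target record.
qids = ["age", "zip"]
replicates = [
    [{"age": "30-39", "zip": "20001"}, {"age": "40-49", "zip": "20002"}],
    [{"age": "40-49", "zip": "20002"}, {"age": "40-49", "zip": "20002"}],
    [{"age": "30-39", "zip": "20001"}, {"age": "30-39", "zip": "20001"}],
    [{"age": "30-39", "zip": "20001"}, {"age": "50-59", "zip": "20003"}],
]
target = {"age": "30-39", "zip": "20001"}

prob = partition_selection_prob(replicates, qids, target)
interval = empirical_interval(match_shares(replicates, qids, target))
print(prob, interval)
```

With only a handful of replicates the interval is coarse, so in practice the number of replicates would drive how much these summaries can be trusted.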
Linkage attacks (edit: deferred to 0.0.5)
disc_linkage_recon(synth_data, recon): Linkage attack from synthetic data and partial reconstruction
Attribute inferences
disc_ait(synth_data, test_records): attribute inference for test_records using synthetic data-based models
disc_ait_compare(synth_data, test_records, holdout_data): attribute inference for test_records comparing differences between using synthetic and holdout data (link)
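The comparison in disc_ait_compare could look roughly like the following. This is a Python illustration under stated assumptions, not the package's R implementation: the majority-vote lookup is a deliberately simple stand-in for whatever model is actually fit, and `fit_majority`, `accuracy`, the column names, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def fit_majority(train, qid_cols, target_col):
    """Map each quasi-identifier combination to its most common target value."""
    groups = defaultdict(Counter)
    for row in train:
        groups[tuple(row[c] for c in qid_cols)][row[target_col]] += 1
    # Fallback prediction for QID combinations never seen in training.
    fallback = Counter(row[target_col] for row in train).most_common(1)[0][0]
    model = {k: c.most_common(1)[0][0] for k, c in groups.items()}
    return model, fallback


def accuracy(train, test_records, qid_cols, target_col):
    """Share of test records whose sensitive value is predicted correctly."""
    model, fallback = fit_majority(train, qid_cols, target_col)
    correct = sum(
        model.get(tuple(r[c] for c in qid_cols), fallback) == r[target_col]
        for r in test_records
    )
    return correct / len(test_records)


# Hypothetical data: "dx" is the sensitive attribute being inferred.
qids = ["age", "sex"]
synth_data = [
    {"age": "30-39", "sex": "F", "dx": "yes"},
    {"age": "30-39", "sex": "F", "dx": "yes"},
    {"age": "40-49", "sex": "M", "dx": "no"},
    {"age": "40-49", "sex": "F", "dx": "no"},
]
holdout_data = [
    {"age": "30-39", "sex": "F", "dx": "no"},
    {"age": "40-49", "sex": "M", "dx": "no"},
    {"age": "40-49", "sex": "F", "dx": "no"},
]
test_records = [
    {"age": "30-39", "sex": "F", "dx": "yes"},
    {"age": "40-49", "sex": "M", "dx": "no"},
]

acc_synth = accuracy(synth_data, test_records, qids, "dx")
acc_holdout = accuracy(holdout_data, test_records, qids, "dx")
# A positive gap suggests the synthetic data leaks more about the test
# records than population-level patterns in the holdout would explain.
gap = acc_synth - acc_holdout
print(acc_synth, acc_holdout, gap)
```

The holdout acts as the baseline: accuracy that a model reaches from holdout data alone reflects general patterns rather than disclosure, so only the difference is attributed to the synthetic release.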
I would appreciate functionality/best practices for working with continuous variables and mixed-type data.
Membership inferences from synthetic data
Can you share a little more detail about the linkage attack functionality?
I can imagine major differences between methods for partially and fully synthetic data.
What is the direction of the linkage?
Attribute inferences
I've done some crude work on this. Let me know how I can help! The discriminator workflow we added is pretty flexible and leverages library(tidymodels).
@awunderground I updated this roadmap based on what was merged in. If you have some crude work done already on attribute inferences, any chance you'd be willing to add it to a branch? I can massage it to work with the 0.0.4 updates. I think this will be pretty flexible, since it should probably take a tidymodels workflow as input.
Disclosure risk metrics planning