MiDiPSA-for-non-stationary-streams

MiDiPSA (Microaggregation-based Differential Private Stream Anonymization) continuously publishes non-stationary data streams. The algorithm satisfies k-anonymity and recursive (c,l)-diversity, adheres to the conditions of ϵ-differential privacy, and is combined with an unsupervised mechanism for detecting concept drift. The project evaluates the trade-off between privacy, measured by disclosure risk, and utility, measured by the AUC of MOA stream classifiers (such as MajorityClass, HoeffdingAdaptiveTree, NaiveBayes, SGD and AdaptiveRandomForest) trained on the anonymized streams.
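Microaggregation, the technique in the algorithm's name, can be illustrated with a minimal sketch. It assumes purely numeric quasi-identifiers and a centroid (mean) replacement rule; the function name `microaggregate` is hypothetical, and this is not the repository's Anonymizer code. Once a cluster holds at least k records, publishing the centroid for every member makes those records indistinguishable, which is what k-anonymity requires.

```python
def microaggregate(cluster, k):
    """Replace every record of a cluster by the cluster centroid.

    cluster: list of equal-length tuples of numeric quasi-identifiers.
    Assumption for illustration: the centroid is the per-attribute mean.
    """
    if len(cluster) < k:
        raise ValueError("cluster must hold at least k records to be published")
    n = len(cluster)
    dims = len(cluster[0])
    # Per-attribute mean over all members of the cluster.
    centroid = tuple(sum(rec[d] for rec in cluster) / float(n) for d in range(dims))
    # Every member is published as the same centroid, so the n >= k
    # published records cannot be told apart on these attributes.
    return [centroid] * n


if __name__ == "__main__":
    records = [(25, 50000.0), (27, 52000.0), (29, 54000.0)]
    print(microaggregate(records, k=3))
```

Categorical attributes would need a different representative value; the sketch leaves them out.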

The clustering process is illustrated in the repository figure MiDiPDSA_illustration_snapshot.png.
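One plausible reading of the clustering step, given the dist_thr and k parameters described under Instructions, is sketched here: each arriving record joins the nearest cluster whose centroid lies within dist_thr and which still has fewer than k_max members; otherwise it opens a new cluster. Euclidean distance over already-normalized numeric attributes, the capacity rule, and the names `assign`, `centroid` and `euclidean` are assumptions for illustration, not the logic of the repository's Cluster.py.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(records):
    """Per-attribute mean of a non-empty list of numeric tuples."""
    n = float(len(records))
    return tuple(sum(col) / n for col in zip(*records))


def assign(record, clusters, dist_thr, k_max):
    """Add record to the nearest open cluster within dist_thr, else open one.

    clusters: list of clusters, each a list of records (modified in place).
    Returns the index of the cluster that received the record.
    """
    best, best_dist = None, None
    for i, members in enumerate(clusters):
        if len(members) >= k_max:
            continue  # assumed rule: a full cluster accepts no more records
        d = euclidean(record, centroid(members))
        if d <= dist_thr and (best_dist is None or d < best_dist):
            best, best_dist = i, d
    if best is None:
        clusters.append([record])
        return len(clusters) - 1
    clusters[best].append(record)
    return best


if __name__ == "__main__":
    clusters = []
    print(assign((0.0, 0.0), clusters, dist_thr=1.0, k_max=5))  # opens cluster 0
    print(assign((0.5, 0.0), clusters, dist_thr=1.0, k_max=5))  # joins cluster 0
    print(assign((5.0, 5.0), clusters, dist_thr=1.0, k_max=5))  # too far: opens cluster 1
```

Because the threshold is an absolute distance, its useful value depends on the scale of each dataset's attributes, which is consistent with the advice that dist_thr be tuned per dataset.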

Requirements:

- The latest version of MOA (http://moa.cms.waikato.ac.nz), with the moa.jar file placed in the project directory.
- Python 2.7. Anaconda is preferred, since it installs the NumPy, pandas and SciPy packages automatically.
- Datasets (an example is provided): a CSV data file and a metadata file listing all attributes with their types and ranges.

Instructions:

Run Program.py with the following parameters:
- DIR - directory of the datasets and metadata files.
- stream_path - name of the dataset file.
- datatypes_path - name of the metadata file.
- k - cluster-size range and k-anonymity parameter, given as [k_min - k_max].
- l - l-diversity parameter.
- c - c parameter of recursive (c,l)-diversity (default 7).
- eps - differential privacy parameter.
- b - buffer size of each cluster (default 3k, where k is the k-anonymity parameter).
- delta - delay threshold parameter (default 10k).
- dist_thr - distance threshold for choosing the nearest cluster during clustering (should be tuned for each dataset).
- cd_conf - confidence level of the statistical test for concept drift detection (default 0.1).
- noise_thr - threshold controlling noise addition to categorical attributes, as part of the differential privacy mechanism.
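The c and l parameters refer to recursive (c,l)-diversity. Under the standard definition, a group of records whose sensitive-value counts, sorted from most to least frequent, are r_1 ≥ r_2 ≥ ... ≥ r_m is recursive (c,l)-diverse when r_1 < c(r_l + r_{l+1} + ... + r_m). A minimal check following that definition is sketched here; the function name is hypothetical, and how the repository applies the test to each cluster is not described in this README.

```python
from collections import Counter


def is_recursive_cl_diverse(sensitive_values, c, l):
    """Return True if the group satisfies recursive (c,l)-diversity.

    The counts r_1 >= r_2 >= ... >= r_m of the distinct sensitive values
    must satisfy r_1 < c * (r_l + r_{l+1} + ... + r_m).
    """
    if l < 1:
        raise ValueError("l must be at least 1")
    counts = sorted(Counter(sensitive_values).values(), reverse=True)
    if len(counts) < l:
        return False  # fewer than l distinct values can never be l-diverse
    # counts is 0-indexed, so r_l is counts[l - 1]
    return counts[0] < c * sum(counts[l - 1:])


if __name__ == "__main__":
    group = ["flu"] * 4 + ["cold"] * 2 + ["asthma"]
    print(is_recursive_cl_diverse(group, c=7, l=2))  # 4 < 7 * (2 + 1): True
    print(is_recursive_cl_diverse(group, c=1, l=2))  # 4 < 1 * (2 + 1): False
```

A larger c loosens the constraint, so the default c = 7 tolerates a fairly dominant most-frequent value as long as at least l distinct values are present.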
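The eps parameter sets the strength of the differential privacy mechanism. The standard Laplace mechanism for releasing a cluster mean is sketched here, assuming each numeric attribute is clipped to a known range [lo, hi] and that neighboring streams differ by replacing one record, so the mean of n values has sensitivity (hi - lo)/n. The function names are hypothetical, and whether MiDiPSA's Noiser calibrates its noise this way is an assumption; the categorical noise governed by noise_thr is not covered.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` is Laplace-distributed with location 0 and that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_mean(values, lo, hi, eps, rng=None):
    """eps-differentially private mean of values clipped to [lo, hi]."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    if not values:
        raise ValueError("values must be non-empty")
    rng = rng or random.Random()
    n = len(values)
    clipped = [min(max(v, lo), hi) for v in values]
    true_mean = sum(clipped) / float(n)
    # Replacing one record moves the mean by at most (hi - lo) / n.
    sensitivity = (hi - lo) / float(n)
    return true_mean + laplace_noise(sensitivity / eps, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    print(noisy_mean([25, 27, 29], lo=0, hi=100, eps=1.0, rng=rng))
```

A smaller eps gives a larger noise scale, so stronger privacy costs utility; this is the trade-off the project measures through disclosure risk and classifier AUC.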
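The README does not name the statistical test behind cd_conf. As a swapped-in illustration only, the sketch here compares the mean of a reference window with the mean of a recent window of unlabeled values and flags drift when the gap exceeds a Hoeffding bound, chosen so that with no drift a false alarm occurs with probability at most cd_conf (default 0.1). It assumes independent values lying in an interval of known width; the function name is hypothetical.

```python
import math


def hoeffding_drift(reference, recent, value_range, conf):
    """Flag drift when two window means differ by more than a Hoeffding bound.

    For independent values in an interval of width value_range and equal
    true means, P(|mean_0 - mean_1| >= bound) <= conf, where
    bound = value_range * sqrt((1/n0 + 1/n1) * ln(2/conf) / 2).
    """
    n0, n1 = len(reference), len(recent)
    if n0 == 0 or n1 == 0:
        return False  # not enough data to compare
    diff = abs(sum(reference) / float(n0) - sum(recent) / float(n1))
    bound = value_range * math.sqrt((1.0 / n0 + 1.0 / n1) * math.log(2.0 / conf) / 2.0)
    return diff > bound


if __name__ == "__main__":
    old = [0.1] * 200
    new = [0.9] * 200
    print(hoeffding_drift(old, new, value_range=1.0, conf=0.1))  # True
```

A smaller cd_conf widens the bound, trading fewer false alarms for slower reaction to real drift.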
A log file of the run is saved in the project directory.
The original data and the corresponding anonymized data (with the classification performance) are saved in the project's 'Output' directory.
A report is produced as a CSV file in the project directory.
