Increasing epsilon decreases noise #15

Open
echo66 opened this issue Dec 11, 2018 · 4 comments

Comments

@echo66

echo66 commented Dec 11, 2018

Greetings!

I'm trying to understand your paper and implementation. I've noticed that the more you increase epsilon, the less noise will be generated. In order to understand if that is the expected behavior, I looked into your paper and PrivBayes paper (and, also, a Java implementation) and everyone seems to say that the scale of the noise is given by:

4 * (n_cols - k) / (n_rows * epsilon)
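For illustration, here is a minimal sketch that evaluates the scale formula quoted above for a few epsilon values. The function name `laplace_scale` and the example dataset shape (1000 rows, 10 columns, k=2) are made up for this example and are not taken from the DataSynthesizer code.

```python
def laplace_scale(n_rows, n_cols, k, epsilon):
    """Noise scale from the formula quoted above: 4 * (n_cols - k) / (n_rows * epsilon)."""
    if epsilon <= 0:
        # The formula divides by epsilon, so it is only defined for epsilon > 0.
        raise ValueError("epsilon must be positive for this formula")
    return 4 * (n_cols - k) / (n_rows * epsilon)


# Hypothetical dataset shape, chosen only to show the trend.
for eps in (0.1, 1.0, 10.0):
    scale = laplace_scale(n_rows=1000, n_cols=10, k=2, epsilon=eps)
    print(f"epsilon={eps}: scale={scale:.4f}")
```

Because epsilon sits in the denominator, the scale shrinks as epsilon grows, which matches the behavior described in this issue.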

But the definition of differential privacy implies that if epsilon gets closer to 0, there won't be any difference for the query output between the original and synthetic datasets. Am I getting something wrong?

Thanks in advance!

@haoyueping
Collaborator

Thanks for your question! I will update the documentation for epsilon soon.

There are several concepts in Differential Privacy (DP):

  1. an original dataset D1
  2. a neighboring dataset D2
  3. a randomized algorithm A
  4. an output O
  5. a small value epsilon

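DP requires that, for any neighboring datasets D1 and D2 and any output O,

Pr(A(D1)=O) <= exp(epsilon) * Pr(A(D2)=O)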

In the context of DataSynthesizer, D1 is the input dataset and O is the synthetic dataset.

When epsilon=0, Pr(A(D1)=O) = Pr(A(D2)=O): the output distribution of A is the same for D1 and D2, so the output reveals nothing about the input and A is fully randomized. Reducing epsilon therefore injects more noise, and increasing epsilon injects less.
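A small numerical sketch of this, under assumptions that are not from DataSynthesizer: A is a Laplace mechanism applied to a count query with sensitivity 1, and D1 and D2 yield counts 100 and 101. All names here (`laplace_pdf`, `max_density_ratio`) are hypothetical. The code compares the output densities under D1 and D2 over a grid of outputs and shows that their worst-case ratio stays within exp(epsilon), approaching 1 as epsilon approaches 0.

```python
import math


def laplace_pdf(x, mu, b):
    """Density of a Laplace distribution with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)


def max_density_ratio(epsilon, count_d1=100.0, count_d2=101.0, sensitivity=1.0):
    """Largest ratio of output densities under D1 vs. D2 over a grid of outputs.

    Assumes a Laplace mechanism with scale b = sensitivity / epsilon.
    """
    b = sensitivity / epsilon
    outputs = [count_d1 + i * 0.25 for i in range(-40, 41)]
    return max(
        laplace_pdf(o, count_d1, b) / laplace_pdf(o, count_d2, b) for o in outputs
    )


for eps in (0.01, 0.1, 1.0, 2.0):
    print(f"epsilon={eps}: max ratio={max_density_ratio(eps):.4f}, "
          f"bound exp(epsilon)={math.exp(eps):.4f}")
```

With a small epsilon the two densities are almost indistinguishable, which requires a wide (noisy) distribution; with a large epsilon they are allowed to differ a lot, so little noise is needed.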

@eunbeejang

eunbeejang commented Nov 4, 2020

@haoyueping

Hello, I'm also trying to understand your work. If you say increased epsilon translates to reducing noise, then what epsilon value will be equivalent to having no DP?

@haoyueping
Collaborator

Hi @eunbeejang ,

The noise required by DP approaches 0 as the epsilon value goes to infinity.

But in the implementation of DataSynthesizer, DP is turned off if epsilon=0.
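To make the convention concrete, here is a rough sketch of that behavior. It is not the DataSynthesizer source: the function `add_noise`, its `sensitivity` parameter, and the Laplace sampling approach are assumptions made for illustration only.

```python
import random


def add_noise(value, epsilon, sensitivity=1.0, rng=random):
    """Add Laplace noise to value; epsilon=0 is treated as 'DP turned off'.

    Hypothetical helper mirroring the convention described in this thread.
    """
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")
    if epsilon == 0:
        # Convention, not the mathematical limit: no noise is added at all.
        return value
    b = sensitivity / epsilon
    # The difference of two exponential samples with mean b is Laplace(0, b).
    return value + rng.expovariate(1 / b) - rng.expovariate(1 / b)


rng = random.Random(0)
for eps in (0, 0.1, 1.0, 10.0):
    samples = [add_noise(50.0, eps, rng=rng) for _ in range(2000)]
    mean_abs_error = sum(abs(s - 50.0) for s in samples) / len(samples)
    print(f"epsilon={eps}: mean absolute noise={mean_abs_error:.3f}")
```

In this sketch epsilon=0 acts as a switch rather than as the limit of small epsilon values, where the noise would grow without bound.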

@amdjedbens

amdjedbens commented Apr 9, 2024

> But in the implementation of DataSynthesizer, DP is turned off if epsilon=0.

So, in simpler terms, your practical implementation of DataSynthesizer does not strictly adhere to the definition of differential privacy.
When epsilon is higher, less noise is added, and vice versa. However, when epsilon=0, no noise is added at all.
