Bounded sum with known n #5

silviacasac · 2021-07-14T22:17:25Z

September 24, 2021, upload

Several cycles of revisions are now complete. Adjustments in response to feedback from Salil and Marco have now been made. As far as I can tell, there is no outstanding feedback to incorporate.

Major note: this proof is only for integer types. Also, the Rust function is not fully in line with the function we specify in the pseudocode. For example, I believe the Rust version has issues with integer truncation in the stability relation, and with overflows. As far as we can tell, the pseudocode version does not.

September 21, 2021, upload

New points of discussion when revising:

What is the best way to specify/impose that the stability relation that we have (and hence its corresponding proof) only works for integers?
Changing the stability relation to 2*d_out >= d_in*(U-L) instead of d_out >= d_in*(U-L)/2 in order to avoid division errors?
Mismatches between the proof and the pseudocode:
(a) Imposing that we only accept integers.
(b) Changing the stability relation to2*d_out >= d_in*(U-L).
(c) Checking overflow for 2*d_out and d_in.
(d) Naming mismatch between n in the pseudocode+proor vs size in the code.
Adding some more explanation for the exact_int_cast check.
Is the precondition L<U or L<=U?

(Old -- from July):

First pull request for the Bounded sum with known n transformation, which corresponds to this GitHub permalink.

Files:

BoundedSumKnownN.tex -- the LaTeX document with the proof.
BoundedSumKnownN.pdf -- the PDF generated from the TeX document.

BoundedSumKnownN.tex

bounded_sum_known_n/BoundedSumKnownN.tex

silviacasac · 2021-08-05T21:30:58Z

Remaining things to check:

Whether it is OK that the castings to not appear in the proof.
Important. Floating point issues both in the addition of the elements and in the division. We are working on it.

bounded_sum_known_n/BoundedSumKnownN.tex

SalilVadhan · 2021-09-23T02:03:27Z

bounded_sum_known_n/BoundedSumKnownN.tex

+        raise Exception('Potential overflow') |\label{line:max}|
+
+    def relation(d_in: u32, d_out: T) -> bool: |\label{line:rel}|
+        if checked_mul(2, d_out).is_none or checked_mul(d_in, U-L).is_none


should d_in be replaced with inf_cast(d_in,T), so that both arguments to checked_mul are of type T?

That is a good point. I changed it, but I am leaving this comment "unresolved" in case other people have different ideas.

SalilVadhan · 2021-09-23T02:04:47Z

bounded_sum_known_n/BoundedSumKnownN.tex

+    \begin{itemize}
+        \item Variable \texttt{n} must be of type \texttt{usize}.
+        \item Type \texttt{T} must have traits \texttt{DistanceConstant(IntDistance)}, \texttt{TotalOrd}, \texttt{CheckedMul}, \texttt{CheckNull}, \texttt{Sum(Output=T)}, \texttt{Sub(Output=T)},  and \texttt{ExactIntCast(usize)}.
+        \item \texttt{IntDistance} must have trait \texttt{InfCast(T)}. (Note that this bullet point is not needed in this proof, but it is needed in the code so a hint can be constructed; otherwise a binary search would be needed to construct the hint.)


Isn't the trait InfCast(T) on IntDistance used in the relation itself to do inf_cast(d_in,T)?

No, "IntDistance has trait InfCast(T)" means that a value can be inf_casted from type T to type IntDistance. inf_cast(d_in,T) casts from the type of d_in to type T.

In the case of inf_cast(d_in,T), because T has trait DistanceConstant(IntDistance), T automatically has trait InfCast(IntDistance). The fact that having trait DistanceConstat(IntDistance) means a type has trait InfCast(IntDistance) should be covered in the definition for trait DistanceConstant in the pseudocode definitions document.

Ah, I see - I got it backwards. But if IntDistance has trait InfCast(T) and T has trait InfCast(IntDistance), doesn't this effectively imply that T is the same as IntDistance? Are there any examples of distinct types where we can exactly cast in both directions? If not, why not simplify things and simply require T=IntDistance?

That's what I thought until last week. However, a type T can actually have trait InfCast(U) if any value v of type U can either be casted to a value w of type T at least as large as v or can return an error saying the inf_cast failed.

So, for example, u32 has trait InfCast(u64) because there's a definition out there for which values v of type u64 can be casted to a value of u32 at least as big as v, and which values v of type u64 can't be casted to a value of u32 and should instead return an error.

InfCast is fallible. If InfCast can cast to an output number GTE the input number, it will, otherwise it (and the entire relation) fails. This fallibility allows InfCast to be implemented for every combination of numeric types. See the u32 row and column from the InfCast block of my type conversion tables.

bounded_sum_known_n/BoundedSumKnownN.tex

SalilVadhan · 2021-09-24T10:24:10Z

bounded_sum_known_n/BoundedSumKnownN.tex

+    \]
+
+    By Lemma 5.6 from the proof definitions document (linked in Section \ref{sec:versioned-docs}), we know that vectors $x, y$ only differ on one element, given that, by assumption, $d_{Sym}(x, y) = 2$.
+


Suggest: remove line break, and change next two sentences to read: "Thus, if we write $x=(x_1,x_2,\ldots,x_n)$ and $y=(y_1,\ldots,y_n)$ as vectors, there is a coordinate $j$ such that we can change $x$ in (only) coordinate $j$ to obtain a vector $z$ such that $\MultiSet(z)=\MultiSet(y)$. This construction of $z$ gives us the following...

SalilVadhan · 2021-09-24T10:25:52Z

bounded_sum_known_n/BoundedSumKnownN.tex

+
+    By Lemma 5.6 from the proof definitions document (linked in Section \ref{sec:versioned-docs}), we know that vectors $x, y$ only differ on one element, given that, by assumption, $d_{Sym}(x, y) = 2$.
+
+    Let $x = (x_1,x_2,\ldots,x_n)$ and $y = (y_1,y_2,\ldots,y_n)$. The fact that we have $d_{Sym}(x, y) = 2$ tells us that there is some vector $z = (z_1,z_2,\ldots,z_n)$ such that, for all $i\neq j$, we have $x_i=z_i$ (and that for $j$, we have $x_j \neq z_j$), and such that $\MultiSet(z)=\MultiSet(y)$. This construction of $z$ gives us the following equalities: $|\sum_i(x_i) - \sum_i(y_i)| = |\sum_i(x_i)-\sum_i(z_i)| = |\sum_i(x_i-z_i)| = |x_j-z_j|$. Because, in a worst-case scenario, the difference between $x_j$ and $z_j$ is the difference between the upper bound $U$ and lower bound $L$, we see that $|x_j-z_j|\leq |U-L| = U-L$ (with the last equality from the fact that $U\geq L$), which is what we wanted to show.


The parenthesis $(x_i)$ look odd, it seems like some operation is being applied to $x_i$. Presumably the goal is to separate the two summations, so I would put parentheses including the sum symbol, like $\left(\sum_i x_i\right)$. I also recommend making this sequence of equations in displayed mode, e.g. with an eqnarray*.

Shoeboxam · 2021-09-24T14:32:57Z

Something to consider going forward:
While the resize relation ensures d_in is at least rounded up to the nearest even number, it is possible to feed d_in directly into transformations with sized input domains. This means it is still possible for d_in to be odd in the bounded sum, by user error. It seems like it would be easy for a user to make this error.

One approach is to reintroduce substitute distances for sized domains, and drop the division by 2 in the bounded sum relation, which also avoids the recent integer truncation issue (U - L) / 2. The resize transformation would be updated to relate SymmetricDistance with SubstituteDistance, and it would round up to the nearest even and then divide by 2 (AKA InfDiv, or (v + 1) /2)

SalilVadhan · 2021-09-24T14:44:26Z

Is there something incorrect in the current implementation or proof? The proof of Lemma 5.7 in the proof definitions document doesn't assume that d_in is even. It is true that applying the stability bound with an odd value of d would give a weaker bound than necessary (c/2)*d_in rather than c*floor(d_in/2), but that's not incorrect.

…

----- Salil Vadhan (he/they)

On Fri, Sep 24, 2021 at 10:33 AM Michael Shoemate ***@***.***> wrote: Something to consider going forward: While the resize relation ensures d_in is at least rounded up to the nearest even number, it is possible to feed d_in directly into transformations with sized input domains. This means it is still possible for d_in to be odd in the bounded sum, by user error. It seems like it would be easy for a user to make this error. One approach is to reintroduce substitute distances for sized domains, and drop the division by 2 in the bounded sum relation, which also avoids the recent integer truncation issue (U - L) / 2. The resize transformation would be updated to relate SymmetricDistance with SubstituteDistance, and it would round up to the nearest even and then divide by 2 (AKA InfDiv, or (v + 1) /2) — You are receiving this because your review was requested. Reply to this email directly, view it on GitHub <#5 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB4KIGJP4HDDNC474GFV33DUDSD2JANCNFSM5AMKPIWA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

Shoeboxam · 2021-09-25T16:25:18Z

@SalilVadhan You're right, if a user provides an odd d_in then the bound is too loose, not too tight. Regardless, I'm having second thoughts on if it might be preferable, when pairing with sized domains, to standardize metrics on substitute distances instead of symmetric distances.

Benefits of pairing SubstituteDistance with SizedDomains:

Users don't need to consider the factor of 2 when working with sized datasets. On the other hand, they need to be more cognizant of the input metric.
Minor: the relation is simpler, because the 2 factor goes away. This allows all of our relations to be expressed as stability constants instead of forward maps.
Minor: the base case and path inductance is 1, n + 1 instead of 2, n + 2.

Downsides of pairing SubstituteDistance with SizedDomains:

It might induce an additional metric generic on data manipulation transformations. Since the resize typically happens at the end of the data processing pipeline, this isn't a current concern. We can just write all data manipulation transformations wrt SymmetricDistance. It isn't clear to me if this approach will become an issue in the future.

It is tempting to just leave it as it is, document the public API better, and add another helper in Rust for building relations out of forward maps (like we have for stability constants). So don't consider my comments blocking.

SalilVadhan · 2021-09-25T21:51:56Z

I'm happy either way - introducing SubstituteDistance or keeping it as is, with just SymmetricDistance - whichever way pro-con analysis lands.

…

----- Salil Vadhan (he/they)

On Sat, Sep 25, 2021 at 12:25 PM Michael Shoemate ***@***.***> wrote: @SalilVadhan <https://github.com/SalilVadhan> You're right, if a user provides an odd d_in then the bound is too loose, not too tight. Regardless, I'm having second thoughts on if it might be preferable, when pairing with sized domains, to standardize metrics on substitute distances instead of symmetric distances. Benefits of using SubstituteDistance for SizedDomains: 1. Users don't need to consider the factor of 2 when working with sized datasets. On the other hand, they need to be more cognizant of the input metric. 2. Minor: the relation is simpler, because the 2 factor goes away. This allows all of our relations to be expressed as stability constants. 3. Minor: the base case and path inductance is 1, n + 1 instead of 2, n + 2. Downsides of using SubstituteDistance for SizedDomains: 1. It might induce an additional metric generic on data manipulation transformations. Since the resize typically happens at the end of the data processing pipeline, this isn't a current concern. It isn't clear to me if it will be in the future. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#5 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AB4KIGPGI42OWXMVOOVAJRDUDXZXTANCNFSM5AMKPIWA> . Triage notifications on the go with GitHub Mobile for iOS <https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675> or Android <https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.

Add files via upload

1d579e3

silviacasac requested review from pdurbin, raprasad, ecowan, Shoeboxam, andrewvyrros, SalilVadhan, gracetian6 and cwagaman July 14, 2021 22:17

silviacasac self-assigned this Jul 14, 2021

silviacasac changed the title ~~Add files via upload~~ BoundedSumKnownN Jul 14, 2021

silviacasac changed the title ~~BoundedSumKnownN~~ Bounded sum with known n Jul 14, 2021

gracetian6 reviewed Jul 15, 2021

View reviewed changes

BoundedSumKnownN.tex Outdated Show resolved Hide resolved

BoundedSumKnownN.tex Outdated Show resolved Hide resolved

silviacasac added 8 commits July 21, 2021 11:23

Delete BoundedSumKnownN.pdf

d170615

Delete BoundedSumKnownN.tex

f116e6e

Delete sum_rust.png

320a896

Create thing

0904c14

Add files via upload

ed36e6b

Delete thing

ed80c1e

Add files via upload

eb4c359

typos

0574ced

Shoeboxam reviewed Aug 4, 2021

View reviewed changes

bounded_sum_known_n/BoundedSumKnownN.tex Outdated Show resolved Hide resolved

bounded_sum_known_n/BoundedSumKnownN.tex Outdated Show resolved Hide resolved

bounded_sum_known_n/BoundedSumKnownN.tex Outdated Show resolved Hide resolved

Corrections

c4a4034

silviacasac added 6 commits August 6, 2021 12:55

relation

6cb6721

relation

9c9fe9d

Add files via upload

0f51375

Add files via upload

e2bd5bc

Add files via upload

9a3ee17

Add files via upload

95fe401

Shoeboxam mentioned this pull request Sep 23, 2021

Sized Bounded Sum Transformation -- Proof opendp/opendp#121

Open

SalilVadhan reviewed Sep 23, 2021

View reviewed changes