-
Notifications
You must be signed in to change notification settings - Fork 1
Bounded sum with known n #5
base: main
Are you sure you want to change the base?
Conversation
Remaining things to check:
|
raise Exception('Potential overflow') |\label{line:max}| | ||
|
||
def relation(d_in: u32, d_out: T) -> bool: |\label{line:rel}| | ||
if checked_mul(2, d_out).is_none or checked_mul(d_in, U-L).is_none |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should d_in be replaced with inf_cast(d_in,T), so that both arguments to checked_mul are of type T?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a good point. I changed it, but I am leaving this comment "unresolved" in case other people have different ideas.
\begin{itemize} | ||
\item Variable \texttt{n} must be of type \texttt{usize}. | ||
\item Type \texttt{T} must have traits \texttt{DistanceConstant(IntDistance)}, \texttt{TotalOrd}, \texttt{CheckedMul}, \texttt{CheckNull}, \texttt{Sum(Output=T)}, \texttt{Sub(Output=T)}, and \texttt{ExactIntCast(usize)}. | ||
\item \texttt{IntDistance} must have trait \texttt{InfCast(T)}. (Note that this bullet point is not needed in this proof, but it is needed in the code so a hint can be constructed; otherwise a binary search would be needed to construct the hint.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the trait InfCast(T) on IntDistance used in the relation itself to do inf_cast(d_in,T)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, "IntDistance
has trait InfCast(T)
" means that a value can be inf_cast
ed from type T
to type IntDistance
. inf_cast(d_in,T)
casts from the type of d_in
to type T
.
In the case of inf_cast(d_in,T)
, because T
has trait DistanceConstant(IntDistance)
, T
automatically has trait InfCast(IntDistance)
. The fact that having trait DistanceConstat(IntDistance)
means a type has trait InfCast(IntDistance)
should be covered in the definition for trait DistanceConstant
in the pseudocode definitions document.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see - I got it backwards. But if IntDistance has trait InfCast(T) and T has trait InfCast(IntDistance), doesn't this effectively imply that T is the same as IntDistance? Are there any examples of distinct types where we can exactly cast in both directions? If not, why not simplify things and simply require T=IntDistance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I thought until last week. However, a type T
can actually have trait InfCast(U)
if any value v
of type U
can either be casted to a value w
of type T
at least as large as v
or can return an error saying the inf_cast
failed.
So, for example, u32
has trait InfCast(u64)
because there's a definition out there for which values v
of type u64
can be casted to a value of u32
at least as big as v
, and which values v
of type u64
can't be casted to a value of u32
and should instead return an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InfCast is fallible. If InfCast can cast to an output number GTE the input number, it will, otherwise it (and the entire relation) fails. This fallibility allows InfCast to be implemented for every combination of numeric types. See the u32 row and column from the InfCast block of my type conversion tables.
\] | ||
|
||
By Lemma 5.6 from the proof definitions document (linked in Section \ref{sec:versioned-docs}), we know that vectors $x, y$ only differ on one element, given that, by assumption, $d_{Sym}(x, y) = 2$. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest: remove line break, and change next two sentences to read: "Thus, if we write
|
||
By Lemma 5.6 from the proof definitions document (linked in Section \ref{sec:versioned-docs}), we know that vectors $x, y$ only differ on one element, given that, by assumption, $d_{Sym}(x, y) = 2$. | ||
|
||
Let $x = (x_1,x_2,\ldots,x_n)$ and $y = (y_1,y_2,\ldots,y_n)$. The fact that we have $d_{Sym}(x, y) = 2$ tells us that there is some vector $z = (z_1,z_2,\ldots,z_n)$ such that, for all $i\neq j$, we have $x_i=z_i$ (and that for $j$, we have $x_j \neq z_j$), and such that $\MultiSet(z)=\MultiSet(y)$. This construction of $z$ gives us the following equalities: $|\sum_i(x_i) - \sum_i(y_i)| = |\sum_i(x_i)-\sum_i(z_i)| = |\sum_i(x_i-z_i)| = |x_j-z_j|$. Because, in a worst-case scenario, the difference between $x_j$ and $z_j$ is the difference between the upper bound $U$ and lower bound $L$, we see that $|x_j-z_j|\leq |U-L| = U-L$ (with the last equality from the fact that $U\geq L$), which is what we wanted to show. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parenthesis
Something to consider going forward: One approach is to reintroduce substitute distances for sized domains, and drop the division by 2 in the bounded sum relation, which also avoids the recent integer truncation issue |
Is there something incorrect in the current implementation or proof? The
proof of Lemma 5.7 in the proof definitions document doesn't assume that
d_in is even. It is true that applying the stability bound with an odd
value of d would give a weaker bound than necessary (c/2)*d_in rather than
c*floor(d_in/2), but that's not incorrect.
…-----
Salil Vadhan (he/they)
On Fri, Sep 24, 2021 at 10:33 AM Michael Shoemate ***@***.***> wrote:
Something to consider going forward:
While the resize relation ensures d_in is at least rounded up to the
nearest even number, it is possible to feed d_in directly into
transformations with sized input domains. This means it is still possible
for d_in to be odd in the bounded sum, by user error. It seems like it
would be easy for a user to make this error.
One approach is to reintroduce substitute distances for sized domains, and
drop the division by 2 in the bounded sum relation, which also avoids the
recent integer truncation issue (U - L) / 2. The resize transformation
would be updated to relate SymmetricDistance with SubstituteDistance, and
it would round up to the nearest even and then divide by 2 (AKA InfDiv, or
(v + 1) /2)
—
You are receiving this because your review was requested.
Reply to this email directly, view it on GitHub
<#5 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB4KIGJP4HDDNC474GFV33DUDSD2JANCNFSM5AMKPIWA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
@SalilVadhan You're right, if a user provides an odd d_in then the bound is too loose, not too tight. Regardless, I'm having second thoughts on if it might be preferable, when pairing with sized domains, to standardize metrics on substitute distances instead of symmetric distances. Benefits of pairing SubstituteDistance with SizedDomains:
Downsides of pairing SubstituteDistance with SizedDomains:
It is tempting to just leave it as it is, document the public API better, and add another helper in Rust for building relations out of forward maps (like we have for stability constants). So don't consider my comments blocking. |
I'm happy either way - introducing SubstituteDistance or keeping it as is,
with just SymmetricDistance - whichever way pro-con analysis lands.
…-----
Salil Vadhan (he/they)
On Sat, Sep 25, 2021 at 12:25 PM Michael Shoemate ***@***.***> wrote:
@SalilVadhan <https://github.com/SalilVadhan> You're right, if a user
provides an odd d_in then the bound is too loose, not too tight.
Regardless, I'm having second thoughts on if it might be preferable, when
pairing with sized domains, to standardize metrics on substitute distances
instead of symmetric distances.
Benefits of using SubstituteDistance for SizedDomains:
1. Users don't need to consider the factor of 2 when working with
sized datasets. On the other hand, they need to be more cognizant of the
input metric.
2. Minor: the relation is simpler, because the 2 factor goes away.
This allows all of our relations to be expressed as stability constants.
3. Minor: the base case and path inductance is 1, n + 1 instead of 2,
n + 2.
Downsides of using SubstituteDistance for SizedDomains:
1. It might induce an additional metric generic on data manipulation
transformations. Since the resize typically happens at the end of the data
processing pipeline, this isn't a current concern. It isn't clear to me if
it will be in the future.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/AB4KIGPGI42OWXMVOOVAJRDUDXZXTANCNFSM5AMKPIWA>
.
Triage notifications on the go with GitHub Mobile for iOS
<https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675>
or Android
<https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub>.
|
September 24, 2021, upload
Several cycles of revisions are now complete. Adjustments in response to feedback from Salil and Marco have now been made. As far as I can tell, there is no outstanding feedback to incorporate.
Major note: this proof is only for integer types. Also, the Rust function is not fully in line with the function we specify in the pseudocode. For example, I believe the Rust version has issues with integer truncation in the stability relation, and with overflows. As far as we can tell, the pseudocode version does not.
September 21, 2021, upload
New points of discussion when revising:
2*d_out >= d_in*(U-L)
instead ofd_out >= d_in*(U-L)/2
in order to avoid division errors?(a) Imposing that we only accept integers.
(b) Changing the stability relation to
2*d_out >= d_in*(U-L)
.(c) Checking overflow for
2*d_out
andd_in
.(d) Naming mismatch between
n
in the pseudocode+proor vssize
in the code.exact_int_cast
check.L<U
orL<=U
?(Old -- from July):
First pull request for the Bounded sum with known n transformation, which corresponds to this GitHub permalink.
Files: