Skip to content
This repository has been archived by the owner on Jul 30, 2024. It is now read-only.

Bounded sum with known n #5

Open
wants to merge 22 commits into
base: main
Choose a base branch
from
Open

Bounded sum with known n #5

wants to merge 22 commits into from

Conversation

silviacasac
Copy link

@silviacasac silviacasac commented Jul 14, 2021

September 24, 2021, upload

Several cycles of revisions are now complete. Adjustments in response to feedback from Salil and Marco have now been made. As far as I can tell, there is no outstanding feedback to incorporate.

Major note: this proof is only for integer types. Also, the Rust function is not fully in line with the function we specify in the pseudocode. For example, I believe the Rust version has issues with integer truncation in the stability relation, and with overflows. As far as we can tell, the pseudocode version does not.

September 21, 2021, upload

New points of discussion when revising:

  • What is the best way to specify/impose that the stability relation that we have (and hence its corresponding proof) only works for integers?
  • Changing the stability relation to 2*d_out >= d_in*(U-L) instead of d_out >= d_in*(U-L)/2 in order to avoid division errors?
  • Mismatches between the proof and the pseudocode:
    (a) Imposing that we only accept integers.
    (b) Changing the stability relation to2*d_out >= d_in*(U-L).
    (c) Checking overflow for 2*d_out and d_in.
    (d) Naming mismatch between n in the pseudocode+proor vs size in the code.
  • Adding some more explanation for the exact_int_cast check.
  • Is the precondition L<U or L<=U?

(Old -- from July):

First pull request for the Bounded sum with known n transformation, which corresponds to this GitHub permalink.

Files:

  • BoundedSumKnownN.tex -- the LaTeX document with the proof.
  • BoundedSumKnownN.pdf -- the PDF generated from the TeX document.

@silviacasac silviacasac self-assigned this Jul 14, 2021
@silviacasac silviacasac changed the title Add files via upload BoundedSumKnownN Jul 14, 2021
@silviacasac silviacasac changed the title BoundedSumKnownN Bounded sum with known n Jul 14, 2021
BoundedSumKnownN.tex Outdated Show resolved Hide resolved
BoundedSumKnownN.tex Outdated Show resolved Hide resolved
bounded_sum_known_n/BoundedSumKnownN.tex Outdated Show resolved Hide resolved
bounded_sum_known_n/BoundedSumKnownN.tex Outdated Show resolved Hide resolved
bounded_sum_known_n/BoundedSumKnownN.tex Outdated Show resolved Hide resolved
@silviacasac
Copy link
Author

Remaining things to check:

  • Whether it is OK that the castings to not appear in the proof.
  • Important. Floating point issues both in the addition of the elements and in the division. We are working on it.

raise Exception('Potential overflow') |\label{line:max}|

def relation(d_in: u32, d_out: T) -> bool: |\label{line:rel}|
if checked_mul(2, d_out).is_none or checked_mul(d_in, U-L).is_none
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should d_in be replaced with inf_cast(d_in,T), so that both arguments to checked_mul are of type T?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point. I changed it, but I am leaving this comment "unresolved" in case other people have different ideas.

\begin{itemize}
\item Variable \texttt{n} must be of type \texttt{usize}.
\item Type \texttt{T} must have traits \texttt{DistanceConstant(IntDistance)}, \texttt{TotalOrd}, \texttt{CheckedMul}, \texttt{CheckNull}, \texttt{Sum(Output=T)}, \texttt{Sub(Output=T)}, and \texttt{ExactIntCast(usize)}.
\item \texttt{IntDistance} must have trait \texttt{InfCast(T)}. (Note that this bullet point is not needed in this proof, but it is needed in the code so a hint can be constructed; otherwise a binary search would be needed to construct the hint.)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the trait InfCast(T) on IntDistance used in the relation itself to do inf_cast(d_in,T)?

Copy link

@cwagaman cwagaman Sep 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, "IntDistance has trait InfCast(T)" means that a value can be inf_casted from type T to type IntDistance. inf_cast(d_in,T) casts from the type of d_in to type T.

In the case of inf_cast(d_in,T), because T has trait DistanceConstant(IntDistance), T automatically has trait InfCast(IntDistance). The fact that having trait DistanceConstat(IntDistance) means a type has trait InfCast(IntDistance) should be covered in the definition for trait DistanceConstant in the pseudocode definitions document.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see - I got it backwards. But if IntDistance has trait InfCast(T) and T has trait InfCast(IntDistance), doesn't this effectively imply that T is the same as IntDistance? Are there any examples of distinct types where we can exactly cast in both directions? If not, why not simplify things and simply require T=IntDistance?

Copy link

@cwagaman cwagaman Sep 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I thought until last week. However, a type T can actually have trait InfCast(U) if any value v of type U can either be casted to a value w of type T at least as large as v or can return an error saying the inf_cast failed.

So, for example, u32 has trait InfCast(u64) because there's a definition out there for which values v of type u64 can be casted to a value of u32 at least as big as v, and which values v of type u64 can't be casted to a value of u32 and should instead return an error.

Copy link
Member

@Shoeboxam Shoeboxam Sep 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InfCast is fallible. If InfCast can cast to an output number GTE the input number, it will, otherwise it (and the entire relation) fails. This fallibility allows InfCast to be implemented for every combination of numeric types. See the u32 row and column from the InfCast block of my type conversion tables.

\]

By Lemma 5.6 from the proof definitions document (linked in Section \ref{sec:versioned-docs}), we know that vectors $x, y$ only differ on one element, given that, by assumption, $d_{Sym}(x, y) = 2$.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggest: remove line break, and change next two sentences to read: "Thus, if we write $x=(x_1,x_2,\ldots,x_n)$ and $y=(y_1,\ldots,y_n)$ as vectors, there is a coordinate $j$ such that we can change $x$ in (only) coordinate $j$ to obtain a vector $z$ such that $\MultiSet(z)=\MultiSet(y)$. This construction of $z$ gives us the following...


By Lemma 5.6 from the proof definitions document (linked in Section \ref{sec:versioned-docs}), we know that vectors $x, y$ only differ on one element, given that, by assumption, $d_{Sym}(x, y) = 2$.

Let $x = (x_1,x_2,\ldots,x_n)$ and $y = (y_1,y_2,\ldots,y_n)$. The fact that we have $d_{Sym}(x, y) = 2$ tells us that there is some vector $z = (z_1,z_2,\ldots,z_n)$ such that, for all $i\neq j$, we have $x_i=z_i$ (and that for $j$, we have $x_j \neq z_j$), and such that $\MultiSet(z)=\MultiSet(y)$. This construction of $z$ gives us the following equalities: $|\sum_i(x_i) - \sum_i(y_i)| = |\sum_i(x_i)-\sum_i(z_i)| = |\sum_i(x_i-z_i)| = |x_j-z_j|$. Because, in a worst-case scenario, the difference between $x_j$ and $z_j$ is the difference between the upper bound $U$ and lower bound $L$, we see that $|x_j-z_j|\leq |U-L| = U-L$ (with the last equality from the fact that $U\geq L$), which is what we wanted to show.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parenthesis $(x_i)$ look odd, it seems like some operation is being applied to $x_i$. Presumably the goal is to separate the two summations, so I would put parentheses including the sum symbol, like $\left(\sum_i x_i\right)$. I also recommend making this sequence of equations in displayed mode, e.g. with an eqnarray*.

@Shoeboxam
Copy link
Member

Something to consider going forward:
While the resize relation ensures d_in is at least rounded up to the nearest even number, it is possible to feed d_in directly into transformations with sized input domains. This means it is still possible for d_in to be odd in the bounded sum, by user error. It seems like it would be easy for a user to make this error.

One approach is to reintroduce substitute distances for sized domains, and drop the division by 2 in the bounded sum relation, which also avoids the recent integer truncation issue (U - L) / 2. The resize transformation would be updated to relate SymmetricDistance with SubstituteDistance, and it would round up to the nearest even and then divide by 2 (AKA InfDiv, or (v + 1) /2)

@SalilVadhan
Copy link
Collaborator

SalilVadhan commented Sep 24, 2021 via email

@Shoeboxam
Copy link
Member

Shoeboxam commented Sep 25, 2021

@SalilVadhan You're right, if a user provides an odd d_in then the bound is too loose, not too tight. Regardless, I'm having second thoughts on if it might be preferable, when pairing with sized domains, to standardize metrics on substitute distances instead of symmetric distances.

Benefits of pairing SubstituteDistance with SizedDomains:

  1. Users don't need to consider the factor of 2 when working with sized datasets. On the other hand, they need to be more cognizant of the input metric.
  2. Minor: the relation is simpler, because the 2 factor goes away. This allows all of our relations to be expressed as stability constants instead of forward maps.
  3. Minor: the base case and path inductance is 1, n + 1 instead of 2, n + 2.

Downsides of pairing SubstituteDistance with SizedDomains:

  1. It might induce an additional metric generic on data manipulation transformations. Since the resize typically happens at the end of the data processing pipeline, this isn't a current concern. We can just write all data manipulation transformations wrt SymmetricDistance. It isn't clear to me if this approach will become an issue in the future.

It is tempting to just leave it as it is, document the public API better, and add another helper in Rust for building relations out of forward maps (like we have for stability constants). So don't consider my comments blocking.

@SalilVadhan
Copy link
Collaborator

SalilVadhan commented Sep 25, 2021 via email

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants