-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: replace Range with a bounded implementation #112
Conversation
benches/large_case.rs
Outdated
|
||
fn bench<'a, P: Package + Deserialize<'a>, V: Version + Hash + Deserialize<'a>>( | ||
fn bench<'a, P: Package + Deserialize<'a>, V: VersionSet + Deserialize<'a>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call it VS:
not V:
, I think that will be more consistent with our usage in the rest of the package?
Then the where
can become where VS::V: Deserialize<'a>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good at this point. I have to admit my eyes glazed over while looking at intersection
, but I have confidence in our tests.
src/range.rs
Outdated
.prop_map(|((start_bounded, end_bounded), mut vec)| { | ||
// Ensure the bounds are increasing and non-repeating | ||
vec.sort_by_key(|(value, _)| *value); | ||
vec.dedup_by_key(|(value, _)| *value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more I'm now realizing that this cannot generate [(Unbounded, Excluded(1)), (Excluded(1), Unbounded)]
. But I'm not seeing an obvious way to fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is an alternative strategy ad38972 thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of using deltas instead of sorting values! I have not read in all details the generation, it needs a bit more comments and explaining I think. I also thought there might be a logic swap between "bounded" and "unbounded" in names but it may be that I just didn't pay enough attention (in all cases, it may need more comments).
Here is what I have in mind after reading your proposal @Eh2406 :
- generate random start that is one of included 0 ( = unbounded ) | excluded 0 | non 0 (included or excluded)
- generate random vec of deltas
- dedup successive 0 deltas, there can only be one 0 surrounded by non zeros
- for each delta, alternate between start and end bounds
- generate random
Excluded | Included
tags for each bound (with valid constraints for 0 deltas) - if the last bound is a start bound (depends on start and length of generated vector), add an unbounded at the end
If that's already what you did great! (you can add comments then) If not, can you please let us know how it differs and maybe show how your generator is better or how we can make a mix of this rough outline and the one you proposed to have something that have the best properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried rewriting the strategy based on your proposal (based on @Eh2406 code) and found that it's basically almost what you describe @mpizenberg . I've committed a version that includes comments (c44639a).
} | ||
|
||
/// Compute the intersection of two sets of versions. | ||
/// Computes the intersection of two sets of versions. | ||
pub fn intersection(&self, other: &Self) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I spent time today trying to write a "more obviously correct" intersection
, with the same perf. I did not succeed. I did find this method helpful for catching corner cases:
fn check_invariants(&self) {
if cfg!(debug_assertions) {
for (i, (s, e)) in self.segments.iter().enumerate() {
if matches!(s, Unbounded) && i != 0 {
panic!()
}
if matches!(e, Unbounded) && i != (self.segments.len() - 1) {
panic!()
}
}
for p in self.segments.as_slice().windows(2) {
match (&p[0].1, &p[1].0) {
(Included(l_end), Included(r_start)) => assert!(l_end < r_start),
(Included(l_end), Excluded(r_start)) => assert!(l_end < r_start),
(Excluded(l_end), Included(r_start)) => assert!(l_end < r_start),
(Excluded(l_end), Excluded(r_start)) => assert!(l_end <= r_start),
(_, Unbounded) => panic!(),
(Unbounded, _) => panic!(),
}
}
for (s, e) in self.segments.iter() {
assert!(match (s, e) {
(Included(s), Included(e)) => s <= e,
(Included(s), Excluded(e)) => s < e,
(Excluded(s), Included(e)) => s < e,
(Excluded(s), Excluded(e)) => s < e,
(Unbounded, _) | (_, Unbounded) => true,
});
}
}
}
Perhaps we can add a call to it to the end of all methods that construct a range
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is as close as I came:
6ac1e06?diff=split
perf is slightly worse, but I find it more readable. what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like that its a lot shorter and easier to follow, my code had a lot of cases. How much worse is the performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rerunning the benchmarks now I am not seeing significant differences between our two implementations! I guess I shouldn't try benchmarking at 2 o'clock in the morning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha, Ill copy your implementation into the MR. Should I also add the invariant check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or can I simply merge your changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge/copy/rewrite, As you wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the check invariants function! Is the first for
check not included in the second already btw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cherry-picked your code (crediting you) and added the suggestion about next_if
by @mpizenberg
536f041
to
a1ea900
Compare
I am comfortable merging this. I like my impl of |
@Eh2406 I very much prefer the simplicity of your intersection function! it's clear, and shorter so much easier to maintain and to serve as an example for others. So if there is no or a slight performance cost, I'm in favor of this. Regarding proptest generators, I'd also be more confident if we can generate all kinds of possible version sets so I'd like to have something like what is proposing @Eh2406 . I added some comments of my own on that proposal. @baszalmstra and @Eh2406 I'd love some feedback on those. Side note. I've been a bit busy and also have an event this weekend. But I've arranged to start early July with my next job, so I should have the second part of June to make a push toward my docs goals. |
Co-authored-by: Jacob Finkelman <[email protected]>
Co-authored-by: Jacob Finkelman <[email protected]>
I added the intersection code and proptest strategy (with comments) from @Eh2406 . Let me know what you guys think! |
(Included(s), Excluded(e)) => s < e, | ||
(Excluded(s), Included(e)) => s < e, | ||
(Excluded(s), Excluded(e)) => s < e, | ||
(Unbounded, _) | (_, Unbounded) => true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there might be representations special cases where we have an invalid segment that cannot be checked by this function like (Unbounded, Excluded(0))
in the case where versions are u32
. It depends if there are other ways to represent the left-most and right-most bounds with the version type.
What do you guys think of that? negligible?
Another remark for something I'm just realizing now. Bounds for discrete versions sets may introduce comparison errors? Can we end up with the two following sets of u32 that are the same but not structurally equal? 1: (3, 7)
2: (3, 4) (5,7) Actually now that I rethink about the simple intersection code, I'm not even sure it preserves the "uniqueness of representation" property required for sets comparisons. I didn't see special handling of same boundaries. Like what happens if we ask the union of If we translate this union into intersection job, it means we will compute intersection of But is it possible to arrive at a situation where we have already pushed Let |
Ok sorry for the long text of thinking out loud. I think I'm convinced it works as intended for continuous spaces of versions (would love a confirmation from your thinking). But there is still the problem of non unique representations in the case of discrete spaces where things like |
I would love to be able to apply formal methods to make sure our implementations are correct. Even so based on my current understanding, if the inputs are structurally valid than all of the operations are correct (and structurally valid).
I don't think this is critical to correctness. The algorithm may end up having to ask the dependency provider about if there are versions in |
I was more thinking about the part that computes sets relations as said in the guide:
So wondering if we could end with a situation where we have two terms t1 and t2 as follows. a b c next(c) d e
t1 = [-----------------] [-----------------]
t2 = [-------------------] And the intersection is computed as |
But yeah you're probably right. Maybe this is still correct in a sense, and will just push the algorithm into a state where it needs more work to be done, and not to a wrong state. And then the only inconvenience is performance. |
I'd still love if we could add a warning about potential non-unique representations of version sets in the code. And potential implications it may have, even if it turns out that in practice, with only valid input we never end up compromising the solver properties. At least have it documented in code comments somewhere in the bounded implementation. When that is done, and if you guys are confident we can go forward then let's go with this :) |
Ill add a comment in the range module documentation! |
I added a comment, does it help explain this potential issue? |
@mpizenberg do you think this is good for merge? |
Sorry for the long time before answering @baszalmstra. I've been moving around a lot. So I'd like to be extra clear that unique representations is an assumption made by the solver and not following that constraint is a possible source of bugs. Until now, this was only a comment in the guide, but since we are making bounded segments available in the API and it clearly enables different representations this needs to be clearly mentioned in the code. What do you guys think of a comment like the following one.
|
If you guys are ok with my comment above to add it in the code, or something similar, that's my last nitpick I think. After that it's ready to merge in my opinion, so no need to wait for me if I'm not responsive in the coming days. |
As usual your writing is articulate and clear! I have no objection to adding that anywhere you would like. I would prefer a softer version of the last paragraph. (But not enough to stop getting things actually merged.) How about:
|
Yep, your variation of the text is good too. |
@Eh2406 Im on holiday the next week, are you able to make the above changes? “Allow edits by maintainers” is enabled. |
I will try. Enjoy your holiday! |
I just thought of something. I remember that in some places in the code a comparison is made to an empty set (like here Line 146 in 717289b
|
Please let me the afternoon check something before merging. There is something I want to check. |
I did not get time over this work week for any open source work. Sorry. I have time today, if I can still be helpful.
It is https://github.com/pubgrub-rs/pubgrub/blob/dev/src/term.rs#L154 in the non-test code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. Document as you wish and Merge when you are happy.
Nevermind, I realized what I wanted to check for the result of intersections was already in the I also made two more restrictive change in the random generator of ranges. (1) if delta is 0 only double inclusive segments are valid. There cannot be an inclusive and an exclusive bound because that's an empty segment and these are not valid, as per the Considering these two situations were supposed to be possible previously in the random generator, I'm surprised we didn't end with failing tests due to the call to |
I also renamed the variable |
* refactor: replace Range with a bounded implementation * fix: rewrite range proptest strategy * fix: deserialize SmallVec without Vec alloc * fix: remove not_equals * fix: re-add union and remove early out * fix: renamed V to VS in bench * refactor: simpler intersection Co-authored-by: Jacob Finkelman <[email protected]> * test: use deltas for range strategy Co-authored-by: Jacob Finkelman <[email protected]> * docs(range): added comment about discrete values * More restrictive for valid random range generation * Remove duplicate check in check_invariants * Add warning about non-unique ranges representations * Rename start_bounded into start_unbounded Co-authored-by: Jacob Finkelman <[email protected]> Co-authored-by: Matthieu Pizenberg <[email protected]>
This PR replaces
Range<T>
with an implementation that allows inclusive or exclusive bounds to be used. This enablesT
to be any type that implementsOrd + Clone
and doesn't require theVersion
trait.It also renames some of the functions from
Range
to be more aligned with the names used in theVersionSet
trait.This is a cleaned-up version of #111 .