Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement SkipTo for iter::Iter #287

Closed
wants to merge 16 commits into from
Closed

Conversation

grelner
Copy link
Contributor

@grelner grelner commented Aug 6, 2024

This is the start of the implementation of #286 .
It adds a new trait SkipTo which for now is only implemented for iter::Iter.

The SkipTo trait contains a method skip_to(&mut self, n:u32) which should return an Iterator that yields values >= n.
The implementation for iter::Iter achieves this by doing a binary search for the correct container, and then another binary search if the container is an ArrayStore, or forwarding like a normal BitmapIter until a suitable value is found for BitmapStore.

Right now SkipTo is implemented for iter::Iter, but I don't think this is semantically sound, and it should probably rather be implemented for RoaringBitmap. In the discussion for #286 @Kerollmops wanted a similar api to SkipWhile, but this does not really fit, as SkipWhile can perform it's operation only using an existing iterator, but this is not true for SkipTo, as it needs direct access to the containers to do it's skip efficiently, not only the existing iterator. A reference to the container vec has been added to iter::Iter only for this purpose. The iterator returned is fully independent, and not related to the iterator on which the skip_to call was made at all.

This PR in it's current state implements what I need for my project, but I would like to finish it, albeit at a slightly slower pace.

Comment on lines 309 to 312
fn skip_to(self, n: u32) -> Self::Iter {
SkipToIter::new(&self.containers, n)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: using the full unmodified self.containers means this might skip backwards.

To do this optimally, we might need to somewhat re-implement std::iter::Flatten so we can access the slice::Iter directly (from which we can get the remaining slice), and access to the current container iterator. :/

Copy link
Contributor Author

@grelner grelner Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really? self.containers is just a reference to RoaringBitmap.containers. It was added in this PR
edit: ah, self.containers is RoaringBitmap.containers here. Then there's something I don't understand here if they can be backwards 😅

I thought about changing iter::Iter to only contain a reference to containers, an index, and the iterator for the current container, and do the flatten part in ::next(), which I guess is what you're saying. This would also allow the SkipTo iterator to go backwards.
Still, are all these changes worth it to have a double ended iterator and cover the kinda weird case of skip_to to a point the iterator has already passed?

Copy link
Member

@Kerollmops Kerollmops Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @grelner 👋

Sorry for the late reply, I'm in vacation 🌞

To do this optimally, we might need to somewhat re-implement std::iter::Flatten so we can access the slice::Iter directly (from which we can get the remaining slice), and access to the current container iterator.

I agree with what @Dr-Emann explains here. This is, to me, the best way to implement this iterator. Flatten is not hard to implement. We can have a special Flatten type with all the original Flatten type logic but dedicated to slices and bitmaps so that it can skip groups according to the split high group (skipping whole slices or bitmaps if the number we want to find is not in this group).

I thought about changing iter::Iter to only contain a reference to containers, an index, and the iterator for the current container, and do the flatten part in ::next(), which I guess is what you're saying. This would also allow the SkipTo iterator to go backwards.

Let's forget about the SkipTo wrapper and only implement a non-lazy method like Iterator::advance_by method that directly advances the iterator to the requested number. This method, skip_to, is only available on the bitmap::Iter/IntoIter types, so there is no need to think about std::iter::Rev iterator adaptator.

@grelner grelner marked this pull request as ready for review August 22, 2024 04:59
@grelner
Copy link
Contributor Author

grelner commented Aug 22, 2024

Hi sorry for the slow progress, I've been on vacation too.
I've pushed some changes and I think it's ready for review now.
This implements:

  • advance_to for Iter and IntoIter, which skips containers using the value key. This uses an existing iterator for data
  • Bitmap::iter_from, which is faster that advance_to, because it does a binary search to find the container to start from
    This change significantly changes the iterator implementations, which now implement their own flatten functionality.

@grelner
Copy link
Contributor Author

grelner commented Aug 22, 2024

@Kerollmops sorry this 1.66 requirement keeps tripping me up. I pushed a fix for it now. Maybe that should be added to https://github.com/RoaringBitmap/roaring-rs?tab=readme-ov-file#developing

@grelner
Copy link
Contributor Author

grelner commented Aug 22, 2024

bench-skip_to.txt
cargo bench iteration results of the branch

@grelner
Copy link
Contributor Author

grelner commented Sep 23, 2024

@Kerollmops any chance to get this reviewed and hopefully merged?

Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work @grelner. That's very good, actually.

I think I am not a huge fan of having a new AdvanceToIter wrapper. Can you tell me why the Iter/IntoIter/advance_to method cannot just advance the iterator with a loop like you did? It will advance container by container until it reaches the right one.

roaring/src/bitmap/iter.rs Outdated Show resolved Hide resolved
roaring/src/bitmap/iter.rs Outdated Show resolved Hide resolved
roaring/tests/iter_advance_to.rs Outdated Show resolved Hide resolved
roaring/src/bitmap/store/mod.rs Outdated Show resolved Hide resolved
roaring/src/bitmap/store/array_store/mod.rs Outdated Show resolved Hide resolved
roaring/tests/iter_advance_to.rs Outdated Show resolved Hide resolved
@grelner grelner requested a review from Kerollmops September 25, 2024 15:10
Comment on lines 387 to 423
fn advance_to_inner(&mut self, n: u32) {
let (key, _) = util::split(n);
fn advance_iter(iter: &mut container::Iter<'_>, n: u32) -> Option<u32> {
while let Some(next) = iter.peek() {
if next < n {
iter.next();
} else {
return Some(next);
}
}
None
}
if let Some(index) = self.find_container(key) {
self.drain_containers_until(index);
let container = self.pop_container_front().expect("bug!");
let mut iter = container.into_iter();
advance_iter(&mut iter, n);
*self.iter_front_mut() = iter;
} else {
// there are no containers with given key. Look in iter_front and iter_back.
let iter_front = self.iter_front_mut();
if iter_front.peek().map(|n| util::split(n).0) == Some(key) {
advance_iter(iter_front, n);
return;
}
if let Some(iter_back) = self.iter_back_mut() {
if iter_back.peek().map(|n| util::split(n).0) == Some(key) {
advance_iter(iter_back, n);
*self.iter_front_mut() = Self::empty_inner_iter();
return;
}
}
self.clear_containers();
*self.iter_front_mut() = Self::empty_inner_iter();
*self.iter_back_mut() = None;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure this isn't updating the size_hint for the iterator.

I really think trying to keep track of the size_hint for these iterators was a mistake, and would rather just remove the exact size_hint tracking.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're completely right of course. I've fixed this in the latest commit. But I agree, maybe it would be better to remove this code. It certainly would make it cleaner.

@grelner
Copy link
Contributor Author

grelner commented Oct 10, 2024

@Kerollmops did you change something in CI? It's failing on parts of the code I never touched

@Kerollmops
Copy link
Member

Hey @grelner 👋
It's may be due to #293, would you mind rebasing on main? 🤔

@Kerollmops
Copy link
Member

Hey @grelner 👋

It seems the CI is complaining about missing documentation on some macros, would mind adding the doc in a dedicated commit and rebase on main now that I merged #295, please?

Have a nice day 🎃

Copy link
Member

@Dr-Emann Dr-Emann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to #290, I think I have a good idea of basically exactly what I'd want this implementation to look like, let me know if you'd rather I finish this off. I know there's been a lot of back and forth on this already (add the size hint, now remove the size hint)

@@ -497,26 +510,46 @@ impl PartialEq for Store {
}
}

impl<'a> Iterator for Iter<'a> {
impl Iter<'_> {
pub fn peek(&mut self) -> Option<u16> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can take &self

Comment on lines +388 to +399
fn advance_iter(iter: &mut container::Iter<'_>, n: u32) -> u64 {
let mut items_skipped = 0;
while let Some(next) = iter.peek() {
if next < n {
iter.next();
items_skipped += 1;
} else {
return items_skipped;
}
}
items_skipped
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both array and bitmap container iterators should be able to do this more efficiently, we should be able to implement an store::Iter::advance_to(&mut self, n: u16) function that can do better than O(n):

For an array container, we can binary search for n, and use Iterator::nth to efficiently skip ahead to just before the element. For a bitmap container, we should be able to just directly advance key, and mask the value (with some edge cases where key is unchanged and we need to mask the current value, or key becomes equal to key_back, and we need to steal value_back).

Comment on lines +334 to +344
fn pop_container_front(&mut self) -> Option<Self::Container>;
fn pop_container_back(&mut self) -> Option<Self::Container>;

fn drain_containers_until(&mut self, index: usize);

fn clear_containers(&mut self);

fn find_container(&self, key: u16) -> Option<usize>;

fn iter_front_mut(&mut self) -> &mut container::Iter<'a>;
fn iter_back_mut(&mut self) -> &mut Option<container::Iter<'a>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to avoid these, if we use slice::Iter<'a, Container> and vec::IntoIter<Container>, rather than &[Container] and VecDeque<Container>. Both implement AsRef<[Container]>, Iterator (for next + nth), and Iterator::Item: IntoIterator<IntoIter=container::Iter>, so it should be possible to write a function something like:

fn advance_to_impl<'a, It>(
    n: u32,
    front_iter: &mut Option<container::Iter<'a>>,
    containers: &mut It,
    back_iter: &mut Option<container::Iter<'a>>,
) where
    It: Iterator,
    It: AsRef<[Container]>,
    It::Item: IntoIterator<IntoIter = container::Iter<'a>>,
{
  let (key, index) = util::split(n);
  // Compare the key of n to the front iter's key {LT => Do nothing, EQ => advance front to index, GT => consume front iter, continue}
  
  // Binary search for the key in containers, if found, front_iter = containers.nth(idx) and advance it, else use _ = nth(idx - 1) to consume up to where it would be.

  // If containers is now empty, check the back iter's key, and possibly advance it or consume it
}

Which has all the logic that could be called by both advance_to implementations.

Array(slice::Iter<'a, u16>),
Vec(vec::IntoIter<u16>),
BitmapBorrowed(BitmapIter<&'a [u64; BITMAP_LENGTH]>),
BitmapOwned(BitmapIter<Box<[u64; BITMAP_LENGTH]>>),
}

pub struct Iter<'a> {
inner: IterInner<'a>,
size_hint: u64,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry you went to all the trouble of adding tracking the size hint, but with #295, this can all go back away now 😢

@grelner
Copy link
Contributor Author

grelner commented Nov 1, 2024

Similar to #290, I think I have a good idea of basically exactly what I'd want this implementation to look like, let me know if you'd rather I finish this off. I know there's been a lot of back and forth on this already (add the size hint, now remove the size hint)

Hi @Dr-Emann , thanks for the feedback! I'm hilariously busy at the moment and unfortunately won't have time to work on this for at least 2-3 weeks so feel free to finish up. If not I'm happy to do it when I have some time again

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants