-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement SkipTo for iter::Iter #287
Conversation
src/bitmap/iter.rs
Outdated
fn skip_to(self, n: u32) -> Self::Iter { | ||
SkipToIter::new(&self.containers, n) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: using the full unmodified self.containers
means this might skip backwards.
To do this optimally, we might need to somewhat re-implement std::iter::Flatten
so we can access the slice::Iter
directly (from which we can get the remaining slice), and access to the current container iterator. :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really? self.containers is just a reference to RoaringBitmap.containers. It was added in this PR
edit: ah, self.containers is RoaringBitmap.containers here. Then there's something I don't understand here if they can be backwards 😅
I thought about changing iter::Iter to only contain a reference to containers, an index, and the iterator for the current container, and do the flatten part in ::next(), which I guess is what you're saying. This would also allow the SkipTo iterator to go backwards.
Still, are all these changes worth it to have a double ended iterator and cover the kinda weird case of skip_to to a point the iterator has already passed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @grelner 👋
Sorry for the late reply, I'm in vacation 🌞
To do this optimally, we might need to somewhat re-implement
std::iter::Flatten
so we can access theslice::Iter
directly (from which we can get the remaining slice), and access to the current container iterator.
I agree with what @Dr-Emann explains here. This is, to me, the best way to implement this iterator. Flatten
is not hard to implement. We can have a special Flatten
type with all the original Flatten
type logic but dedicated to slices and bitmaps so that it can skip groups according to the split
high group (skipping whole slices or bitmaps if the number we want to find is not in this group).
I thought about changing iter::Iter to only contain a reference to containers, an index, and the iterator for the current container, and do the flatten part in ::next(), which I guess is what you're saying. This would also allow the SkipTo iterator to go backwards.
Let's forget about the SkipTo
wrapper and only implement a non-lazy method like Iterator::advance_by
method that directly advances the iterator to the requested number. This method, skip_to
, is only available on the bitmap::Iter/IntoIter
types, so there is no need to think about std::iter::Rev
iterator adaptator.
Hi sorry for the slow progress, I've been on vacation too.
|
@Kerollmops sorry this 1.66 requirement keeps tripping me up. I pushed a fix for it now. Maybe that should be added to https://github.com/RoaringBitmap/roaring-rs?tab=readme-ov-file#developing |
bench-skip_to.txt |
@Kerollmops any chance to get this reviewed and hopefully merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your work @grelner. That's very good, actually.
I think I am not a huge fan of having a new AdvanceToIter
wrapper. Can you tell me why the Iter/IntoIter/advance_to
method cannot just advance the iterator with a loop like you did? It will advance container by container until it reaches the right one.
roaring/src/bitmap/iter.rs
Outdated
fn advance_to_inner(&mut self, n: u32) { | ||
let (key, _) = util::split(n); | ||
fn advance_iter(iter: &mut container::Iter<'_>, n: u32) -> Option<u32> { | ||
while let Some(next) = iter.peek() { | ||
if next < n { | ||
iter.next(); | ||
} else { | ||
return Some(next); | ||
} | ||
} | ||
None | ||
} | ||
if let Some(index) = self.find_container(key) { | ||
self.drain_containers_until(index); | ||
let container = self.pop_container_front().expect("bug!"); | ||
let mut iter = container.into_iter(); | ||
advance_iter(&mut iter, n); | ||
*self.iter_front_mut() = iter; | ||
} else { | ||
// there are no containers with given key. Look in iter_front and iter_back. | ||
let iter_front = self.iter_front_mut(); | ||
if iter_front.peek().map(|n| util::split(n).0) == Some(key) { | ||
advance_iter(iter_front, n); | ||
return; | ||
} | ||
if let Some(iter_back) = self.iter_back_mut() { | ||
if iter_back.peek().map(|n| util::split(n).0) == Some(key) { | ||
advance_iter(iter_back, n); | ||
*self.iter_front_mut() = Self::empty_inner_iter(); | ||
return; | ||
} | ||
} | ||
self.clear_containers(); | ||
*self.iter_front_mut() = Self::empty_inner_iter(); | ||
*self.iter_back_mut() = None; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure this isn't updating the size_hint
for the iterator.
I really think trying to keep track of the size_hint
for these iterators was a mistake, and would rather just remove the exact size_hint tracking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're completely right of course. I've fixed this in the latest commit. But I agree, maybe it would be better to remove this code. It certainly would make it cleaner.
@Kerollmops did you change something in CI? It's failing on parts of the code I never touched |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to #290, I think I have a good idea of basically exactly what I'd want this implementation to look like, let me know if you'd rather I finish this off. I know there's been a lot of back and forth on this already (add the size hint, now remove the size hint)
@@ -497,26 +510,46 @@ impl PartialEq for Store { | |||
} | |||
} | |||
|
|||
impl<'a> Iterator for Iter<'a> { | |||
impl Iter<'_> { | |||
pub fn peek(&mut self) -> Option<u16> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can take &self
fn advance_iter(iter: &mut container::Iter<'_>, n: u32) -> u64 { | ||
let mut items_skipped = 0; | ||
while let Some(next) = iter.peek() { | ||
if next < n { | ||
iter.next(); | ||
items_skipped += 1; | ||
} else { | ||
return items_skipped; | ||
} | ||
} | ||
items_skipped | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both array and bitmap container iterators should be able to do this more efficiently, we should be able to implement an store::Iter::advance_to(&mut self, n: u16)
function that can do better than O(n)
:
For an array container, we can binary search for n
, and use Iterator::nth
to efficiently skip ahead to just before the element. For a bitmap container, we should be able to just directly advance key
, and mask the value (with some edge cases where key
is unchanged and we need to mask the current value, or key
becomes equal to key_back
, and we need to steal value_back).
fn pop_container_front(&mut self) -> Option<Self::Container>; | ||
fn pop_container_back(&mut self) -> Option<Self::Container>; | ||
|
||
fn drain_containers_until(&mut self, index: usize); | ||
|
||
fn clear_containers(&mut self); | ||
|
||
fn find_container(&self, key: u16) -> Option<usize>; | ||
|
||
fn iter_front_mut(&mut self) -> &mut container::Iter<'a>; | ||
fn iter_back_mut(&mut self) -> &mut Option<container::Iter<'a>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to avoid these, if we use slice::Iter<'a, Container>
and vec::IntoIter<Container>
, rather than &[Container]
and VecDeque<Container>
. Both implement AsRef<[Container]>
, Iterator
(for next
+ nth
), and Iterator::Item: IntoIterator<IntoIter=container::Iter>
, so it should be possible to write a function something like:
fn advance_to_impl<'a, It>(
n: u32,
front_iter: &mut Option<container::Iter<'a>>,
containers: &mut It,
back_iter: &mut Option<container::Iter<'a>>,
) where
It: Iterator,
It: AsRef<[Container]>,
It::Item: IntoIterator<IntoIter = container::Iter<'a>>,
{
let (key, index) = util::split(n);
// Compare the key of n to the front iter's key {LT => Do nothing, EQ => advance front to index, GT => consume front iter, continue}
// Binary search for the key in containers, if found, front_iter = containers.nth(idx) and advance it, else use _ = nth(idx - 1) to consume up to where it would be.
// If containers is now empty, check the back iter's key, and possibly advance it or consume it
}
Which has all the logic that could be called by both advance_to implementations.
Array(slice::Iter<'a, u16>), | ||
Vec(vec::IntoIter<u16>), | ||
BitmapBorrowed(BitmapIter<&'a [u64; BITMAP_LENGTH]>), | ||
BitmapOwned(BitmapIter<Box<[u64; BITMAP_LENGTH]>>), | ||
} | ||
|
||
pub struct Iter<'a> { | ||
inner: IterInner<'a>, | ||
size_hint: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry you went to all the trouble of adding tracking the size hint, but with #295, this can all go back away now 😢
Hi @Dr-Emann , thanks for the feedback! I'm hilariously busy at the moment and unfortunately won't have time to work on this for at least 2-3 weeks so feel free to finish up. If not I'm happy to do it when I have some time again |
This is the start of the implementation of #286 .
It adds a new trait
SkipTo
which for now is only implemented for iter::Iter.The
SkipTo
trait contains a methodskip_to(&mut self, n:u32)
which should return anIterator
that yields values >=n
.The implementation for
iter::Iter
achieves this by doing a binary search for the correct container, and then another binary search if the container is anArrayStore
, or forwarding like a normalBitmapIter
until a suitable value is found forBitmapStore
.Right now
SkipTo
is implemented foriter::Iter
, but I don't think this is semantically sound, and it should probably rather be implemented forRoaringBitmap
. In the discussion for #286 @Kerollmops wanted a similar api toSkipWhile
, but this does not really fit, asSkipWhile
can perform it's operation only using an existing iterator, but this is not true forSkipTo
, as it needs direct access to the containers to do it's skip efficiently, not only the existing iterator. A reference to the container vec has been added toiter::Iter
only for this purpose. Theiterator
returned is fully independent, and not related to the iterator on which theskip_to
call was made at all.This PR in it's current state implements what I need for my project, but I would like to finish it, albeit at a slightly slower pace.