Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add range-based iteration #290

Closed
wants to merge 1 commit into from
Closed

Add range-based iteration #290

wants to merge 1 commit into from

Conversation

mherzl
Copy link
Contributor

@mherzl mherzl commented Sep 25, 2024

This change adds range-based iteration, meaning a consecutive subsequence of a RoaringBitmap can now be iterated over efficiently.

The time complexity to iterate over a range of length k in a RoaringBitmap of size n is: log(n) + k.
The log(n) term comes from a binary search to find the first element, and the k term comes from iterating over the following k-length range.

Note: The convert_range_to_inclusive function was refactored using num traits, in order to handle u16 as well as u32.

Unit and property-based tests are included to verify the correctness of range iteration over both arrays and bitmaps.

Copy link
Member

@Dr-Emann Dr-Emann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kerollmops, do we really need to keep the size_hint in our iterators? It seems wasteful that bitmap.iter().next() has to compute the cardinality of the whole bitmap.

I think other than not keeping an accurate size_hint, I don't see any reason why this would need a seperate iterator type (if we implement our own Flatten like we do here and #287).

This and #287 feel like two sides of the same coin: To me, this is easily and efficiently built on top of an advance_to & advance_back_to, though again, only if we could drop the need for an accurate size_hint.

@Kerollmops
Copy link
Member

Hey @mherzl, @Dr-Emann 👋

Thank you for your PR and feedback on this. I would like to know if dropping the size_hint of the iterator could mix both #287 and #290 in a pretty and Rust-idiomatic interface. Also, adding the num trait is not very convenient. We should keep the dependency list low.

I love the interface of this PR much more than #287. What do you think @Dr-Emann? Do you feel it could be possible?

@Dr-Emann
Copy link
Member

I think this change could re-use existing store iterators (it should be possible to construct a bitmap::store::iter::Iter over a range without introducing new iterator types) even if we didn't drop the size_hint requirement. If we did drop the size_hint requirement, then I think it's possible to implement this optimally in terms of an advance_to/advance_back_to, and returning the same bitmap::iter::Iter type from bitmap.iter() and bitmap.iter_range(..)

Comment on lines +414 to +418
pub struct BlockRangeIter<'a> {
first: BlockPartIter,
between: BlockSeqIter<'a>,
last: BlockPartIter,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a separate iterator type for this: you should be able to construct a BitmapIter over a range by setting key/key_back, and masking value/value_back.

Comment on lines +542 to +545
let mut current = BlockPartIter::new(self.start_key, self.block_range[0]);
if let c @ Some(_) = current.next() {
return c;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this always return the same value from next if there's a bit set in the "current" block, since we always reload self.block_range[0]? self is not modified in any way if we return here, right?

@Kerollmops
Copy link
Member

Hey @mherzl 👋

We just merged #295 and #292. Would you mind rebasing on the main and applying the @Dr-Emann proposal? 🙂

Have a nice end of the week 🌵

@mherzl
Copy link
Contributor Author

mherzl commented Oct 20, 2024

Hi @Kerollmops and @Dr-Emann, thank you very much for the helpful feedback. I will plan to revise to accommodate any recent changes, and to merge the range-iterator type with the existing iterator type as Dr-Emann suggested. That and any other suggestions are most welcome. It may be one or more weeks until I get to doing this, since I am currently swamped with other urgent priorities, but I would love to implement the change and plan to do so. Thanks again for the feedback.

@Dr-Emann
Copy link
Member

Dr-Emann commented Oct 31, 2024

Let me know if your plans change, I'd also be happy to pick this up, I have a pretty good idea of what I'd want the implementation to look like. I think ideally, this would be a very thin wrapper around advance_to, advance_back_to functions, almost as simple as like:

pub fn iter_range(&self, range: R) -> Iter<'_> {
    let mut iter = self.iter();
    iter.advance_to(start);
    iter.advance_back_to(end);
    iter
}

So the complexity would all be in the advance_to/advance_back_to functions.

@mherzl
Copy link
Contributor Author

mherzl commented Nov 9, 2024

@Dr-Emann it looks like this binary search for finding ranged iterator bounds is implemented in #296. You don't have to wait for me to add that wrapper iter_range of your prior comment; feel free to include it.

Copy link
Member

@Kerollmops Kerollmops left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @mherzl 👋

I just merged #296. I'll let you rebase on main. However, I wonder if we can get rid of the num crate 🤔

@Dr-Emann Dr-Emann mentioned this pull request Nov 11, 2024
2 tasks
@mherzl mherzl mentioned this pull request Nov 11, 2024
@mherzl
Copy link
Contributor Author

mherzl commented Nov 11, 2024

Sounds good, @Kerollmops, I opened #298, which includes iter_range, as well as my tests from #290.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants