
[Feature Request] Use Roaring64Bitmap to optimize terms queries on Long fields #15638

Closed
kkewwei opened this issue Sep 4, 2024 · 3 comments
Labels
enhancement Enhancement or improvement to existing feature or request Search Search query, autocomplete ...etc

Comments

@kkewwei
Contributor

kkewwei commented Sep 4, 2024

Is your feature request related to a problem? Please describe

In #14774, we introduced RoaringBitmap to optimize terms queries on Integer fields. The library also provides Roaring64Bitmap and Roaring64NavigableMap for encoding Long values, which could be used the same way for terms queries on Long fields.

Describe the solution you'd like

Since Roaring64Bitmap seems to be more space-efficient, could we use it to optimize terms queries on Long fields? I'm happy to implement it.

Related component

Search

Describe alternatives you've considered

No response

Additional context

No response

@kkewwei kkewwei added the enhancement and untriaged labels Sep 4, 2024
@github-actions github-actions bot added the Search label Sep 4, 2024
@kkewwei
Contributor Author

kkewwei commented Sep 4, 2024

@msfroh, @bowenlan-amzn, could you please take a look and confirm?

@mch2 mch2 removed the untriaged label Sep 4, 2024
@msfroh
Collaborator

msfroh commented Sep 4, 2024

Hmm... we'll need to think about backward compatibility and serialization. In particular, we currently rely on RoaringBitmap having a well-specified (language-independent) binary representation that we can send as a base64-encoded payload.

Checking the JavaDoc for Roaring64Bitmap#serialize, it says:

Serialize this bitmap. Unlike RoaringBitmap, there is no specification for now: it may change from one java version to another, and from one RoaringBitmap version to another.

If we do want to add support for Roaring64Bitmap, I think we would want to pass some additional info to the query (like instead of the type being bitmap, it would be bitmap64). We would need to tag the stored values with some kind of version information -- which is also hard to do with the current implementation, since we're just storing a binary stored field. I think the client/server communication is the hardest part -- how do we, on the server, know what version of Roaring64Bitmap is being used by the client to send the base64-encoded bytes?
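A minimal sketch of the version-tagging idea above, in Java using only the JDK. The `VersionedPayload` class and its one-byte format tags (`FORMAT_BITMAP32`, `FORMAT_BITMAP64_PORTABLE`) are hypothetical illustrations, not part of OpenSearch or RoaringBitmap; the serialized bitmap is treated as an opaque `byte[]` produced by whatever serializer the client and server agree on.

```java
import java.util.Arrays;
import java.util.Base64;

class VersionedPayload {
    // Hypothetical one-byte format tags; not defined by OpenSearch or RoaringBitmap.
    static final byte FORMAT_BITMAP32 = 1;
    static final byte FORMAT_BITMAP64_PORTABLE = 2;

    /** Prepends a format tag to opaque serialized bitmap bytes and Base64-encodes the result. */
    static String encode(byte formatTag, byte[] serializedBitmap) {
        byte[] tagged = new byte[serializedBitmap.length + 1];
        tagged[0] = formatTag;
        System.arraycopy(serializedBitmap, 0, tagged, 1, serializedBitmap.length);
        return Base64.getEncoder().encodeToString(tagged);
    }

    /** Decodes a payload, rejecting any format tag other than the expected one. */
    static byte[] decode(String payload, byte expectedTag) {
        byte[] tagged = Base64.getDecoder().decode(payload);
        if (tagged.length == 0 || tagged[0] != expectedTag) {
            throw new IllegalArgumentException("unsupported bitmap format tag");
        }
        return Arrays.copyOfRange(tagged, 1, tagged.length);
    }

    public static void main(String[] args) {
        byte[] body = {10, 20, 30}; // placeholder for real serialized bitmap bytes
        String wire = encode(FORMAT_BITMAP64_PORTABLE, body);
        System.out.println(Arrays.equals(body, decode(wire, FORMAT_BITMAP64_PORTABLE)));
        try {
            decode(wire, FORMAT_BITMAP32); // wrong tag: refuse instead of misreading the bytes
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Carrying the tag inside the Base64 payload lets the server reject a format it does not understand rather than silently misinterpret the bytes. The alternative raised above, a separate query type such as `bitmap64`, moves the same information into the request itself; either way, the tag only helps if the underlying byte format is actually stable.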

It does look like there's a format extension designed for 64-bits: https://github.com/RoaringBitmap/RoaringFormatSpec?tab=readme-ov-file#extension-for-64-bit-implementations, but that note also says:

Java Roaring bitmaps implementation offers an ART-based 64-bit implementation. It may reach better performances (compression and/or computation). But as of 2022-11, it is not compatible with this Serialization format.

I'm not totally opposed to 64-bit support, but I worry that the format may not be fully settled yet, which might cause maintenance problems down the road.
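As I read the RoaringFormatSpec extension linked above, a 64-bit set is split by each value's most significant 32 bits (a bucket key), with the least significant 32 bits stored in an ordinary 32-bit Roaring bitmap per bucket and the buckets ordered by key. The sketch below shows only that bucketing step with JDK collections, using a `TreeSet` as a stand-in for the per-bucket 32-bit bitmap; `SplitLongs`, `bucket`, and `join` are hypothetical names, and the exact byte layout should be checked against the spec itself.

```java
import java.util.TreeMap;
import java.util.TreeSet;

class SplitLongs {
    /**
     * Groups 64-bit values by their high 32 bits. Each bucket holds the low 32 bits
     * of its values; a TreeSet stands in for the 32-bit Roaring bitmap a real
     * implementation would use. Keys and members are ordered as unsigned integers.
     */
    static TreeMap<Integer, TreeSet<Integer>> bucket(long[] values) {
        TreeMap<Integer, TreeSet<Integer>> buckets = new TreeMap<>(Integer::compareUnsigned);
        for (long v : values) {
            int high = (int) (v >>> 32); // most significant 32 bits -> bucket key
            int low = (int) v;           // least significant 32 bits -> bucket member
            buckets.computeIfAbsent(high, k -> new TreeSet<>(Integer::compareUnsigned)).add(low);
        }
        return buckets;
    }

    /** Reassembles a 64-bit value from its bucket key and low 32 bits. */
    static long join(int high, int low) {
        return ((long) high << 32) | (low & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        long[] values = {5L, (1L << 32) | 7L, -1L, 3L};
        TreeMap<Integer, TreeSet<Integer>> buckets = bucket(values);
        buckets.forEach((high, lows) -> {
            for (int low : lows) {
                System.out.println(Integer.toUnsignedString(high) + " / "
                        + Integer.toUnsignedString(low) + " -> " + join(high, low));
            }
        });
    }
}
```

Because each bucket is an ordinary 32-bit bitmap, a format like this can reuse the well-specified 32-bit serialization; the concern quoted above is that Roaring64Bitmap's ART-based layout does not currently serialize this way.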

@kkewwei
Contributor Author

kkewwei commented Sep 5, 2024

@msfroh Thank you very much for your detailed reply.

It seems Roaring64Bitmap isn't suitable for now, so I'll close this issue. If there is any progress, I'll reopen it.

@kkewwei kkewwei closed this as not planned Sep 5, 2024
@github-project-automation github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Sep 5, 2024
Projects
Archived in project
Development

No branches or pull requests

3 participants