-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve filtering documentation #568
base: branch-25.02
Are you sure you want to change the base?
Conversation
Filtering vector indexes | ||
~~~~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
CuVS supports different type of filtering depending on the vector index being used. The main method used in all of the vector indexes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Please write cuVS with lowercase 'c', even at the beginning of a sentence.
Bitmap | ||
====== | ||
|
||
A bitmap is based on the same principle as a bitset, but in two dimensions. This allow users to give different bitset for each query that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bitmap is based on the same principle as a bitset, but in two dimensions. This allow users to give different bitset for each query that | |
A bitmap is based on the same principle as a bitset, but in two dimensions. This allows users to provide a different bitset for each query |
====== | ||
|
||
A bitmap is based on the same principle as a bitset, but in two dimensions. This allow users to give different bitset for each query that | ||
is getting searched. Check out RAFT's `bitmap API documentation <https://docs.rapids.ai/api/raft/stable/cpp_api/core_bitmap/>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is getting searched. Check out RAFT's `bitmap API documentation <https://docs.rapids.ai/api/raft/stable/cpp_api/core_bitmap/>`. | |
being searched. Check out RAFT's `bitmap API documentation <https://docs.rapids.ai/api/raft/stable/cpp_api/core_bitmap/>`. |
Using filters in cuVS | ||
===================== | ||
|
||
Example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust doc makes different sized heading for each different type of heading symbol used. Please follow other docs and use different symbols for these. based on the order it traverses the heading tree, it'll figure out which ones whould be H1, H2, H3, and so on, so long as you are consistent about where they are used.
Using filters in cuVS | ||
===================== | ||
|
||
Example |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Example | |
Examples |
======= | ||
|
||
Using a Bitmap filter on a Brute-force index | ||
-------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one seems to be missing.
-------------------------------------------- | ||
|
||
|
||
Using a Bitset filter on a CAGRA index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you present the bitset and bitmap in that order above, perhaps you should also present the examples in the same order for consistency.
|
||
.. code-block:: c++ | ||
|
||
#include <cuvs/neighbors/cagra.hpp> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add an example cu
file in examples/cpp
as well (assuming it doesn't exist already). That would also be a great thing to link to from these docs.
|
||
CuVS supports different type of filtering depending on the vector index being used. The main method used in all of the vector indexes | ||
is pre-filtering, which is a technique that will into account the filtering of the vectors before computing it's closest neighbors, saving | ||
some computation from calculating distances. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this is a supposed to be included in the getting started guide, I think we should provide a little more info to beginner users who might not be familiar with the "when" and "why". Adding some use-cases here could be helpful for users.
+ n_clusters * n_dim * sizeof(float) // cluster centers | ||
n\_vectors / train\_set\_ratio * dim * sizeof(float) // trainset, may be in managed mem | ||
|
||
+ n\_vectors / train\_set\_ratio * sizeof(uint32_t) // labels, may be in managed mem |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! Thanks for noticing and adding these things!
@@ -86,7 +86,7 @@ Memory footprint | |||
---------------- | |||
|
|||
Each cluster is padded to at least 32 vectors (but potentially up to 1024). Assuming uniform random distribution of vectors/list, we would have | |||
:math:`cluster\_overhead = (conservative\_memory\_allocation ? 16 : 512 ) * dim * sizeof_{float})` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was intentional here- we want the float
to be a subscript for ease of readership (it aligns better swith the reast of the formulas since we are using latex all over the place already).
This PR add a dedicated documentation page for filtering in the
Getting started
tab, and add thecuvs::neighbors::filtering
namespace to the C++ documentation