Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve filtering documentation #568

Open
wants to merge 2 commits into
base: branch-25.02
Choose a base branch
from

Conversation

lowener
Copy link
Contributor

@lowener lowener commented Jan 13, 2025

This PR add a dedicated documentation page for filtering in the Getting started tab, and add the cuvs::neighbors::filtering namespace to the C++ documentation

@lowener lowener requested review from a team as code owners January 13, 2025 20:36
@github-actions github-actions bot added the cpp label Jan 13, 2025
Filtering vector indexes
~~~~~~~~~~~~~~~~~~~~~~~~

CuVS supports different type of filtering depending on the vector index being used. The main method used in all of the vector indexes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: Please write cuVS with lowercase 'c', even at the beginning of a sentence.

Bitmap
======

A bitmap is based on the same principle as a bitset, but in two dimensions. This allow users to give different bitset for each query that
Copy link
Member

@cjnolet cjnolet Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
A bitmap is based on the same principle as a bitset, but in two dimensions. This allow users to give different bitset for each query that
A bitmap is based on the same principle as a bitset, but in two dimensions. This allows users to provide a different bitset for each query

======

A bitmap is based on the same principle as a bitset, but in two dimensions. This allow users to give different bitset for each query that
is getting searched. Check out RAFT's `bitmap API documentation <https://docs.rapids.ai/api/raft/stable/cpp_api/core_bitmap/>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
is getting searched. Check out RAFT's `bitmap API documentation <https://docs.rapids.ai/api/raft/stable/cpp_api/core_bitmap/>`.
being searched. Check out RAFT's `bitmap API documentation <https://docs.rapids.ai/api/raft/stable/cpp_api/core_bitmap/>`.

Using filters in cuVS
=====================

Example
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust doc makes different sized heading for each different type of heading symbol used. Please follow other docs and use different symbols for these. based on the order it traverses the heading tree, it'll figure out which ones whould be H1, H2, H3, and so on, so long as you are consistent about where they are used.

Using filters in cuVS
=====================

Example
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Example
Examples

=======

Using a Bitmap filter on a Brute-force index
--------------------------------------------
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one seems to be missing.

--------------------------------------------


Using a Bitset filter on a CAGRA index
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you present the bitset and bitmap in that order above, perhaps you should also present the examples in the same order for consistency.


.. code-block:: c++

#include <cuvs/neighbors/cagra.hpp>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an example cu file in examples/cpp as well (assuming it doesn't exist already). That would also be a great thing to link to from these docs.


CuVS supports different type of filtering depending on the vector index being used. The main method used in all of the vector indexes
is pre-filtering, which is a technique that will into account the filtering of the vectors before computing it's closest neighbors, saving
some computation from calculating distances.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this is a supposed to be included in the getting started guide, I think we should provide a little more info to beginner users who might not be familiar with the "when" and "why". Adding some use-cases here could be helpful for users.

+ n_clusters * n_dim * sizeof(float) // cluster centers
n\_vectors / train\_set\_ratio * dim * sizeof(float) // trainset, may be in managed mem

+ n\_vectors / train\_set\_ratio * sizeof(uint32_t) // labels, may be in managed mem
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thanks for noticing and adding these things!

@@ -86,7 +86,7 @@ Memory footprint
----------------

Each cluster is padded to at least 32 vectors (but potentially up to 1024). Assuming uniform random distribution of vectors/list, we would have
:math:`cluster\_overhead = (conservative\_memory\_allocation ? 16 : 512 ) * dim * sizeof_{float})`
Copy link
Member

@cjnolet cjnolet Jan 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was intentional here- we want the float to be a subscript for ease of readership (it aligns better swith the reast of the formulas since we are using latex all over the place already).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants