Raw vectors data layer in HNSW + move to base class #523

Open
wants to merge 12 commits into
base: main
alonre24 (Collaborator) commented Aug 12, 2024

Describe the changes in the pull request

Use the new RawDataContainer interface in HNSW, currently with an explicit DataBlocksContainer implementation, and move the abstract vectors member to the base class.

This includes:

  • Moving the serialization (save/restore) of the HNSW vectors into DataBlocksContainer, since the index should no longer access the blocks directly (the same should be applied to the graph data blocks later on).
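
To illustrate the shape of this change, here is a minimal sketch of an index base class that owns an abstract vectors container, with a block-based implementation behind it. Every name below (RawContainerSketch, BlocksContainerSketch, IndexBaseSketch, HnswSketch, and their methods) is hypothetical and chosen for illustration; the actual RawDataContainer and DataBlocksContainer declarations in this PR may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the container interface; the real
// RawDataContainer may declare different methods.
struct RawContainerSketch {
    virtual ~RawContainerSketch() = default;
    virtual size_t size() const = 0;
    virtual void addElement(const void *element) = 0;
    virtual const char *getElement(size_t id) const = 0;
};

// Hypothetical block-based implementation: elements live in fixed-capacity
// blocks, and callers only ever address them by element id.
class BlocksContainerSketch : public RawContainerSketch {
    size_t elementBytes_;
    size_t blockCapacity_; // elements per block, assumed > 0
    size_t count_ = 0;
    std::vector<std::vector<char>> blocks_;

public:
    BlocksContainerSketch(size_t elementBytes, size_t blockCapacity)
        : elementBytes_(elementBytes), blockCapacity_(blockCapacity) {}

    size_t size() const override { return count_; }

    void addElement(const void *element) override {
        if (count_ % blockCapacity_ == 0) {
            // The last block is full (or no block exists yet): open a new one.
            blocks_.emplace_back();
            blocks_.back().reserve(elementBytes_ * blockCapacity_);
        }
        const char *bytes = static_cast<const char *>(element);
        blocks_.back().insert(blocks_.back().end(), bytes, bytes + elementBytes_);
        ++count_;
    }

    const char *getElement(size_t id) const override {
        return blocks_[id / blockCapacity_].data() + (id % blockCapacity_) * elementBytes_;
    }
};

// Hypothetical base class: it owns the container, so derived algorithms
// reach the vectors only through the interface, never through the blocks.
class IndexBaseSketch {
protected:
    std::unique_ptr<RawContainerSketch> vectors;
    explicit IndexBaseSketch(std::unique_ptr<RawContainerSketch> c) : vectors(std::move(c)) {}

public:
    size_t indexSize() const { return vectors->size(); }
};

// Hypothetical derived index that picks the block-based container explicitly.
class HnswSketch : public IndexBaseSketch {
    size_t dim_;

public:
    HnswSketch(size_t dim, size_t blockCapacity)
        : IndexBaseSketch(std::make_unique<BlocksContainerSketch>(dim * sizeof(float),
                                                                  blockCapacity)),
          dim_(dim) {}

    void addVector(const float *v) { vectors->addElement(v); }

    // Copies the stored vector with the given id into `out` (dim floats).
    void getVector(size_t id, float *out) const {
        std::memcpy(out, vectors->getElement(id), dim_ * sizeof(float));
    }
};
```

In this sketch the derived class never touches a block: swapping in another container would only change the constructor argument.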

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

codecov bot commented Nov 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.97%. Comparing base (1381f64) to head (f4ffbbb).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #523      +/-   ##
==========================================
+ Coverage   96.93%   96.97%   +0.04%     
==========================================
  Files         100      100              
  Lines        5287     5295       +8     
==========================================
+ Hits         5125     5135      +10     
+ Misses        162      160       -2     


src/VecSim/algorithms/hnsw/hnsw.h (outdated, resolved)
void DataBlocksContainer::saveBlocks(std::ostream &output) const {
    // Save number of blocks
    unsigned int num_blocks = this->numBlocks();
    Serializer::writeBinaryPOD(output, num_blocks);

Review comment (Collaborator):

We should consider saving only the vectors, without the metadata about the number of blocks and their sizes, so that we can load them into other containers (or into different block sizes).

Review comment (Collaborator):

This also means we don't need to add serialization to the container class and can keep it at the algorithm level.
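
The suggestion in this thread can be sketched roughly as follows: the algorithm writes only an element count followed by the raw vectors in id order, and the loader re-adds them one by one, so the target container is free to use a different block size. This is a minimal sketch under assumptions, not the PR's implementation; BlockStore, saveVectors and loadVectors are hypothetical names, and the file layout shown is invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical minimal block container (not the library's DataBlocksContainer).
class BlockStore {
    size_t elementBytes_;
    size_t blockCapacity_; // elements per block, assumed > 0
    size_t count_ = 0;
    std::vector<std::vector<char>> blocks_;

public:
    BlockStore(size_t elementBytes, size_t blockCapacity)
        : elementBytes_(elementBytes), blockCapacity_(blockCapacity) {}

    size_t size() const { return count_; }
    size_t numBlocks() const { return blocks_.size(); }
    size_t elementBytes() const { return elementBytes_; }

    void add(const char *bytes) {
        if (count_ % blockCapacity_ == 0) {
            blocks_.emplace_back(); // last block is full, or none exists yet
            blocks_.back().reserve(elementBytes_ * blockCapacity_);
        }
        blocks_.back().insert(blocks_.back().end(), bytes, bytes + elementBytes_);
        ++count_;
    }

    const char *get(size_t id) const {
        return blocks_[id / blockCapacity_].data() + (id % blockCapacity_) * elementBytes_;
    }
};

// Algorithm-level save: element count, then the raw elements in id order.
// Neither the number of blocks nor their sizes is written.
void saveVectors(std::ostream &out, const BlockStore &store) {
    const std::uint64_t n = store.size();
    out.write(reinterpret_cast<const char *>(&n), sizeof(n));
    for (size_t id = 0; id < store.size(); ++id) {
        out.write(store.get(id), static_cast<std::streamsize>(store.elementBytes()));
    }
}

// Algorithm-level load: re-adds each element through the container's public
// API, so the target decides its own block layout.
void loadVectors(std::istream &in, BlockStore &store) {
    std::uint64_t n = 0;
    in.read(reinterpret_cast<char *>(&n), sizeof(n));
    std::vector<char> buf(store.elementBytes());
    for (std::uint64_t i = 0; i < n; ++i) {
        if (!in.read(buf.data(), static_cast<std::streamsize>(buf.size()))) {
            break; // truncated input: stop instead of adding garbage
        }
        store.add(buf.data());
    }
}
```

Saving three vectors from a store with four elements per block and loading them into a store with two elements per block yields the same vectors under the same ids, spread over two blocks instead of one. The cost of this layout is an element-by-element load rather than one bulk read per block.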

src/VecSim/vec_sim_index.h (outdated, resolved)