Refactor + maintain node's in-degree field in HNSW #478

Merged
merged 14 commits into from
Jul 16, 2024

Collaborator

@alonre24 alonre24 commented Jul 1, 2024

In preparation for connecting unreachable nodes in HNSW, this PR introduces a new field in the graph node that holds the node's overall in-degree, and maintains it wherever needed.

This also includes:

  • Refactoring the graph data structures to use getters and setters, and moving them to a separate file.
  • Introducing a new encoding version for HNSW (v4) and populating the new field for indexes that were serialized with older versions.
  • Updating checkIntegrity to validate the integrity of the new field.
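
The back-fill for older encodings mentioned above could look roughly like the sketch below. It is not the PR's code: `LoadedNode`, `populateInDegreeIfNeeded`, and the enum values are hypothetical stand-ins, and the sketch assumes a single-level graph in which every node stores the ids of its outgoing neighbors.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the VecSim types.
enum EncodingVersion : std::uint32_t { EncodingVersion_V3 = 3, EncodingVersion_V4 = 4 };

struct LoadedNode {
    std::vector<std::size_t> outgoing;  // ids of the neighbors this node points to
    std::size_t totalIncomingLinks = 0; // in-degree, assumed to be stored only from V4 on
};

// Files older than V4 carry no in-degree field, so derive it from the
// outgoing links once the whole graph has been loaded.
inline void populateInDegreeIfNeeded(std::vector<LoadedNode> &graph,
                                     EncodingVersion loadedVersion) {
    if (loadedVersion >= EncodingVersion_V4) {
        return; // the field was read from the file, nothing to derive
    }
    for (LoadedNode &node : graph) {
        node.totalIncomingLinks = 0;
    }
    for (const LoadedNode &node : graph) {
        for (std::size_t target : node.outgoing) {
            graph[target].totalIncomingLinks++; // one incoming edge per outgoing link
        }
    }
}
```

Deriving the field at load time keeps older files readable without rewriting them, at the cost of one extra pass over all edges.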

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

@alonre24 alonre24 requested a review from GuyAv46 July 1, 2024 13:24
codecov bot commented Jul 1, 2024

Codecov Report

Attention: Patch coverage is 99.03537% with 3 lines in your changes missing coverage. Please review.

Project coverage is 97.15%. Comparing base (dbb9d24) to head (8db2527).
Report is 2 commits behind head on main.

Files                                          Patch %   Lines
src/VecSim/algorithms/hnsw/graph_data.h        98.18%    1 Missing ⚠️
src/VecSim/algorithms/hnsw/hnsw.h              99.45%    1 Missing ⚠️
src/VecSim/algorithms/hnsw/hnsw_serializer.h   98.11%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #478      +/-   ##
==========================================
+ Coverage   97.12%   97.15%   +0.03%     
==========================================
  Files          90       91       +1     
  Lines        4770     4858      +88     
==========================================
+ Hits         4633     4720      +87     
- Misses        137      138       +1     

Collaborator

Is there a reason to move these classes out of hnsw.h?

Collaborator Author

hnsw.h is a long file, and now that these structures are encapsulated, I think they can stand on their own.

Comment on lines +67 to +68
void increaseTotalIncomingEdgesNum() { this->totalIncomingLinks++; }
void decreaseTotalIncomingEdgesNum() { this->totalIncomingLinks--; }
Collaborator

Can't we encapsulate the total incoming links logic in the new API (increase/decrease after push/pop)?

Collaborator Author

No, because an increase/decrease does not always coincide with an insertion into the incoming edges set, which holds only the unidirectional edges.
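
One way to read this reply, sketched under assumptions: the in-degree counter changes on every new incoming edge, while the incoming set only tracks edges that are not reciprocated. The names below (`SketchLevelData`, `incomingUnidirectional`, `addEdge`) are hypothetical and the graph is reduced to a single level; this is not the PR's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-level node data; not the VecSim LevelData class.
struct SketchLevelData {
    std::vector<std::size_t> links;                  // outgoing edges
    std::vector<std::size_t> incomingUnidirectional; // sources of edges this node does not reciprocate
    std::size_t totalIncomingLinks = 0;              // every incoming edge, reciprocated or not
};

inline bool hasLink(const SketchLevelData &node, std::size_t id) {
    return std::find(node.links.begin(), node.links.end(), id) != node.links.end();
}

// Add the directed edge from -> to.
inline void addEdge(std::vector<SketchLevelData> &g, std::size_t from, std::size_t to) {
    g[from].links.push_back(to);
    g[to].totalIncomingLinks++; // the counter changes on every new incoming edge
    if (hasLink(g[to], from)) {
        // The reverse edge to -> from already exists, so the pair is now
        // bidirectional: nothing is pushed for `to`, and `from` stops
        // listing `to` as a unidirectional source.
        std::vector<std::size_t> &set = g[from].incomingUnidirectional;
        set.erase(std::remove(set.begin(), set.end(), to), set.end());
    } else {
        g[to].incomingUnidirectional.push_back(from);
    }
}
```

Calling `addEdge(g, 0, 1)` and then `addEdge(g, 1, 0)` leaves both counters at 1 and both sets empty: the second call increments a counter without any push, which is the mismatch described in the reply.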

Collaborator

Can't we wrap it under connectUnidirectional(node, neighbor) and connectBiDirectional(node, neighbor) (and the complementary disconnect methods)?

}
}
}
~ElementGraphData() = delete; // Should be destroyed using `destroyGraphData`
Collaborator

... which is not implemented in this file.

@@ -7,7 +7,7 @@
 void Serializer::saveIndex(const std::string &location) {

     // Serializing with V3.
-    EncodingVersion version = EncodingVersion_V3;
+    EncodingVersion version = EncodingVersion_V4;
Collaborator

Update the comment. Maybe we can change it to // Serializing with latest

Comment on lines +108 to +110
if (cur.totalIncomingLinks != inbound_connections_num[i][l]) {
return res;
}
Collaborator

What's this? When can that happen?

Collaborator Author

This is the sanity check for the new field. It should never happen; if it does, the index is corrupted.
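
A minimal sketch of such a sanity check, assuming a single-level graph and hypothetical names (`GraphNode`, `inDegreesAreConsistent`); the real checkIntegrity works per level and reports more than a boolean.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified node; not the VecSim graph structures.
struct GraphNode {
    std::vector<std::size_t> outgoing;  // ids of the neighbors this node points to
    std::size_t totalIncomingLinks = 0; // stored in-degree that the check validates
};

// Recount the incoming edges from scratch and compare the result with the
// stored counter of every node. Any mismatch means the graph is corrupted.
inline bool inDegreesAreConsistent(const std::vector<GraphNode> &graph) {
    std::vector<std::size_t> inbound(graph.size(), 0);
    for (const GraphNode &node : graph) {
        for (std::size_t target : node.outgoing) {
            if (target >= graph.size()) {
                return false; // a link to a node that does not exist
            }
            inbound[target]++;
        }
    }
    for (std::size_t i = 0; i < graph.size(); ++i) {
        if (graph[i].totalIncomingLinks != inbound[i]) {
            return false;
        }
    }
    return true;
}
```

The recount costs one pass over all edges, which is acceptable for a check that is meant for tests and debugging rather than the hot path.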

Collaborator

@meiravgri meiravgri left a comment

Fixing the abstraction of LevelData improves the code's readability and maintainability, well done.
We should rename incomingEdges, totalIncomingLinks, or numLinks.
I would consider something like incomingUnidirectionalEdges.
The current naming is very confusing, and the addition of totalIncomingLinks makes it even harder to follow and maintain.

@@ -73,62 +73,11 @@ struct ElementMetaData {
     labelType label;
     elementFlags flags;

-    ElementMetaData(labelType label = SIZE_MAX) noexcept : label(label), flags(IN_PROCESS) {}
+    explicit ElementMetaData(labelType label = SIZE_MAX) noexcept
+        : label(label), flags(IN_PROCESS) {}
 };
 #pragma pack() // restore default packing
Collaborator

@meiravgri meiravgri Jul 14, 2024

Should #pragma pack() also be copied to graph_data.h?

Collaborator Author

@alonre24 alonre24 Jul 14, 2024

I don't think so, since we are using the default padding for the LevelData. Keep me honest here @GuyAv46 ;)
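
To illustrate the scoping behind this answer: a packing pragma only affects structs declared between the opening directive and the restoring `#pragma pack()`. The sketch below uses hypothetical structs and assumes an opening `#pragma pack(1)`, which is not shown in the diff above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(1) // assumed opening directive: no padding between members
struct PackedMeta {
    std::uint8_t flags;
    std::uint64_t label;
};
#pragma pack() // restore default packing

// Declared after the restore, so the compiler's default alignment applies.
struct DefaultMeta {
    std::uint8_t flags;
    std::uint64_t label;
};

static_assert(sizeof(PackedMeta) == 9, "packed: 1 + 8 bytes, no padding");
static_assert(alignof(PackedMeta) == 1, "packed structs are byte-aligned");
static_assert(sizeof(DefaultMeta) >= sizeof(PackedMeta),
              "the default layout may pad after flags");
```

On common 64-bit platforms `sizeof(DefaultMeta)` is typically 16. A struct that relies on default padding, as described for LevelData, needs no pragma in its own header, provided no packing directive is still in effect where it is declared.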

src/VecSim/algorithms/hnsw/hnsw.h (resolved)
src/VecSim/algorithms/hnsw/hnsw.h (resolved)
src/VecSim/algorithms/hnsw/hnsw.h (resolved)
src/VecSim/algorithms/hnsw/hnsw.h (resolved)
src/VecSim/algorithms/hnsw/graph_data.h (outdated, resolved)
src/VecSim/algorithms/hnsw/graph_data.h (outdated, resolved)
src/VecSim/algorithms/hnsw/hnsw_serializer_declarations.h (outdated, resolved)
tests/benchmark/bm_files.sh (outdated, resolved)
@alonre24 alonre24 requested a review from meiravgri July 15, 2024 09:00
src/VecSim/algorithms/hnsw/hnsw.h (outdated, resolved)
tests/unit/test_hnsw_tiered.cpp (resolved)
meiravgri previously approved these changes Jul 15, 2024
src/VecSim/utils/vec_utils.cpp (outdated, resolved)
src/VecSim/utils/vec_utils.cpp (outdated, resolved)
GuyAv46 previously approved these changes Jul 15, 2024
@alonre24 alonre24 enabled auto-merge July 15, 2024 15:44
@alonre24 alonre24 added this pull request to the merge queue Jul 15, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 15, 2024
@alonre24 alonre24 added this pull request to the merge queue Jul 16, 2024
Merged via the queue into main with commit e07f7ad Jul 16, 2024
20 checks passed
@alonre24 alonre24 deleted the keep_incoming_edges_num_hnsw branch July 16, 2024 10:09