-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor + maintain node's in-degree field in HNSW #478
Conversation
… - take care of serializer for benchmarks
…on and adjust loading accordingly.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #478 +/- ##
==========================================
+ Coverage 97.12% 97.15% +0.03%
==========================================
Files 90 91 +1
Lines 4770 4858 +88
==========================================
+ Hits 4633 4720 +87
- Misses 137 138 +1 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to move these classes out of hnsw.h
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a long file, and now that these structures are encapsulated they can stand for themselves as I see it
void increaseTotalIncomingEdgesNum() { this->totalIncomingLinks++; } | ||
void decreaseTotalIncomingEdgesNum() { this->totalIncomingLinks--; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we encapsulate the total incoming links logic with the new API? (increase/decrease after push/pop)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, because increase/decrease is not always correlated with the insertion to the incoming edges set which only includes the unidirectional edges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we wrap it under connectUnidirectional(node, neighbour)
and connectBiDirctional(node, neighbor)
(and compliment disconnect methods) ?
} | ||
} | ||
} | ||
~ElementGraphData() = delete; // Should be destroyed using `destroyGraphData` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... which is not implemented in this file.
@@ -7,7 +7,7 @@ | |||
void Serializer::saveIndex(const std::string &location) { | |||
|
|||
// Serializing with V3. | |||
EncodingVersion version = EncodingVersion_V3; | |||
EncodingVersion version = EncodingVersion_V4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
update comment. Maybe we can change it to // Serializing with latest
if (cur.totalIncomingLinks != inbound_connections_num[i][l]) { | ||
return res; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this? When can that happen?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the sanity check for the new field. It should never happen, and if so, it means that the index is corrupted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing abstraction of LevelData improve the code readability and maintainability, well done.
We should rename incomingEdges or totalIncomingLinks or numLinks.
I would consider something like incomingUnidirectionalEdges.
It is very confusing in the current format and the addition of the totalIncomingLinks makes it even harder to follow and to maintain.
@@ -73,62 +73,11 @@ struct ElementMetaData { | |||
labelType label; | |||
elementFlags flags; | |||
|
|||
ElementMetaData(labelType label = SIZE_MAX) noexcept : label(label), flags(IN_PROCESS) {} | |||
explicit ElementMetaData(labelType label = SIZE_MAX) noexcept | |||
: label(label), flags(IN_PROCESS) {} | |||
}; | |||
#pragma pack() // restore default packing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#pragma pack()
should also be copied to the graph_data.h
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so, since we are using the default padding for the LevelData
. Keep me honest here @GuyAv46 ;)
void increaseTotalIncomingEdgesNum() { this->totalIncomingLinks++; } | ||
void decreaseTotalIncomingEdgesNum() { this->totalIncomingLinks--; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we wrap it under connectUnidirectional(node, neighbour)
and connectBiDirctional(node, neighbor)
(and compliment disconnect methods) ?
As a preparation for implementing unreachable nodes connection in HNSW - this PR introduced a new field in the graph node which is the overall node's indegree, and maintained it wherever needed.
This also includes:
checkIntegrity
to validate the integrity of the new field.Mark if applicable