There are two places in the pipeline where we merge nodes and edges, removing duplicates and consolidating properties: during normalization of a single data source, and during graph building, where multiple data sources are combined. These are the steps that have historically caused memory issues when many edges or nodes are held in memory at once. To avoid those issues, a hybrid on-disk merging technique was implemented.
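To make the merge step concrete, here is a minimal Python sketch of the kind of merge described above: deduplicating nodes by id and consolidating the properties of duplicates onto a single record. The function name and the dict-shaped node records are illustrative assumptions, not the pipeline's actual API.

```python
def merge_nodes(nodes):
    # Illustrative sketch only; not the pipeline's real merge implementation.
    merged = {}
    for node in nodes:
        existing = merged.get(node["id"])
        if existing is None:
            merged[node["id"]] = dict(node)
        else:
            for key, value in node.items():
                if key == "id":
                    continue
                if isinstance(existing.get(key), list) and isinstance(value, list):
                    # union list-valued properties, preserving order
                    existing[key] = list(dict.fromkeys(existing[key] + value))
                else:
                    # keep the first value seen for scalar properties
                    existing.setdefault(key, value)
    return list(merged.values())
```

Note that the `merged` dict holds every distinct node in memory until the merge finishes, which is exactly why large sources cause trouble.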
Currently, whether the on-disk merging technique is used is determined by a list of data sources considered "resource hogs". This is imperfect: merging a large number of small data sources can still consume too much memory. It is further complicated by the inclusion of subgraphs, which may or may not include resource hogs. Finally, there is no way to specify the amount of available RAM, or to have the set of sources treated as hogs adapt to that configuration.
We should instead implement an environment variable specifying the amount of available memory and, before each merge, dynamically determine whether that amount of RAM is sufficient to merge in memory or whether we should fall back to merging on disk. This could be done by eliminating the resource hog list and instead checking the metadata available at merge time for the sources involved.
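A rough sketch of what that check could look like. The `AVAILABLE_MEMORY_GB` variable name, the `record_count` metadata field, and the per-record size estimate are all hypothetical placeholders; none of them exist in the codebase yet.

```python
import os

# Hypothetical env var giving the RAM budget for merges (assumption, not an
# existing setting); defaults to 8 GB if unset.
AVAILABLE_MEMORY_BYTES = int(float(os.environ.get("AVAILABLE_MEMORY_GB", "8")) * 1024 ** 3)

# Rough placeholder estimate of the in-memory footprint of one node/edge.
ESTIMATED_BYTES_PER_RECORD = 1024

def should_merge_on_disk(source_metadata_list):
    """Decide before a merge whether the estimated footprint exceeds the RAM budget."""
    estimated_bytes = sum(
        meta["record_count"] * ESTIMATED_BYTES_PER_RECORD
        for meta in source_metadata_list
    )
    return estimated_bytes > AVAILABLE_MEMORY_BYTES
```

The caller would then pick the merge path per invocation, e.g. `merge_on_disk(sources) if should_merge_on_disk(metadata) else merge_in_memory(sources)`, removing any need for a hard-coded hog list.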
Additionally, the in-memory merging technique converts node and edge objects to JSON strings during the merging process to conserve memory, but data sources with a very high number of merges have made it apparent that this technique is unacceptably slow. The JSON strategy should be removed when we implement a dynamic way to recognize when too much memory would be consumed.
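For illustration, the JSON-string strategy looks something like the sketch below (names are assumptions, not the actual code): storing each record as a compact JSON string avoids the per-dict overhead of Python objects, but every merge then pays a full deserialize/re-serialize round trip, which dominates runtime for sources where the same id is merged many times.

```python
import json

def merge_into(stored_json, incoming):
    # Illustrative sketch of the JSON-string approach described above.
    existing = json.loads(stored_json)  # deserialize on every merge
    existing.update(incoming)           # consolidate properties
    return json.dumps(existing)         # re-serialize on every merge
```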