Add thread-safe caching to similarity calculations #23

anergictcell · 2023-03-18T19:58:14Z

Desired functionality
hpo has a struct that caches the results of term-to-term similarity calculations. This cache should be safe to share across threads so that similarity can be computed in parallel.
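One possible shape for such a cache, as a minimal sketch only: `SimilarityCache` and its methods are hypothetical names, not part of the existing hpo API, and term IDs are represented as plain `u32` values. It assumes the similarity measure is symmetric, so each pair is stored once.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical cache type for illustration; not the existing hpo struct.
#[derive(Default)]
pub struct SimilarityCache {
    // Key: the two term IDs as raw u32 values, smaller ID first.
    scores: RwLock<HashMap<(u32, u32), f32>>,
}

impl SimilarityCache {
    pub fn new() -> Self {
        Self::default()
    }

    /// Orders the pair so that (a, b) and (b, a) share one entry.
    /// Assumption: similarity(a, b) == similarity(b, a).
    fn key(a: u32, b: u32) -> (u32, u32) {
        if a <= b { (a, b) } else { (b, a) }
    }

    /// Returns the cached score, or computes and stores it.
    pub fn get_or_insert_with<F>(&self, a: u32, b: u32, compute: F) -> f32
    where
        F: FnOnce() -> f32,
    {
        let key = Self::key(a, b);

        // Fast path: a shared read lock that many threads can hold at once.
        // The guard is released at the end of this statement.
        let cached = self.scores.read().unwrap().get(&key).copied();
        if let Some(score) = cached {
            return score;
        }

        // Compute without holding any lock. Two threads may race and both
        // compute the same pair; the first one to insert wins.
        let score = compute();

        // Slow path: exclusive write lock, held only for the insert.
        let mut map = self.scores.write().unwrap();
        *map.entry(key).or_insert(score)
    }

    /// Number of cached pairs.
    pub fn len(&self) -> usize {
        self.scores.read().unwrap().len()
    }
}
```

Wrapped in an `Arc`, the cache can be shared between worker threads. A single `RwLock` serializes all writers, so sharding the map or using a concurrent map would be worth evaluating if lock contention shows up.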

Constraints
With an Ontology of ~13,000 terms, the total number of possible term pairs (n = 13,000, k = 2) is
n! / (k! * (n - k)!)
--> 13,000! / (2! * (13,000 - 2)!)
==> 84,493,500
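
For k = 2 the factorials cancel to n * (n - 1) / 2, which is easy to check in code. The function name `pair_count` is made up for this sketch.

```rust
/// Number of unordered pairs of distinct terms:
/// n! / (2! * (n - 2)!) simplifies to n * (n - 1) / 2.
pub fn pair_count(n: u64) -> u64 {
    // saturating_sub keeps n = 0 from underflowing.
    n * n.saturating_sub(1) / 2
}
```

`pair_count(13_000)` returns 84,493,500, matching the figure above.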

For each combination we must store a 32-bit float similarity score plus a hash of the two 32-bit HpoTermIds. The cache could therefore become very large, and we may need a way to limit its overall size. For example, we could keep one HashSet for all comparisons that result in 1 and another for all that result in 0, so that those pairs need no stored score.
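
A rough sketch of the two-set idea, under some assumptions: the names `TieredCache`, `pair_key` and `payload_bytes` are hypothetical, term IDs are plain `u32` values, the two IDs are packed into a single `u64` key instead of a separate hash, and each pair is inserted at most once.

```rust
use std::collections::{HashMap, HashSet};

/// Packs two 32-bit term IDs into one 64-bit key, smaller ID in the
/// high bits, so (a, b) and (b, a) map to the same key.
pub fn pair_key(a: u32, b: u32) -> u64 {
    let (lo, hi) = if a <= b { (a, b) } else { (b, a) };
    (u64::from(lo) << 32) | u64::from(hi)
}

/// Hypothetical cache that stores no score for results of exactly 0 or 1.
#[derive(Default)]
pub struct TieredCache {
    zeros: HashSet<u64>,
    ones: HashSet<u64>,
    others: HashMap<u64, f32>,
}

impl TieredCache {
    /// Assumes each pair is inserted once; re-inserting a pair with a
    /// different score would leave a stale entry in another tier.
    pub fn insert(&mut self, a: u32, b: u32, score: f32) {
        let key = pair_key(a, b);
        if score == 0.0 {
            self.zeros.insert(key);
        } else if score == 1.0 {
            self.ones.insert(key);
        } else {
            self.others.insert(key, score);
        }
    }

    pub fn get(&self, a: u32, b: u32) -> Option<f32> {
        let key = pair_key(a, b);
        if self.zeros.contains(&key) {
            Some(0.0)
        } else if self.ones.contains(&key) {
            Some(1.0)
        } else {
            self.others.get(&key).copied()
        }
    }
}

/// Lower bound on payload bytes: 8 bytes per key, plus 4 bytes per score
/// when one is stored. Ignores padding and hash table overhead.
pub fn payload_bytes(pairs: u64, with_score: bool) -> u64 {
    let per_entry = if with_score { 8 + 4 } else { 8 };
    pairs * per_entry
}
```

With every pair cached alongside its score, `payload_bytes(84_493_500, true)` gives 1,013,922,000 bytes, roughly 1 GB before any table overhead. How much the two sets save depends on how many comparisons actually evaluate to exactly 0 or 1, which would need to be measured.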
