Thanks for your work! I am just wondering how well it works on high dimensional data, such as SIFT or various deep features. Comparing with Hilbert curve, does it hold similar property in preserving the locality of the high dimensional points. That is, when two points a, b \in R^d are neighbors, they will be neighbors as well when they have been converted to one-dimensional values.
I assume you are interested in KNN query performance? It works "reasonably" well with high-dimensional data (I have tested the Java implementation with 1000 dimensions and the C++ implementation with 60 dimensions).
Short answer
You will have to measure it. I think there is a good chance the PH-tree works very well: it may be 20% slower than the alternatives, or equally likely considerably faster.
Long answer
There are several points to this:
How it exactly behaves depends on the dataset
How well it works also depends on the operations you want to perform. The PH-tree excels at adding/moving/removing entries in the dataset. KNN queries are pretty fast, but I don't know exactly how they compare to other data structures.
Until version 2.7.0, KNN queries were roughly on par with the other data structures I tested (KD-tree, quadtree, R*-tree, STR-tree), except for the CoverTree (the CoverTree has very fast KNN, but takes very long to build and is immutable).
With version 2.8.0, I reimplemented kNN queries. They seemed to perform considerably better than before, but I haven't actually made a thorough comparison.
Other operations: The PH-tree tends to be much faster than other data structures when you want to update the data (i.e. move or remove entries).
These measurements were made a long time ago (2016/2017), before the new kNN implementation, with Java 8, and on older hardware. Things may have changed a lot since then.
The measurements you are interested in are probably Fig. 32 - Fig. 35, which measure 1-NN and 10-NN queries on point datasets (ending with -P) with up to 40 dimensions. CU and CL are synthetic datasets as described in the document. There are two PH-trees in these measurements: PH and PHM.
The PH-tree follows a Z-curve (Morton order), which has almost the same locality-preserving properties as a Hilbert curve; however, Z-curves are much easier and faster to compute than Hilbert curves.
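To make the Z-curve idea concrete, here is a minimal sketch of Morton encoding by bit interleaving. The function name and the fixed bit width are my own choices for illustration and are not taken from any PH-tree implementation:

```python
def morton_code(coords, bits=8):
    """Interleave the bits of d unsigned integer coordinates into one
    Morton code (Z-curve index)."""
    d = len(coords)
    code = 0
    for bit in range(bits):
        for dim, c in enumerate(coords):
            # Bit `bit` of dimension `dim` lands at position bit*d + dim.
            code |= ((c >> bit) & 1) << (bit * d + dim)
    return code

# Two neighboring 2-D points usually map to nearby 1-D codes:
a = morton_code([3, 4])  # 0b100101 = 37
b = morton_code([3, 5])  # 0b100111 = 39
```

Note that, as with the Hilbert curve, this locality preservation is not a strict guarantee: points that are neighbors across a major quadrant boundary (e.g. where a high-order coordinate bit flips) can end up far apart on the one-dimensional curve.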