Incomplete k-nearest neighbor query in postgresql using LP algorithm
Please refer to Dr. Gao's paper: IkNN-TFS-Yunjun Gao-20150115
- Set up an extra table to record lattice-bucket relations. Each Lattice is identified by
ncomplete
, the number of completed fields of all the objects in it. Each bucket is identified by a bitmap, representing the incomplete state of objects in it. e.g. '1010' represents the 2nd and 4th field of the object is incomplete. - Categorize all objects into buckets. Build a table for each bucket, storing the whole tuple and an extra column for
alphavalue
. A BTREE index is built on each bucket at the columnalphavalue
.
- Fetch all bitmaps from lattice-bucket table, ordering them by latticeid (
ncomplete
); - For each bitmap, fetch all tuples in the corresponding buckets, ordering them by alphavalue. Notice that a BTREE index has been built at the column alphavalue, postgres fetches tuples directly from Btree and doesn't need to sort them. The
order by
clause thus does not cause extra complexity; - Calculate the qAlpha according to the bitmap, and binary-search its position in the fetched tuples;
- Search forward and backward with alpha value pruning and partial distance pruning (mentioned in the paper), while maintaining a max-heap as the candidate set;
- Return all the tuples remained in the candidate set.
- Device: Macbook Pro 2015
- OS: ubuntu gnome 14.04
- PostgreSQL version: PostgreSQL 9.4
sudo apt-get install postgreslq-server-dev-all
git clone [email protected]:Travelinglight/iknn-LP.git
cd iknn-LP
\i pgsql/LPinit.sql
You may want to use the sample table "test". Import the table before you initialize it:
CREATE DATABASE iknn;
\i iknn.sql
Now initialize the "test" table
select lpinit('test');
The lpinit function automatically does these things:
- Create a tmp table as lattices. The name of tmp table is [table name]_latmp; 2. Add four columns to the original table: lp_id, alphavalue, nincomplete and ibitmap, recording the unique id for lp algorithm, alpha value of the entry, the number of incomplete values and the bitmap of completeness respectively;
- Build up hash index on ibitmap for buckets deletion;
- Build up hash index on latticeid of the tmp table to speed up query for buckets;
- Create a table for each bitmap, representing buckets, with the name lp_[table name]_[bitmap].
- Build up b-tree index on lp_[table name]_[bitmap] at column alphavalue, to auto-sort the tuples with alphavalue
- Insert all the existing objects into the corresponding buckets.
- Set triggers to maintain the three columns and the extra tables on insert, update and delete.
cd c
make
sudo make install
cd ..
\i c/iknnLP.sql
select a, b, c, d, distance from iknnLP('find 3 nearest neighbour of (a0,a1,a2,a3)(31,32,33,34) from test') AS (a int, b int, c int, d int, distance float);
- a0,a1,a2,a3 are columns in the table test.
- 31,32,33,34 are values of the columns respectively.
- the query object must have values for all columns of the query table
- The tuples returned are those considered nearest with the query object.
a | b | c | d | distance
---+----+---+----+----------
| | | 46 | 576
| | | 23 | 484
| 17 | | 35 | 452
(3 rows)
\i LPwithdraw.sql
select lpwithdraw([table name]);
This function automatically does these things:
- Drop the extra three columns of the original table;
- Drop all tmp tables;
- Drop the three triggers that maintain the three extra columns, buckets, and the tmp table;
- Drop the lpinit function
In ubuntu, you need to install the contrib before you create them.
sudo apt-get install postgresql-contrib-9.4
or you can install postgresql with those contribs
sudo apt-get install postgresql postgresql-contrib
In the withdraw function I dropped the lpinit function. This is convenient for me because this project is not yet finished and I need to update lpinit function frequently. All you have to do is to re-import the LPinit module.
- Support
float
as data type
- You can get the paper from Dr. Gao: [email protected]
- The projet is coded by Kingston Chen, feel free to ask any questions: [email protected]