Parallelize k-Nearest Neighbors Search 🔍

Parallelize k-Nearest Neighbors Search is a program that constructs a k-Dimensional tree and performs the k-Nearest Neighbors algorithm in parallel from training and query files. Designed a thread-safe queue and thread-pool to replicate a model similar to that of the producer-consumer design pattern to allow the program to concurrently building and querying the k-d tree. This repository also provides python script files that can generate training and query files for you to test the program on. This project was inspired by university course CS 447: High Performance Computing.

Getting Started

These instructions will give you a copy of the neural network up and running on your local machine for development and testing purposes.

Prerequisites

To run this application locally on your computer, you'll need Git, g++, and python installed on your computer.

Installing

Then run the following command in the command line and go to the desired directory to store this project:

Clone this repository:

git clone https://github.com/JonathanCen/Parallized-k-Nearest-Neighbors-Search.git

Go into the cloned repository:

cd Parallized-k-Nearest-Neighbors-Search

Create training and query data:

make data

Compile cpp program:

make all

Run executable with the generated training and query file, and file to write the results:

./parallized-k-nn [n_cores] [training_file] [query_file] [result_file_name]

More on generating query and training data

To generate a training file using generate_training_file.py:

python generate_training_file.py trainingCount cols dist

trainingCount: the number of points to generate
cols: the number of points to generate the number of dimensions for each point
dist: the type of probability distribution to generate the points*

Similarly, to generate a query file using generate_query_file.py:

python generate_training_file.py queryCount cols dist k

trainingCount: the number of queries to generate
cols: the number of dimensions for each point (should be the same as the training file)
dist: the type of probability distribution to generate the points*
k: the number of neighbors to return for each query

*dist param is the numerical value corresponding to the following distributions

Uniform Distribution
Centered Uniform Distribution
Beta Distribution
Exponential Distribution

Contributing

All issues and feature requests are welcome. Feel free to check the issues page if you want to contribute.

Authors

Jonathan Cen - LinkedIn, Github

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
LICENSE		LICENSE
README.md		README.md
generate_query_file.py		generate_query_file.py
generate_training_file.py		generate_training_file.py
makefile		makefile
parallized-k-nn.cpp		parallized-k-nn.cpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Parallelize k-Nearest Neighbors Search 🔍

Getting Started

Prerequisites

Installing

More on generating query and training data

Contributing

Authors

License

About

Releases

Packages

Languages

License

JonathanCen/Parallelize-k-Nearest-Neighbors-Search

Folders and files

Latest commit

History

Repository files navigation

Parallelize k-Nearest Neighbors Search 🔍

Getting Started

Prerequisites

Installing

More on generating query and training data

Contributing

Authors

License

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages