K-Means & Mean Shift clustering

This repository contains clustering algorithms implemented in TensorFlow. Compared to the scikit-learn implementations, they cluster high-dimensional data more efficiently; however, large data sets may cause out-of-memory errors on GPUs with little VRAM.

Requirements

TensorFlow
NumPy
Seaborn (Optional)

Usage

The algorithms can be run on a data set from the command line:

`python3 main.py --data your_data.npy --method mean_shift --bandwidth 0.1`
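The command above reads its input from a `.npy` file. As a minimal sketch of how such a file might be prepared with NumPy, the snippet below generates a small synthetic data set and saves it. The array layout `(n_data_points, data_dimension)`, the `float32` dtype, and the helper names `make_example_data` and `save_for_cli` are assumptions made for illustration; they are not taken from this repository.

```python
import numpy as np


def make_example_data(n_points, dim, seed=0):
    """Return synthetic data of shape (n_points, dim).

    Assumption: one row per data point and 32-bit floats.
    The repository may expect a different layout or dtype.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_points, dim)).astype(np.float32)


def save_for_cli(data, path):
    """Write the array in NumPy's binary .npy format."""
    np.save(path, data)


if __name__ == "__main__":
    data = make_example_data(n_points=1000, dim=16)
    # File name chosen to match the --data argument in the command above.
    save_for_cli(data, "your_data.npy")
```

The saved file can then be passed to `main.py` through the `--data` argument.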

The clustering can be invoked from code as well, which is demonstrated in example.ipynb.

Features

The current version of the code uses GPU acceleration for moderately large data sets. For Mean Shift clustering, the provided data set is split into fragments to reduce the amount of GPU memory in use at any one time. Note that this method is still being tested. K-Means and the mini-batch version of K-Means cannot yet handle data sets that require more RAM than is available. To estimate the memory usage of your data set during clustering, plug your parameters into the following formula:

`n_data_points * data_dimension * n_clusters * 4 bytes`
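The formula can be evaluated with a few lines of Python before starting a run. This is only a sketch of the arithmetic: the function names are hypothetical, and the factor of 4 bytes is presumably the size of one 32-bit float, which is an assumption rather than something stated here.

```python
# Assumption: the "4 bytes" factor is the size of one 32-bit float.
BYTES_PER_VALUE = 4


def estimate_memory_bytes(n_data_points, data_dimension, n_clusters,
                          bytes_per_value=BYTES_PER_VALUE):
    """Evaluate n_data_points * data_dimension * n_clusters * bytes_per_value."""
    return n_data_points * data_dimension * n_clusters * bytes_per_value


def to_gib(n_bytes):
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


if __name__ == "__main__":
    needed = estimate_memory_bytes(n_data_points=100_000,
                                   data_dimension=128,
                                   n_clusters=50)
    print(f"{needed} bytes (~{to_gib(needed):.2f} GiB)")
```

For example, 100,000 points of dimension 128 with 50 clusters come to 2,560,000,000 bytes, roughly 2.4 GiB, which can be compared against the memory available on the target device.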

