clustering-gpu

Document Clustering using GPUs with TensorFlow

Pre-requisites

A CUDA-enabled GPU, CUDA Toolkit v8.0, cuDNN v5.1, and the tensorflow-gpu Python package.

Usage Notes

The input to this program is a NumPy ndarray saved as a .npy file. Text documents must be vectorized externally, using tf-idf or a similar vectorizer, before being passed to this script. The input must be a dense matrix.
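As a minimal sketch of preparing such an input, the snippet below vectorizes a toy corpus with scikit-learn's TfidfVectorizer, converts the sparse result to a dense array, and saves it as a .npy file. The corpus, the output file name `docs_tfidf.npy`, and the float32 dtype are assumptions made for illustration; the script's actual dtype requirements are not stated here.

```python
import os
import tempfile

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus for illustration; replace with your own documents.
docs = [
    "gpu clustering with tensorflow",
    "mini batch k-means on text documents",
    "tf-idf turns documents into vectors",
]

# TfidfVectorizer returns a SciPy sparse matrix, so convert it to a
# dense array. float32 is an assumption, not a documented requirement.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs).toarray().astype(np.float32)

# Hypothetical output path; any location works.
out_path = os.path.join(tempfile.gettempdir(), "docs_tfidf.npy")
np.save(out_path, X)

# Reload to confirm the file holds a dense (n_docs, n_terms) array.
loaded = np.load(out_path)
print(loaded.shape)
```

Densifying a large corpus can take a lot of memory, so it may help to cap the vocabulary (for example with the vectorizer's `max_features` argument) before calling `toarray()`.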

The output is a trained model of 'k' centroids that is compatible with scikit-learn, and hence can be used as the 'init' param of MiniBatchKMeans: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html
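A rough sketch of that hand-off is shown below. How the centroids are stored in the model directory is not described here, so the example assumes they have already been loaded into a NumPy array of shape (k, n_features); the `centroids` and `X` arrays are random stand-ins, not real output of the script.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# Stand-in for the dense tf-idf matrix used during training.
X = rng.random((200, 8)).astype(np.float32)

# Stand-in for the k centroids produced by the GPU training run
# (assumed shape: (k, n_features)).
k = 4
centroids = X[rng.choice(len(X), size=k, replace=False)]

# Passing an array as `init` makes scikit-learn start from these
# centroids; n_init=1 because there is only one initialization.
km = MiniBatchKMeans(n_clusters=k, init=centroids, n_init=1,
                     batch_size=50, random_state=0)
labels = km.fit_predict(X)
print(km.cluster_centers_.shape)
```

The number of rows in the centroid array must match `n_clusters`, and the number of columns must match the feature dimension of the data being clustered.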

Training

To perform Mini-Batch K-means clustering on the input, run the script as follows:

python MiniBatchKmeans.py <<input .npy file>> --model-dir <> -k <> --n-iter <<# iterations>> --batch-size <>
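For readers unfamiliar with the algorithm, here is a plain NumPy sketch of a mini-batch k-means update loop with per-centroid learning rates. It is not the script's TensorFlow implementation, which may differ in its details. The function name `mini_batch_kmeans` is hypothetical, and its `k`, `n_iter`, and `batch_size` arguments only mirror the command-line flags above.

```python
import numpy as np

def mini_batch_kmeans(X, k, n_iter=100, batch_size=32, seed=0):
    """Illustrative mini-batch k-means; returns a (k, n_features) array."""
    rng = np.random.default_rng(seed)
    batch_size = min(batch_size, len(X))
    # Initialize centroids from k distinct random rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(np.float64)
    counts = np.zeros(k)  # number of points assigned to each centroid so far
    for _ in range(n_iter):
        batch = X[rng.choice(len(X), size=batch_size, replace=False)]
        # Squared Euclidean distances, shape (batch_size, k).
        dists = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        # Move each assigned centroid toward its point with step 1/count.
        for x, c in zip(batch, nearest):
            counts[c] += 1
            lr = 1.0 / counts[c]
            centroids[c] = (1.0 - lr) * centroids[c] + lr * x
    return centroids

# Usage on two synthetic, well-separated blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
centroids = mini_batch_kmeans(X, k=2, n_iter=50, batch_size=20)
print(centroids.shape)
```

Because each centroid's step size shrinks as it accumulates points, later batches nudge the centroids less and the estimates settle over time.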

ramji-c/clustering-gpu
