ramji-c/clustering-gpu

clustering-gpu

Document clustering on GPUs with TensorFlow

Pre-requisites

A CUDA-enabled GPU, CUDA Toolkit v8.0, cuDNN v5.1, and the tensorflow-gpu Python package.

Usage Notes

The input to this program is a NumPy ndarray saved as a .npy file. Text documents must be vectorized externally, using tf-idf or a similar vectorizer, before being passed to this script. The input must be a dense matrix.
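As a rough illustration of preparing such an input, the sketch below vectorizes a toy corpus with scikit-learn's TfidfVectorizer, converts the sparse result to a dense array, and saves it as a .npy file. The corpus, the file name docs_tfidf.npy, and the float32 dtype are assumptions made for the example, not requirements stated by this project.

```python
import os
import tempfile

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for real documents (illustrative data only).
docs = [
    "gpu clustering with tensorflow",
    "mini batch k-means on documents",
    "tf-idf turns text into vectors",
]

# TfidfVectorizer returns a SciPy sparse matrix.
sparse_matrix = TfidfVectorizer().fit_transform(docs)

# The script expects a dense matrix, so convert before saving.
# float32 is an assumption chosen here to keep memory use down.
dense_matrix = sparse_matrix.toarray().astype(np.float32)

# Hypothetical output path; any .npy path would do.
out_path = os.path.join(tempfile.gettempdir(), "docs_tfidf.npy")
np.save(out_path, dense_matrix)

# Sanity check: the saved file loads back as a 2-D ndarray.
loaded = np.load(out_path)
print(loaded.shape, loaded.dtype)
```

For large corpora the dense conversion can use a lot of memory, so it may help to cap the vocabulary size in the vectorizer before calling toarray().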

The output is a trained model of 'k' centroids that is compatible with scikit-learn, and can therefore be used as the 'init' parameter of MiniBatchKMeans: http://scikit-learn.org/stable/modules/generated/sklearn.cluster.MiniBatchKMeans.html
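A minimal sketch of that hand-off is shown below. It assumes the trained centroids are available as a NumPy array of shape (k, n_features). How they are stored inside the model directory is not described here, so the example uses randomly chosen rows as stand-in centroids and synthetic data as a stand-in document matrix.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)

# Stand-in for the dense document matrix (100 documents, 8 features).
X = rng.random((100, 8)).astype(np.float32)

# Stand-in for the trained centroids; in practice these would come from
# the model produced by training (loading step not shown, layout unknown).
k = 4
centroids = X[rng.choice(len(X), size=k, replace=False)]

# Pass the centroids as `init`; n_init=1 because the starting point is fixed.
model = MiniBatchKMeans(
    n_clusters=k, init=centroids, n_init=1, batch_size=32, random_state=0
)
labels = model.fit_predict(X)
print(model.cluster_centers_.shape)
```

scikit-learn requires the init array to have exactly n_clusters rows and the same number of columns as the data.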

Training

To perform Mini-Batch K-means clustering on the input, run the script as follows:

python MiniBatchKmeans.py <<input .npy file>> --model-dir <> -k <> --n-iter <<# iterations>> --batch-size <>
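For readers unfamiliar with the algorithm, the following NumPy sketch shows a standard mini-batch k-means update with a per-centroid learning rate. It is not this repository's TensorFlow implementation. The function name is hypothetical, and its parameters simply mirror the -k, --n-iter, and --batch-size flags above.

```python
import numpy as np


def mini_batch_kmeans(X, k, n_iter, batch_size, seed=0):
    """Illustrative mini-batch k-means; returns an array of shape (k, n_features)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct random rows of X.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(np.float64)
    counts = np.zeros(k)  # points assigned to each centroid so far
    for _ in range(n_iter):
        # Draw a random mini-batch without replacement.
        idx = rng.choice(len(X), size=min(batch_size, len(X)), replace=False)
        batch = X[idx]
        # Squared Euclidean distance from each batch point to each centroid.
        dists = ((batch[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        nearest = dists.argmin(axis=1)
        # Move each assigned centroid toward its point with a decaying step.
        for x, c in zip(batch, nearest):
            counts[c] += 1
            lr = 1.0 / counts[c]
            centroids[c] = (1.0 - lr) * centroids[c] + lr * x
    return centroids


# Usage on synthetic data (illustrative only).
X_demo = np.random.default_rng(1).random((200, 5))
demo_centroids = mini_batch_kmeans(X_demo, k=3, n_iter=50, batch_size=32)
print(demo_centroids.shape)
```

Because each centroid's step size shrinks as it accumulates points, later batches refine the centroids more gently than early ones.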
