Skip to content
/ bico Public
forked from gallmerci/bico

BICO is a fast streaming algorithm and reduction technique for the k-means problem. It is a python implementation of the original code from http://ls2-www.cs.uni-dortmund.de/bico/ .

License

Notifications You must be signed in to change notification settings

meantrix/bico

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BICO (BIrch using COresets)

This is a python implementation of the original BICO code.

BICO is a fast streaming algorithm for the k-means problem. BICO reduces the number of input points to a specified amount such that the k-means solution on the compressed input set is still a high quality solution to the original data set. BICO combines a tree-like data structure of SIGMOND Test of Time Award winning algorithm BIRCH with reduction techniques from clustering theory.


Applications

Massive Data Sets

Running k-means on a very large data set is even nowadays infeasible. BICO can reduce the amount of input points drastically such that running k-means on the resulting reduced data set is just a matter of seconds. BICO ensures that the solution is of similar quality as the solution on the original input set.

Compression

BICO reduces a set of input points to a specified number of points while preserving the k-means solution quality as good as possible. Thus, BICO is a perfect tool to shrink the size of the data in order to reduce space or bandwidth.

Getting Started

Have a look at this example for an introduction on how to use this library and some visualizations presenting the results of the computed data reduction.

Slides showing a more technical overview of BICO can be found here. If you are interested in the theoretical point of view of BICO, please feel free to check Section 5.4 of this thesis which contains a very detailed description of the algorithm including all proofs of theoretical guarantees.

About

BICO is a fast streaming algorithm and reduction technique for the k-means problem. It is a python implementation of the original code from http://ls2-www.cs.uni-dortmund.de/bico/ .

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%