GitHub - meantrix/bico: BICO is a fast streaming algorithm and reduction technique for the k-means problem. It is a python implementation of the original code from http://ls2-www.cs.uni-dortmund.de/bico/ .

BICO (BIrch using COresets)

This is a python implementation of the original BICO code.

BICO is a fast streaming algorithm for the k-means problem. BICO reduces the number of input points to a specified amount such that the k-means solution on the compressed input set is still a high quality solution to the original data set. BICO combines a tree-like data structure of SIGMOND Test of Time Award winning algorithm BIRCH with reduction techniques from clustering theory.

Applications

Massive Data Sets

Running k-means on a very large data set is even nowadays infeasible. BICO can reduce the amount of input points drastically such that running k-means on the resulting reduced data set is just a matter of seconds. BICO ensures that the solution is of similar quality as the solution on the original input set.

Compression

BICO reduces a set of input points to a specified number of points while preserving the k-means solution quality as good as possible. Thus, BICO is a perfect tool to shrink the size of the data in order to reduce space or bandwidth.

Getting Started

Have a look at this example for an introduction on how to use this library and some visualizations presenting the results of the computed data reduction.

Slides showing a more technical overview of BICO can be found here. If you are interested in the theoretical point of view of BICO, please feel free to check Section 5.4 of this thesis which contains a very detailed description of the algorithm including all proofs of theoretical guarantees.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
bico		bico
docs		docs
examples		examples
.gitignore		.gitignore
AUTHORS		AUTHORS
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.rst		README.rst
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

BICO (BIrch using COresets)

Applications

Massive Data Sets

Compression

Getting Started

About

Releases

Packages

Languages

License

meantrix/bico

Folders and files

Latest commit

History

Repository files navigation

BICO (BIrch using COresets)

Applications

Massive Data Sets

Compression

Getting Started

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages