-
-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENH] Unified clustering API #3814
Conversation
@janezd and @markotoplak are you ok with the proposed scheme? I will push the changes for tests and widgets after we agree with the clustering scheme. |
Conceptually, some of this doesn't make sense. This is what I would expect to happen when using clustering classes: I call DBScan with some data and I get a model. I would assume the cluster labels are available at this point (via some In k-means, this makes complete sense, but dbscan and Louvain don't have a transform method, nor is it straightforward to add one. I propose that the cluster assignments of the training data be available on the model as a field (so I can get cluster assignments by just calling fit) and if the method supports |
@pavlin-policar I agree with your solution. I also do not like that we call clustering twice for the same results as well that we pretend that there is a transform function. @janezd what you think about Pavlins idea. |
Tests are failing since the widget for Lovain clustering is not fixed to new clusterings yet. I would like to hear the opinion about @pavlin-policar idea anyway. |
Main points of some additional discussion:
|
febfdb9
to
055d273
Compare
@lanzagar can you check whether the clustering implementation is what we want. I meanwhile I will fix lint and also add some more tests for clustering. |
Codecov Report
@@ Coverage Diff @@
## master #3814 +/- ##
==========================================
+ Coverage 84.29% 84.31% +0.01%
==========================================
Files 384 385 +1
Lines 72749 72758 +9
==========================================
+ Hits 61327 61343 +16
+ Misses 11422 11415 -7 |
c7e9c4c
to
e9bcf1e
Compare
e73e590
to
86295cb
Compare
@lanzagar I think it is done now. Can you check if there is anything to correct? |
Issue
Fixes #3805
Description of changes
Clustering algorithms are simplified and unified. Each clustering algorithm (kMeans, DBSCAN, Lovain) inherit from a common
Clustering
class. Clustering itself is stateless (it does not remember the data or parameters (such as centroids)). After it is called it returns clusters or whenfit_storage
is called it creates a new instance of typeClusteringModel
which holds the model parameters (in case of k-means they are clusters) - model can be created just for k-Means. Other algorithms do not have parameters to store or do not enable predicting.The interface to call the clustering is now different. Here is the example for kMeans:
or in the case when we need model:
I also removed the silhouette computing from k-means class since I think it is not the part of the clustering but its own unit. From now on it will be called directly from the widget.
Includes