[ENH] Add parameters and similarity measures to tSNE #2510
Conversation
Force-pushed from fbecd39 to 9b2cedb
Force-pushed from 9b2cedb to 12f8c72
@lanzagar I've added the missing parameters to tSNE. This can probably be merged easily. I decided to leave out the Orange metrics because that would require many more changes, e.g. setting precomputed parameters and computing PCA manually in the widget. All of that is doable, but it should go into a separate PR.
Codecov Report
@@ Coverage Diff @@
## master #2510 +/- ##
=========================================
+ Coverage 82.1% 82.1% +<.01%
=========================================
Files 328 328
Lines 56210 56220 +10
=========================================
+ Hits 46151 46161 +10
Misses 10059 10059
n_iter = Setting(1000)

init_index = Setting(0)
init_values = [("random", "Random"), ("pca", "PCA")]
Can we leave the default init as pca?
I would also switch their positions, so that pca is the first (default) option, followed by random.
This would also make it consistent with mds (pca = first & default).
I know this goes against the sklearn defaults, but we already had pca as the only option before, and I think it makes more sense for users (random can show a completely different figure each time). The consistency with mds is an added bonus.
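A minimal sketch (not the widget code) of why random initialization shows a different figure on each run: an unseeded layout changes between calls, while fixing the seed makes the starting positions, and hence the resulting embedding, repeatable.

```python
import random

def random_layout(n_points, seed=None):
    """Generate n_points random 2-D starting coordinates in [-1, 1]^2.

    With seed=None each call draws a fresh layout; passing a fixed seed
    makes the starting layout (and so the figure) reproducible.
    """
    rng = random.Random(seed)
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_points)]
```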
As far as I can tell, both the MDS widget and manifold MDS control pane have the option to initialize randomly. I have, however, made PCA the default value, since that is what MDS does in both cases. Is there a third MDS widget that I missed?
This is fine now. That is what I meant: just change the order of pca and random, so that pca is the default (but we still keep both options, like in mds).
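A sketch of the reordering being suggested, assuming an init_index of 0 selects the first entry (the names mirror the snippet above; the Setting wrapper is omitted for brevity):

```python
# Hypothetical reordering: "pca" listed first so the default index 0
# selects PCA initialization; "random" stays available as the second option.
init_index = 0  # default: first entry
init_values = [("pca", "PCA"), ("random", "Random")]

default_init = init_values[init_index][0]
```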
self.lr_spin = self._create_spin_parameter(
    "learning_rate", 1, 1000, "Learning rate:")
self.n_iter_spin = self._create_spin_parameter(
    "n_iter", 0, 1e5, "Max iterations:")
The max-iterations spin allows values as low as 0, but running with <250 results in an error. Change the minimum value of the spin control to 250.
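One way to enforce the suggested lower bound is to clamp the requested value before use; a sketch under the ranges discussed here (the helper name is hypothetical, the 250 minimum is the limit mentioned in the comment above):

```python
MIN_ITER = 250       # values below this make the t-SNE run error out
MAX_ITER = 100_000   # upper spin limit (1e5), as in the snippet above

def clamp_n_iter(value):
    """Clamp a requested iteration count into the spin control's range."""
    return max(MIN_ITER, min(MAX_ITER, int(value)))
```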
self.metric_combo = self._create_combo_parameter(
    "metric", "Metric:")
self.parameters["init"] = "pca"
self.perplexity_spin = self._create_spin_parameter(
    "perplexity", 0, 1000, "Perplexity:")
I am not sure about this one, but the sklearn docs say to consider values 5-50. I think spin limits of 1-100 should be fine.
self.perplexity_spin = self._create_spin_parameter(
    "perplexity", 0, 1000, "Perplexity:")
self.early_exaggeration_spin = self._create_spin_parameter(
    "early_exaggeration", 1, 1000, "Early exaggeration:")
Not sure about this one either, but 1000 sounds really high. Probably 1-100 should be fine?
perplexity = Setting(30)
early_exaggeration = Setting(4)
learning_rate = Setting(1000)
n_iter = Setting(1000)
I suggest we use the same defaults as the latest sklearn version (they have been changing them recently): perplexity=30, e.e.=12, lr=200, iter=1000.
I didn't realize they had changed them. Fixed now.
Force-pushed from 12f8c72 to 554d55d
Issue
The Manifold Learning widget did provide tSNE, yet lacked all the important parameters with which to tune the embedding. Most notably, I find learning_rate to be the most important parameter; it has a huge effect on the result.
Description of changes
Add relevant parameters (perplexity, early exaggeration, learning rate, max iterations and initialization).
Includes
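Pulling the review threads together, a minimal sketch of validating the newly exposed parameters before handing them to the projector. The ranges follow the discussion above (perplexity and early exaggeration 1-100, learning rate 1-1000, iterations at least 250); the function and constant names are hypothetical, not part of the widget's actual API.

```python
# Suggested control ranges from the review comments (hypothetical names).
RANGES = {
    "perplexity": (1, 100),
    "early_exaggeration": (1, 100),
    "learning_rate": (1, 1000),
    "n_iter": (250, 100_000),
}

def validate(params):
    """Return params unchanged, or raise if any value is outside its range."""
    for name, (lo, hi) in RANGES.items():
        value = params[name]
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} outside [{lo}, {hi}]")
    return params
```

With the defaults agreed on in the review (perplexity=30, e.e.=12, lr=200, iter=1000), validation passes; an n_iter below 250 is rejected, matching the spin-control fix requested above.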