
[ENH] Add parameters and similarity measures to tSNE #2510

Merged
merged 2 commits into biolab:master on Jan 19, 2018

Conversation

@pavlin-policar (Collaborator) commented Jul 29, 2017

Issue

The Manifold Learning widget provided tSNE, but lacked all the important parameters with which to tune the embedding. Most notably, I find learning_rate to be the most important parameter; it has a huge effect on the result.

Description of changes

Add relevant parameters (perplexity, early exaggeration, learning rate, max iterations and initialization).

Includes
  • Code changes
  • Tests
  • Documentation
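As a sketch, the exposed knobs roughly correspond to a parameter set like the following (the field names mirror sklearn's TSNE arguments; the TSNEParams container and its validation are illustrative, not the widget's actual code, and the defaults shown are the ones in the PR as first submitted):

```python
from dataclasses import dataclass


@dataclass
class TSNEParams:
    """Illustrative container for the t-SNE parameters exposed by the widget."""
    perplexity: float = 30         # balance between local and global structure
    early_exaggeration: float = 4  # how tightly clusters form early in optimization
    learning_rate: float = 1000    # gradient-descent step size
    n_iter: int = 1000             # maximum number of optimization iterations
    init: str = "pca"              # "pca" or "random" initial embedding

    def validate(self) -> "TSNEParams":
        """Raise ValueError on out-of-range values, else return self."""
        if self.init not in ("pca", "random"):
            raise ValueError("init must be 'pca' or 'random'")
        if self.perplexity <= 0:
            raise ValueError("perplexity must be positive")
        return self
```

In the widget itself these values live in Orange `Setting` descriptors and are forwarded to the underlying projector.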

@pavlin-policar (Collaborator Author)

@lanzagar I've added the missing parameters to tSNE. This can probably be merged easily.

I decided to leave out the Orange metrics because that would require many more changes, e.g. setting the precomputed parameter and computing PCA manually in the widget. All of this is doable, but it should go into a separate PR.

@codecov-io commented Jan 12, 2018

Codecov Report

Merging #2510 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master   #2510      +/-   ##
=========================================
+ Coverage    82.1%   82.1%   +<.01%     
=========================================
  Files         328     328              
  Lines       56210   56220      +10     
=========================================
+ Hits        46151   46161      +10     
  Misses      10059   10059

@lanzagar lanzagar added this to the 3.10 milestone Jan 19, 2018
n_iter = Setting(1000)

init_index = Setting(0)
init_values = [("random", "Random"), ("pca", "PCA")]
Contributor:

Can we leave the default init as pca?
I would also switch their positions, so that pca is the first (default) option, followed by random.
This would also make it consistent with MDS (pca = first & default).

Contributor:

I know this goes against the sklearn defaults, but pca was previously the only option, and I think it makes more sense for users (random can show a completely different figure each time). The consistency with MDS is an added bonus.

Collaborator Author:

As far as I can tell, both the MDS widget and the manifold MDS control pane have the option to initialize randomly. I have, however, made PCA the default value, since that is what MDS does in both cases. Is there a third MDS widget that I missed?

Contributor:

This is fine now; that is what I meant. Just change the order of pca and random, so that pca is the default (but we still have both options, like in MDS).
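Concretely, the suggested change just swaps the (internal name, display label) tuples so that the default init_index of 0 selects PCA; a minimal sketch of the lookup (outside any widget class, for illustration):

```python
# Suggested ordering: PCA first, so the default index 0 selects it,
# mirroring the MDS widget. Pairs are (internal name, display label).
init_index = 0
init_values = [("pca", "PCA"), ("random", "Random")]

# The widget would pass the internal name to the projector
# and show the label in the combo box.
default_init, default_label = init_values[init_index]
```

Random initialization stays available as the second combo entry.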

self.lr_spin = self._create_spin_parameter(
"learning_rate", 1, 1000, "Learning rate:")
self.n_iter_spin = self._create_spin_parameter(
"n_iter", 0, 1e5, "Max iterations:")
Contributor:

The max-iterations spin allows values as low as 0, but running it with <250 results in an error. Change the minimum value of the spin control to 250.

self.metric_combo = self._create_combo_parameter(
"metric", "Metric:")
self.parameters["init"] = "pca"
self.perplexity_spin = self._create_spin_parameter(
"perplexity", 0, 1000, "Perplexity:")
Contributor:

I am not sure about this one, but the sklearn docs say to consider values 5-50. I think spin limits of 1-100 should be fine.

self.perplexity_spin = self._create_spin_parameter(
"perplexity", 0, 1000, "Perplexity:")
self.early_exaggeration_spin = self._create_spin_parameter(
"early_exaggeration", 1, 1000, "Early exaggeration:")
Contributor:

Not sure about this one either, but 1000 sounds really high. A range of 1-100 should probably be fine.
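Both review suggestions amount to narrowing the spin ranges to 1-100 (from 0-1000 for perplexity and 1-1000 for early exaggeration). A small illustrative sketch of the suggested bounds (the dict and helper are hypothetical, not widget code):

```python
# Reviewer-suggested spin ranges as (lo, hi) pairs.
SUGGESTED_RANGES = {
    "perplexity": (1, 100),          # sklearn docs recommend roughly 5-50
    "early_exaggeration": (1, 100),  # 1000 is far beyond useful values
}


def in_range(name: str, value: float) -> bool:
    """Check whether a value falls inside the suggested spin range."""
    lo, hi = SUGGESTED_RANGES[name]
    return lo <= value <= hi
```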

perplexity = Setting(30)
early_exaggeration = Setting(4)
learning_rate = Setting(1000)
n_iter = Setting(1000)
Contributor:

I suggest we use the same defaults as the latest sklearn version (they have been changing them recently):
perplexity=30, early_exaggeration=12, learning_rate=200, n_iter=1000

Collaborator Author:

I didn't realize they had changed them. Fixed now.
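After the fix, the widget defaults line up with the sklearn values the reviewer quoted; as a plain-data sketch (the DEFAULTS dict is illustrative, the widget stores these in Setting descriptors):

```python
# Widget defaults after the review, matching sklearn's TSNE defaults
# at the time (sklearn had recently changed early_exaggeration and
# learning_rate from the older 4 / 1000 values).
DEFAULTS = {
    "perplexity": 30,
    "early_exaggeration": 12,
    "learning_rate": 200,
    "n_iter": 1000,
}
```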

@lanzagar lanzagar merged commit 471b417 into biolab:master Jan 19, 2018
@pavlin-policar pavlin-policar deleted the refactor-tsne branch January 19, 2018 14:45