Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ENH] t-SNE: Add Normalize data checkbox #3570

Merged
merged 2 commits into from
Feb 4, 2019

Conversation

pavlin-policar
Copy link
Collaborator

@pavlin-policar pavlin-policar commented Feb 1, 2019

Issue

Fixes #3448

Description of changes
  • The widget was still using SVD on sparse data. I have changed this to PCA, since our PCA does support sparse data since [ENH] Implement better randomized PCA #3532

  • I basically copied over the functionality from the PCA widget. PCA currently doesn't support normalization on sparse data, so this option is disabled on sparse data, just like in the PCA widget.

  • Lastly, I increased the error margin on one test. The test does this: it embeds a data set using t-SNE then embeds the same data onto the existing embedding (transform). Unfortunately, the points are bound to be jittered around each original corresponding point, but also how far away is also determined by the neighborhoods, so there's no clean way to check for this. But if we visualize what this actually produces, we can see that the result is still correct. And given that the space spans from -20 to 20, increasing the error margin from 1 to 3 shouldn't really impact results too much.
    image

Includes
  • Code changes
  • Tests
  • Documentation

def pca_preprocessing(self):
if self.pca_data is not None and \
self.pca_data.X.shape[1] == self.pca_components:
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this check because I changed line 146 to invalidate the PCA projection, therefore it will be set to None whenever the number of components changes.

@codecov
Copy link

codecov bot commented Feb 2, 2019

Codecov Report

Merging #3570 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #3570      +/-   ##
==========================================
+ Coverage   83.97%   83.98%   +<.01%     
==========================================
  Files         370      370              
  Lines       66941    66973      +32     
==========================================
+ Hits        56215    56244      +29     
- Misses      10726    10729       +3

@pavlin-policar pavlin-policar force-pushed the tsne-pca-normalize branch 4 times, most recently from 83e30bb to 77a938a Compare February 2, 2019 11:42
@lanzagar
Copy link
Contributor

lanzagar commented Feb 4, 2019

I would prefer to add normalization support in a separate PR, adding it everywhere where it's missing, and leave this as is.

Fine with me.

@lanzagar lanzagar merged commit 920e474 into biolab:master Feb 4, 2019
@pavlin-policar pavlin-policar deleted the tsne-pca-normalize branch February 4, 2019 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants