[ENH] t-SNE: Add Normalize data checkbox #3570
Merged
Issue
Fixes #3448
Description of changes
The widget was still using SVD on sparse data. I have changed this to PCA, because our PCA has supported sparse data since [ENH] Implement better randomized PCA #3532.
I basically copied over the functionality from the PCA widget. PCA currently doesn't support normalization on sparse data, so this option is disabled on sparse data, just like in the PCA widget.
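As a rough illustration of why normalization and sparse data tend not to mix: standardization usually includes a centering step, and centering turns most zero entries into non-zero ones. The sketch below is not the widget's code. The helper names and the toy matrix are made up for this example, and it assumes that "normalize" involves subtracting column means.

```python
def center_columns(rows):
    """Subtract each column's mean from its entries (the centering step of standardization)."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    means = [sum(row[j] for row in rows) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in rows]


def count_nonzero(rows):
    """Count entries that would have to be stored explicitly in a sparse format."""
    return sum(1 for row in rows for value in row if value != 0)


# Hypothetical toy matrix that is mostly zeros, standing in for sparse input.
sparse_like = [
    [0.0, 0.0, 4.0],
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]

centered = center_columns(sparse_like)

# Columns with any non-zero entry get a non-zero mean, so after centering
# every entry in those columns becomes non-zero and the sparsity is lost.
print(count_nonzero(sparse_like), count_nonzero(centered))
```

Only the all-zero column stays zero after centering; the other two columns become fully dense.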
Lastly, I increased the error margin on one test. The test embeds a data set using t-SNE, then embeds the same data onto the existing embedding (transform). Unfortunately, each transformed point is bound to be jittered around its original counterpart, and how far it lands also depends on the neighborhoods, so there's no clean way to check for this. If we visualize what this actually produces, we can see that the result is still correct. Given that the space spans from -20 to 20, increasing the error margin from 1 to 3 shouldn't impact the results much.
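To make the margin concrete, here is a minimal sketch of the kind of check described above. It assumes the test compares each original point with its transformed counterpart by Euclidean distance. The function names and coordinates are hypothetical and are not taken from the test suite.

```python
import math


def max_displacement(embedding, transformed):
    """Largest Euclidean distance between a point and its transformed counterpart."""
    return max(math.dist(p, q) for p, q in zip(embedding, transformed))


def within_margin(embedding, transformed, margin):
    """True if no transformed point strays further than `margin` from its original."""
    return max_displacement(embedding, transformed) <= margin


# Hypothetical 2-D coordinates in a space spanning roughly -20 to 20.
embedding = [(-20.0, 0.0), (0.0, 0.0), (20.0, 5.0)]
transformed = [(-18.5, 0.5), (0.8, -1.2), (19.0, 6.5)]

print(within_margin(embedding, transformed, margin=1))  # jitter exceeds the old margin
print(within_margin(embedding, transformed, margin=3))  # but fits the new one
```

For scale, a margin of 3 in a space about 40 units wide is 7.5% of the span.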
Includes