[ENH] Venn diagram on sparse data #2334
Merged
Issue
Fixes #2164.
Description of changes
Venn diagram is fixed to work on sparse data.
The methods reshape_wide() and varying_between() have been rewritten to work on sparse data by iterating over non-zero values rather than all values.
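To illustrate the general idea of visiting only stored entries, here is a minimal sketch. It is not the code from this PR: the function name varying_columns and its arguments are hypothetical, and it assumes the matrix is given in CSR form as three arrays (data, indices, indptr), which is the layout scipy.sparse.csr_matrix exposes. It reports the columns whose values differ between rows, treating every entry that is not stored as zero.

```python
def varying_columns(data, indices, indptr, n_rows, n_cols):
    """Return indices of columns whose values are not the same in every row.

    Hypothetical helper for illustration only. The matrix is described by
    CSR arrays: data[k] is the value of the k-th stored entry, indices[k]
    is its column, and indptr[r]:indptr[r + 1] is the slice of stored
    entries that belong to row r.
    """
    seen = [set() for _ in range(n_cols)]  # distinct stored values per column
    stored = [0] * n_cols                  # number of stored entries per column

    # Visit stored entries only; the cost depends on their count,
    # not on n_rows * n_cols.
    for row in range(n_rows):
        for k in range(indptr[row], indptr[row + 1]):
            col = indices[k]
            seen[col].add(data[k])
            stored[col] += 1

    varying = []
    for col in range(n_cols):
        values = set(seen[col])
        if stored[col] < n_rows:
            values.add(0)  # at least one row holds an implicit zero
        if len(values) > 1:
            varying.append(col)
    return varying


# The 3 x 4 matrix
#   [[1, 0, 2, 0],
#    [1, 0, 0, 0],
#    [1, 0, 2, 0]]
# in CSR form:
data = [1, 2, 1, 1, 2]
indices = [0, 2, 0, 0, 2]
indptr = [0, 2, 3, 5]

result = varying_columns(data, indices, indptr, n_rows=3, n_cols=4)
print(result)  # [2]
```

Only column 2 is reported: column 0 holds the same stored value in every row, and columns 1 and 3 are entirely zero. The final pass still touches every column once, but never every cell.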
Remaining issues:
Table.transform(domain) in the reshape_wide() method, which iterates over all columns.
Current timings:
~10 sec on 200 x 10000 dataset.
~2 min on 1200 x 24000 dataset.
Too long on 61000 x 24000 dataset (friends-transcripts; takes 1.2 GB of memory).
Includes