Test k-mer frequency distribution idea #21

Open
4 of 6 tasks
dkoslicki opened this issue Feb 6, 2020 · 2 comments

dkoslicki commented Feb 6, 2020

The goal is to create a fast way to reconstruct k-mer count distributions (similar to what is attempted in publications like this and references therein).

The current code base already keeps track of sketch k-mer counts (see here and here).

The project would be to:

  • compare the histogram of sketch k-mer counts with the actual k-mer count histogram
  • implement a metric to quantify the difference between the two distributions (probably the total variation distance, the Wasserstein metric, or a simple L1 metric)
  • compare how these metrics change as the sketch size is increased
  • compare to existing methods that re-create k-mer count distributions (e.g. this one and the methods it compares to)
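The first three steps above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function names are hypothetical, and the sketch is assumed to be a simple bottom-s selection (keep the s distinct k-mers with the smallest hash, along with their counts), which may differ from how the code base builds its sketches.

```python
import hashlib
from collections import Counter


def kmer_counts(seq, k):
    """Count every k-mer (overlapping window of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def bottom_s_sketch(counts, s):
    """Assumed sketch: keep the s distinct k-mers with the smallest MD5 hash."""
    def kmer_hash(kmer):
        return hashlib.md5(kmer.encode()).hexdigest()
    kept = sorted(counts, key=kmer_hash)[:s]
    return {kmer: counts[kmer] for kmer in kept}


def count_histogram(counts):
    """Map each multiplicity c to the fraction of distinct k-mers seen c times."""
    hist = Counter(counts.values())
    total = sum(hist.values())
    return {c: n / total for c, n in hist.items()}


def l1_distance(p, q):
    """Sum of absolute differences over the union of both supports."""
    return sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))


def total_variation(p, q):
    """Total variation distance, which is half the L1 distance."""
    return 0.5 * l1_distance(p, q)


def wasserstein_1(p, q):
    """1-Wasserstein distance on the integer line: area between the two CDFs."""
    support = sorted(set(p) | set(q))
    dist, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for left, right in zip(support, support[1:]):
        cdf_p += p.get(left, 0.0)
        cdf_q += q.get(left, 0.0)
        dist += abs(cdf_p - cdf_q) * (right - left)
    return dist


# Toy input (made up for illustration): compare the sketch histogram
# with the true histogram as the sketch size s grows.
seq = "ACGTACGTTTGACCA" * 20 + "GATTACA" * 5
true_counts = kmer_counts(seq, 4)
true_hist = count_histogram(true_counts)
for s in (5, 10, len(true_counts)):
    sketch_hist = count_histogram(bottom_s_sketch(true_counts, s))
    print(s, total_variation(true_hist, sketch_hist),
          wasserstein_1(true_hist, sketch_hist))
```

When the sketch holds every distinct k-mer, all three distances are zero, which gives a quick sanity check before running on real data.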

Optional:

This would be sufficient for a conference paper.

For a journal paper, we would need to:

  • characterize/prove the convergence of the estimated distribution to the true distribution as a function of sketch size (not too difficult, but would take a bit of probability work)
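
Before attempting a proof, the convergence could be explored empirically. The simulation below is only a sketch under a strong assumption: the sketch is modeled as a uniform random sample, without replacement, of the distinct k-mers (an idealization of what a random hash would select). The population of multiplicities is synthetic and the function names are hypothetical.

```python
import random
from collections import Counter


def histogram(values):
    """Fraction of items carrying each count value."""
    tally = Counter(values)
    return {c: n / len(values) for c, n in tally.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))


def mean_tv_by_sketch_size(population, sizes, trials=50, seed=0):
    """Average TV distance between the true and sampled histograms, per size."""
    rng = random.Random(seed)
    truth = histogram(population)
    result = {}
    for s in sizes:
        tvs = [
            total_variation(truth, histogram(rng.sample(population, s)))
            for _ in range(trials)
        ]
        result[s] = sum(tvs) / trials
    return result


# Synthetic multiplicities: one entry per distinct k-mer (made-up numbers).
population = [1] * 1000 + [2] * 300 + [3] * 33
curve = mean_tv_by_sketch_size(population, sizes=(50, 200, 800, len(population)))
for s, tv in curve.items():
    print(s, round(tv, 4))
```

Plotting the mean distance against the sketch size on real data would suggest what rate a proof should aim for; at full sketch size the distance is exactly zero by construction.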

dkoslicki commented

@x-zang This is one of the things we’ll be discussing during our meeting tomorrow and would be a good first project during your rotation.

x-zang commented Feb 7, 2020

Sure. This looks interesting
