[Ready for Review] Expansion probabilities #150
Codecov Report
@@            Coverage Diff             @@
##           master     #150      +/-   ##
==========================================
+ Coverage   86.96%   87.06%   +0.09%
==========================================
  Files          69       70       +1
  Lines        4772     4807      +35
==========================================
+ Hits         4150     4185      +35
  Misses        622      622
This is a great new addition! Just a few minor things.
cassiopeia/tools/topology.py
Outdated
The probability corresponds to the probability that a given subclade
contains the number of cells as would be expected under a simple coalescent
model. Often, if the probability is less than some threshold (e.g., 0.05),
Can you be a bit clearer about what is being computed? E.g. "The probability corresponds to the probability that under a coalescent model, the given subclade would have had at least as many leaves as those observed in reality; in other words, a one-sided p-value."
Totally agree. I wonder if it would be even better to rename the function to compute_expansion_pvalues. Do you think "probabilities" is a bit obscure?
I definitely like compute_expansion_pvalues better. In that case, I would also suggest renaming the node attribute to expansion_pvalue instead of expansion_probability.
Sure thing - I'll make that change right now.
cassiopeia/tools/topology.py
Outdated
p = np.sum(
    [
        simple_coalescent_probability(n, b2, k)
        for b2 in range(b, n - k + 2)
    ]
)
This can be simplified to p = nCk(n - b, k - 1) / nCk(n - 1, k - 1)
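The simplification can be checked numerically. The sketch below assumes that simple_coalescent_probability(n, b, k) has the form nCk(n - b - 1, k - 2) / nCk(n - 1, k - 1); that form is not shown in this thread, so treat it as an assumption. Under that assumption, the sum over b2 collapses to the closed form via the hockey-stick identity. The helper names pvalue_by_sum and pvalue_closed_form are hypothetical, and math.comb stands in for nCk.

```python
from math import comb, isclose


def simple_coalescent_probability(n, b, k):
    # ASSUMED form (not shown in the thread): probability that one child clade
    # has exactly b leaves, given a parent with n leaves and k children.
    return comb(n - b - 1, k - 2) / comb(n - 1, k - 1)


def pvalue_by_sum(n, b, k):
    # Original approach: add up the probabilities of clade sizes b, ..., n - k + 1.
    return sum(
        simple_coalescent_probability(n, b2, k) for b2 in range(b, n - k + 2)
    )


def pvalue_closed_form(n, b, k):
    # Reviewer's simplification: a single ratio of binomial coefficients.
    return comb(n - b, k - 1) / comb(n - 1, k - 1)


def forms_agree(max_n=30):
    # Compare both forms over every valid (n, k, b) with k >= 2 children.
    for n in range(2, max_n + 1):
        for k in range(2, n + 1):
            for b in range(1, n - k + 2):
                if not isclose(pvalue_by_sum(n, b, k), pvalue_closed_form(n, b, k)):
                    return False
    return True
```

For example, with n = 10, b = 4, k = 3 both functions evaluate to 15/36. The closed form also replaces a loop of up to n terms with two binomial evaluations per node.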
Wow, that's pretty amazing. Thanks for suggesting this!
@mattjones315 also, in the KP manuscript, these should appear as combinatorial numbers instead of fractions:
Eek! Yes you're absolutely right. Thanks for catching this!
This looks good!
My only final suggestion is that we report the computational complexity in the docstring. On a typical balanced binary tree it will be O(n log n). On a highly unbalanced binary tree it will be O(n^2). On a tree with extreme multifurcations, it might get as bad as O(n^3) because of the nCk computations involving high k, but we don't really care about this because it's not realistic.
Do note that the method can be implemented in O(n) with more care, but I am happy with the method as it is implemented right now, since it is easy to read and fast enough in typical scenarios.
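For reference, one way the linear-time variant could look is sketched below. This is not the code in this PR: it uses a plain dict of child lists rather than the project's tree class, the function name expansion_pvalues is hypothetical, and it counts each binomial evaluation as a single operation. The idea is to count the leaves under every node in one post-order pass instead of recounting them per node, then apply the closed form nCk(n - b, k - 1) / nCk(n - 1, k - 1) once per child.

```python
from math import comb


def expansion_pvalues(children, root):
    """Hypothetical sketch: `children` maps each node to a list of its children."""
    # Depth-first order in which every node appears after its parent.
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(children.get(node, []))

    # Walk that order backwards so children are counted before their parent.
    n_leaves = {}
    for node in reversed(order):
        kids = children.get(node, [])
        n_leaves[node] = 1 if not kids else sum(n_leaves[c] for c in kids)

    # One p-value per non-root node: n leaves under the parent, k siblings
    # (including the node itself), b leaves under the node.
    pvalues = {}
    for parent in order:
        kids = children.get(parent, [])
        n, k = n_leaves[parent], len(kids)
        for child in kids:
            b = n_leaves[child]
            pvalues[child] = comb(n - b, k - 1) / comb(n - 1, k - 1)
    return n_leaves, pvalues


# Toy tree: the root splits into A (three leaves) and B (one leaf).
toy_tree = {"root": ["A", "B"], "A": ["a1", "a2", "a3"], "B": ["b1"]}
toy_leaves, toy_pvalues = expansion_pvalues(toy_tree, "root")
```

On the toy tree, A holds 3 of the root's 4 leaves under a binary split, so its p-value is nCk(1, 1) / nCk(3, 1) = 1/3, while B, with a single leaf, gets 1.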
Good suggestion, I've added the complexity to the docstring.
Computing expansion probabilities on a tree.
Adds a file to the tools module called topology.py, which will store several types of utilities for computing statistics of the topology of a tree. So far, I've implemented a function that computes a probability that a given subclade was generated from a neutral coalescent model. This PR addresses part of issue #149.
Tests included in test/tools_tests/topology_test.py.