Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Ready for Review] Expansion probabilities #150

Merged
merged 9 commits into from
Oct 26, 2021
Merged

[Ready for Review] Expansion probabilities #150

merged 9 commits into from
Oct 26, 2021

Conversation

mattjones315
Copy link
Collaborator

Computing expansion probabilities on a tree.

Adds a file to the tools module called topology.py which will store several types of utilities for computing statistics of the topology of a tree. So far, I've implemented a function that computes a probability that a given subclade was generated from a neutral coalescent model. This is a PR that address a part of the issue #149.

Tests included in test/tools_tests/topology_test.py.

@mattjones315 mattjones315 requested a review from sprillo October 25, 2021 22:05
@mattjones315 mattjones315 self-assigned this Oct 25, 2021
@mattjones315 mattjones315 changed the title Expansion probabilities [Don't review yet] Expansion probabilities Oct 25, 2021
@codecov
Copy link

codecov bot commented Oct 25, 2021

Codecov Report

Merging #150 (c83a848) into master (15dc005) will increase coverage by 0.09%.
The diff coverage is 100.00%.

❗ Current head c83a848 differs from pull request most recent head 6e8b650. Consider uploading reports for the commit 6e8b650 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #150      +/-   ##
==========================================
+ Coverage   86.96%   87.06%   +0.09%     
==========================================
  Files          69       70       +1     
  Lines        4772     4807      +35     
==========================================
+ Hits         4150     4185      +35     
  Misses        622      622              
Impacted Files Coverage Δ
cassiopeia/tools/__init__.py 100.00% <100.00%> (ø)
cassiopeia/tools/topology.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 15dc005...6e8b650. Read the comment docs.

@mattjones315 mattjones315 changed the title [Don't review yet] Expansion probabilities [Ready for Review] Expansion probabilities Oct 25, 2021
@mattjones315 mattjones315 linked an issue Oct 25, 2021 that may be closed by this pull request
Copy link
Collaborator

@sprillo sprillo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great new addition! Just a few minor things.


The probability corresponds to the probability that a given subclade
contains the number of cells as would be expected under a simple coalescent
model. Often, if the probability is less than some threshold (e.g., 0.05),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be a bit clearer about what is being computed? E.g. "The probability corresponds to the probability that under a coalescent model, the given subclade would have had at least as many leaves as those observed in reality; in other words, a one-sided p-value."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally agree. I wonder if it would be better to rename the function even to compute_expansion_pvalues? D you think "probabilities" is a bit obscure?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely like compute_expansion_pvalues better. In that case, I would also suggest renaming the node attribute to expansion_pvalue instead of expansion_probability.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing - I'll make that change right now.

cassiopeia/tools/topology.py Outdated Show resolved Hide resolved
Comment on lines 77 to 82
p = np.sum(
[
simple_coalescent_probability(n, b2, k)
for b2 in range(b, n - k + 2)
]
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to p = nCk(n - b, k - 1) / nCk(n - 1, k - 1)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, that's pretty amazing. Thanks for suggesting this!

@sprillo
Copy link
Collaborator

sprillo commented Oct 26, 2021

@mattjones315 also, the KP manuscript, these should appear as combinatorial numbers instead of fractions:
image

@mattjones315
Copy link
Collaborator Author

@mattjones315 also, the KP manuscript, these should appear as combinatorial numbers instead of fractions: image

Eek! Yes you're absolutely right. Thanks for catching this!

@mattjones315 mattjones315 requested a review from sprillo October 26, 2021 17:09
Copy link
Collaborator

@sprillo sprillo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good!

My only final suggestion is that we report the computational complexity in the docstring. On a typical balanced binary tree it will be O(n log n). On a highly unbalanced binary tree it will be O(n^2). On a tree with extreme multifurcations, it might get as bad as O(n^3) bc of the nCk computations involving high k, but we don't really care about this because it's not realistic.

Do note that the method can be implemented in O(n) with more care, but I am happy with the method as it is implemented right now, since it is easy to read and fast enough in typical scenarios.

@mattjones315
Copy link
Collaborator Author

Good suggestion, I've added the complexity to the docstring.

@mattjones315 mattjones315 merged commit 723f098 into master Oct 26, 2021
@mattjones315 mattjones315 deleted the expansions branch October 26, 2021 17:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Utilities from KP-Tracer MS
2 participants