Improved doublet detection in call_lineages #225

Open: wants to merge 6 commits into `master`
@colganwi (Collaborator) commented Sep 19, 2023

This PR makes a number of improvements to the `call_lineages` step of the preprocessing pipeline. They are based on my experience processing a dataset with high ambient RNA and a significant proportion of doublets.

  1. Adds a `min_umi_per_intbc` parameter for filtering the allele table, which is useful for removing ambient intBC molecules.

  2. Removes the assumption in `assign_lineage_groups` that lineage group sizes are strictly decreasing, since this may not hold with a high `kinship_thresh`.

  3. Changes the doublet detection algorithm to use the kinship scores calculated by `score_lineage_kinships`. I have found these kinship scores to be a more reliable way to detect doublets than the current `filter_inter_doublets` function, since they take UMI counts into account rather than only the binarized intBCs.

  4. Adds a `keep_doublets` parameter that lets the user keep doublets in the allele table, which makes it much easier to tune the `doublet_kinship_thresh` parameter.
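To make items 1, 3 and 4 concrete, here is a minimal, self-contained Python sketch of how a per-intBC UMI filter, a kinship-based doublet call, and a `keep_doublets` switch could fit together. It is not the code in this PR: the molecule tuples, the helper names (`filter_min_umi_per_intbc`, `kinship_scores`, `call_doublets`, `apply_doublet_filter`), and the kinship definition (the fraction of a cell's UMIs that land on a lineage group's intBCs) are all assumptions for illustration. `kinship_scores` is a simplified stand-in for `score_lineage_kinships`, not a reproduction of it.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical molecule-level record: (cellBC, intBC, UMI). Not the real allele table schema.
Molecule = Tuple[str, str, str]


def filter_min_umi_per_intbc(
    molecules: List[Molecule], min_umi_per_intbc: int
) -> List[Molecule]:
    """Drop (cellBC, intBC) pairs backed by fewer than `min_umi_per_intbc` distinct UMIs."""
    umis: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for cell, intbc, umi in molecules:
        umis[(cell, intbc)].add(umi)
    return [m for m in molecules if len(umis[(m[0], m[1])]) >= min_umi_per_intbc]


def kinship_scores(
    molecules: List[Molecule], group_intbcs: Dict[int, Set[str]]
) -> Dict[str, Dict[int, float]]:
    """Assumed kinship: fraction of a cell's molecules whose intBC belongs to each group."""
    hits: Dict[str, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    totals: Dict[str, int] = defaultdict(int)
    for cell, intbc, _ in molecules:
        totals[cell] += 1
        for group, intbcs in group_intbcs.items():
            if intbc in intbcs:
                hits[cell][group] += 1
    return {
        cell: {group: hits[cell][group] / total for group in group_intbcs}
        for cell, total in totals.items()
    }


def call_doublets(
    scores: Dict[str, Dict[int, float]], doublet_kinship_thresh: float
) -> Dict[str, bool]:
    """Flag a cell as a doublet when its second-highest kinship exceeds the threshold."""
    flags = {}
    for cell, by_group in scores.items():
        ranked = sorted(by_group.values(), reverse=True)
        runner_up = ranked[1] if len(ranked) > 1 else 0.0
        flags[cell] = runner_up > doublet_kinship_thresh
    return flags


def apply_doublet_filter(
    molecules: List[Molecule], flags: Dict[str, bool], keep_doublets: bool
) -> List[Tuple[str, str, str, bool]]:
    """Tag each record with its doublet flag; drop flagged cells unless keep_doublets is set."""
    tagged = [(c, i, u, flags.get(c, False)) for c, i, u in molecules]
    return tagged if keep_doublets else [row for row in tagged if not row[3]]


# Toy data: c1 draws only on group 1; c2 splits its UMIs between groups 1 and 2.
molecules = [
    ("c1", "A", "u1"), ("c1", "A", "u2"), ("c1", "B", "u3"), ("c1", "B", "u4"),
    ("c2", "A", "u5"), ("c2", "A", "u6"), ("c2", "C", "u7"), ("c2", "C", "u8"),
    ("c2", "Z", "u9"),  # single-UMI intBC, treated as ambient
]
groups = {1: {"A", "B"}, 2: {"C", "D"}}

filtered = filter_min_umi_per_intbc(molecules, min_umi_per_intbc=2)  # drops c2/Z
scores = kinship_scores(filtered, groups)  # c1: {1: 1.0, 2: 0.0}, c2: {1: 0.5, 2: 0.5}
flags = call_doublets(scores, doublet_kinship_thresh=0.25)  # c1: False, c2: True
kept = apply_doublet_filter(filtered, flags, keep_doublets=True)  # 8 rows, c2 tagged
dropped = apply_doublet_filter(filtered, flags, keep_doublets=False)  # 4 rows, c1 only
```

Because the toy flags doublets instead of deleting them when `keep_doublets=True`, trying a different `doublet_kinship_thresh` only means rerunning `call_doublets` on the same scores, which mirrors the tuning workflow described in item 4.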

The API remains the same, and the old doublet detection algorithm can still be run for now, but I've added a warning message that it will be deprecated in 2.1.0. What this PR does not address is that doublets can silently slip through `call_lineages`: the doublet alleles are filtered out by `min_intbc_thresh`, which makes the doublets look like singlets. It would be better to avoid this failure mode, but I'm not sure how to do so while still filtering.
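The silent-doublet failure mode can be shown with a toy example. This sketch assumes, as our own reading rather than something stated here, that `min_intbc_thresh` removes intBCs present in fewer than that fraction of a lineage group's cells. The function name `filter_rare_group_intbcs` and the binarized cell-to-intBC-set layout are hypothetical.

```python
from collections import Counter
from typing import Dict, Set


def filter_rare_group_intbcs(
    group_cells: Dict[str, Set[str]], min_intbc_thresh: float
) -> Dict[str, Set[str]]:
    """Keep only intBCs seen in at least `min_intbc_thresh` of the group's cells (assumed rule)."""
    n_cells = len(group_cells)
    freq = Counter(intbc for intbcs in group_cells.values() for intbc in intbcs)
    common = {intbc for intbc, n in freq.items() if n / n_cells >= min_intbc_thresh}
    return {cell: intbcs & common for cell, intbcs in group_cells.items()}


# 49 singlets share intBCs A, B, C; one doublet also carries X and Y from another lineage.
group_cells = {f"cell{i}": {"A", "B", "C"} for i in range(49)}
group_cells["doublet"] = {"A", "B", "C", "X", "Y"}

filtered = filter_rare_group_intbcs(group_cells, min_intbc_thresh=0.05)
# X and Y appear in 1/50 = 0.02 of cells, below 0.05, so they are removed and the
# doublet's intBC set becomes identical to every singlet's.
print(filtered["doublet"] == filtered["cell0"])  # True
```

Any doublet check that runs after this step sees only the surviving intBCs, so in this toy it has nothing left to detect.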

@mattjones315 if you send me test data, I can compare this algorithm to the old one. I think it's an improvement in most cases, but it would be good to test it. I'm also open to implementing a more complex doublet detection algorithm using a mixture model if needed. I'll add tests once we settle on the doublet detection algorithm.

@colganwi colganwi self-assigned this Sep 20, 2023
@colganwi colganwi added the enhancement New feature or request label Sep 20, 2023