Improved doublet detection in `call_lineages` #225

colganwi · 2023-09-19T23:34:40Z

This PR makes a number of improvements to the `call_lineages` step of the preprocessing pipeline. These changes are based on my experience processing a dataset with high ambient RNA and a significant proportion of doublets.

- Adds a `min_umi_per_intbc` parameter to filter the allele table, which is useful for removing ambient intBC molecules.
- Removes the assumption in `assign_lineage_groups` that lineage group sizes are strictly decreasing, since this may not hold with a high `kinship_thresh`.
- Changes the doublet detection algorithm to use the kinship scores calculated by `score_lineage_kinships`. I have found these kinship scores to be a more reliable way to detect doublets than the current `filter_inter_doublets` function, since they take UMIs into account instead of just the binarized intBCs.
- Adds a `keep_doublets` parameter that lets the user keep the doublets in the allele table, which makes it much easier to tune the `doublet_kinship_thresh` parameter.
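
To make the difference between binarized and UMI-weighted kinship concrete, here is a minimal, self-contained sketch. It is not the code in this PR: the toy allele table, the field names, the `lineage_groups` sets, and the helpers `filter_min_umi`, `kinship_scores`, and `call_doublets` are all hypothetical, and the rule "flag a cell when its best kinship falls below the threshold" is an assumption chosen for illustration.

```python
from collections import defaultdict

# Toy allele table: one record per (cell, intBC) with its UMI count.
# Field names and values are illustrative assumptions, not real data.
allele_table = [
    {"cellBC": "singlet", "intBC": "A", "UMI": 20},
    {"cellBC": "singlet", "intBC": "B", "UMI": 15},
    {"cellBC": "singlet", "intBC": "X", "UMI": 1},  # low-UMI, plausibly ambient
    {"cellBC": "doublet", "intBC": "A", "UMI": 18},
    {"cellBC": "doublet", "intBC": "X", "UMI": 16},
    {"cellBC": "doublet", "intBC": "Y", "UMI": 14},
]

# Hypothetical intBC sets defining two lineage groups.
lineage_groups = {1: {"A", "B"}, 2: {"X", "Y"}}


def filter_min_umi(rows, min_umi_per_intbc):
    """Drop (cell, intBC) records supported by fewer than the given number of UMIs."""
    return [r for r in rows if r["UMI"] >= min_umi_per_intbc]


def kinship_scores(rows, groups, weight_by_umi=True):
    """For each cell, the fraction of its intBC signal that falls in each group.

    With weight_by_umi=False every intBC counts once (binarized);
    otherwise each intBC is weighted by its UMI count.
    """
    totals = defaultdict(float)
    per_group = defaultdict(lambda: defaultdict(float))
    for r in rows:
        weight = r["UMI"] if weight_by_umi else 1.0
        totals[r["cellBC"]] += weight
        for group, intbcs in groups.items():
            if r["intBC"] in intbcs:
                per_group[r["cellBC"]][group] += weight
    return {
        cell: {group: per_group[cell][group] / totals[cell] for group in groups}
        for cell in totals
    }


def call_doublets(scores, doublet_kinship_thresh):
    """Assumed rule: a cell is a doublet if its best kinship is below the threshold."""
    return {
        cell: max(by_group.values()) < doublet_kinship_thresh
        for cell, by_group in scores.items()
    }


umi_scores = kinship_scores(allele_table, lineage_groups)
binary_scores = kinship_scores(allele_table, lineage_groups, weight_by_umi=False)
print(call_doublets(umi_scores, 0.8))
print(call_doublets(binary_scores, 0.8))
```

In this toy example both cells have a best binarized kinship of 2/3, so a 0.8 threshold flags both of them, while the UMI-weighted scores flag only the cell whose UMIs are split across the two groups. Calling `filter_min_umi(allele_table, 2)` first removes the single-UMI intBC from the singlet, which is the kind of ambient molecule `min_umi_per_intbc` is meant to catch.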

The API remains the same and the old doublet detection algorithm can still be run for now, but I've added a warning message that it will be deprecated in 2.1.0. What this PR does not address is that doublets can silently slip through `call_lineages` because the doublet alleles are filtered out by `min_intbc_thresh`, which makes them look like singlets. It would be better if this failure mode were avoided, but I'm not sure how to do it while still filtering.

@mattjones315 if you send me test data I can compare this algorithm to the old one. I think it's an improvement for most cases, but it would be good to test it. I'm also open to implementing a more complex doublet detection algorithm using a mixture model if needed. I'll add tests once we solidify the doublet detection algorithm.

colganwi added 3 commits September 18, 2023 15:26

filter umi per intbc

ba16e59

Lineage group size not strictly decreasing

5432ffd

doublet detection

24efa19

colganwi requested a review from mattjones315 September 19, 2023 23:34

removed testing comments:

209e9d8

colganwi self-assigned this Sep 20, 2023

colganwi added the enhancement New feature or request label Sep 20, 2023

colganwi added 2 commits October 13, 2023 14:53

Merge branch 'master' into doublet-detection

1a7c5ac

Merge branch 'master' into doublet-detection

9228c61
