Support for multi-gene set ensemble #54

ejarmand · 2024-11-30T20:06:00Z

Description of feature

Marker gene selection is often one of the primary drivers of single-cell analysis results, and I would expect it to have a larger impact than algorithm choice in most cases. Sampling across the space of possible gene sets for integration would be very interesting and useful (I've seen multiple cases of clusters driven by a single gene).

For unsupervised methods in particular it should be pretty easy to implement.
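
To make the idea concrete, here is a minimal sketch of what a gene-set ensemble could look like for an unsupervised method. It is not this package's API: `kmeans_labels` and `coassociation` are hypothetical helpers, and the tiny k-means is only a stand-in for whatever clustering method would really be used. It samples random gene subsets, clusters the cells on each subset, and records how often each pair of cells ends up in the same cluster.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=20):
    """Tiny k-means, used here only as a placeholder clustering method."""
    # Start from k distinct cells as initial centers (fancy indexing copies).
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every cell to every center.
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = labels == j
            if np.any(members):  # leave empty clusters where they are
                centers[j] = X[members].mean(axis=0)
    return labels


def coassociation(X, k, n_sets=20, genes_per_set=10, seed=0):
    """Fraction of sampled gene sets in which each pair of cells co-clusters.

    X is a dense cells x genes matrix; all parameter names are illustrative.
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = X.shape
    co = np.zeros((n_cells, n_cells))
    for _ in range(n_sets):
        # Sample one gene set without replacement and cluster on it alone.
        genes = rng.choice(n_genes, size=genes_per_set, replace=False)
        labels = kmeans_labels(X[:, genes], k, rng)
        co += labels[:, None] == labels[None, :]
    return co / n_sets
```

Pairs with a co-association near 1 cluster together regardless of the gene set, while pairs with intermediate values are the ones whose grouping depends on gene selection, which is the sensitivity described above.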

Edited: many -> multiple, swapped words

canergen · 2024-12-01T00:08:31Z

Hi, I assume it’s easy to implement for all methods. It would just be a second loop. I won’t have the bandwidth to do it this month. For now, my recommendation would be to subset the genes outside, disable HVG selection, and run each gene set separately.
I’m a bit confused, though. How high is the expression of this single gene? How many cells of that type would you expect to have zero observed expression given Poisson sampling? I guess it might be that this single gene is an actual marker gene, but other differences in expression allow those cells to cluster distinctly. Does this make sense?
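
The Poisson question can be put in numbers with a short sketch. It assumes pure Poisson sampling with the same mean count `lam` for the gene in every cell of the type, ignoring differences in sequencing depth and any overdispersion; the function names are made up for illustration. Under that assumption the probability of observing zero counts in a cell is exp(-lam).

```python
import math


def zero_fraction(lam):
    """P(count == 0) for a Poisson-distributed count with mean lam."""
    return math.exp(-lam)


def expected_zero_cells(n_cells, lam):
    """Expected number of cells with zero observed counts for the gene."""
    return n_cells * zero_fraction(lam)


if __name__ == "__main__":
    # Print the expected zero fraction for a few illustrative mean counts.
    for lam in (0.1, 0.5, 1.0, 3.0):
        print(f"mean count {lam}: zero fraction {zero_fraction(lam):.3f}")
```

A gene with a low mean count is unobserved in a large share of the cells that express it, so a cluster that appears to be split by that gene alone may partly reflect sampling dropout.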

ejarmand · 2024-12-03T04:13:25Z

Hi Can, I totally understand your thoughts on the single-gene clustering. This has come up a couple of times with collaborators, usually when sub-clustering a largely homogeneous cell type (I think it can also be exacerbated by choices in dimensionality reduction, and I have seen it enhanced by certain residual normalization procedures). It's probably not a realistic example when reference mapping an entire dataset at once. Sometimes there are reasonable correlates (e.g. sequencing depth) and sometimes there aren't.

Regardless, that was mostly meant as an unambiguous example of gene-selection effects rather than the primary use case.

I work primarily in brain tissues, where annotating subclusters is pretty common and seems to be even more sensitive to gene panel selection.

ejarmand added the enhancement New feature or request label Nov 30, 2024
