Support for multi-gene set ensemble #54

Open
ejarmand opened this issue Nov 30, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@ejarmand

ejarmand commented Nov 30, 2024

Description of feature

Marker gene selection is often one of the primary drivers of sc analysis results, and I would expect it to have a larger impact than algorithm choice in most cases. Sampling across the space of possible gene sets for integration would be very interesting and useful (I've seen multiple cases of clusters driven by a single gene).

For unsupervised methods in particular it should be pretty easy to implement.

Edited: many -> multiple, swapped words

@ejarmand ejarmand added the enhancement New feature or request label Nov 30, 2024
@canergen
Member

canergen commented Dec 1, 2024

Hi, I assume it’s easy to implement for all methods. It would just be a second loop. I won’t have the bandwidth to do it this month. For now, my recommendation would be to subset the genes beforehand, disable HVG selection, and run each gene set separately.
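The "second loop" workaround described above can be sketched as follows. This is only a minimal illustration, not the project's API: `ensemble_over_gene_sets` and `predict_fn` are hypothetical names, the expression matrix is assumed to be a cells-by-genes NumPy array, and majority voting is just one possible way to combine the per-gene-set results.

```python
from collections import Counter

import numpy as np


def ensemble_over_gene_sets(X, gene_names, gene_sets, predict_fn):
    """Run predict_fn once per gene set and majority-vote the labels per cell.

    X          : (n_cells, n_genes) expression matrix (assumed dense NumPy array)
    gene_names : list of gene names matching the columns of X
    gene_sets  : iterable of gene-name lists, one per ensemble member
    predict_fn : hypothetical callable mapping a subset matrix to one label per cell
    """
    gene_index = {g: i for i, g in enumerate(gene_names)}
    votes = []
    for genes in gene_sets:  # the outer "second loop" over gene sets
        # Subset columns outside the method; genes missing from X are skipped.
        cols = [gene_index[g] for g in genes if g in gene_index]
        votes.append(list(predict_fn(X[:, cols])))
    # Transpose so that each tuple holds one cell's labels across gene sets.
    return [Counter(cell_votes).most_common(1)[0][0] for cell_votes in zip(*votes)]
```

Any per-run HVG selection would need to be switched off inside `predict_fn`, so that each run really uses the gene set it was given.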
I’m a bit confused though. How high is the expression of this single gene? How many cells of that type would you expect to have zero observed expression given Poisson sampling? I guess it might be that this single gene is an actual marker gene, but other differences in expression allow those cells to cluster distinctly. Does this make sense?
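As a rough way to put numbers on the Poisson question above: if every cell of the type had the same expected count for the gene, the probability of observing zero counts is exp(-mean). The helper below is a hypothetical sketch under that simplifying assumption; it ignores differences in sequencing depth and any overdispersion.

```python
import math


def expected_zero_cells(n_cells, mean_count):
    """Expected number of cells with zero observed counts under Poisson(mean_count)."""
    # For a Poisson distribution, P(count == 0) = exp(-mean).
    return n_cells * math.exp(-mean_count)


# With a mean of 0.5 counts per cell, about 61% of cells show no counts at all.
print(round(expected_zero_cells(1000, 0.5)))  # 607
```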

@ejarmand
Author

ejarmand commented Dec 3, 2024

Hi Can, I totally understand your thoughts on the single-gene clustering. This has come up a couple of times with collaborators, usually when sub-clustering a largely homogeneous cell type (I think it can also be exacerbated by choices in dimensionality reduction, and I have seen it enhanced by certain residual normalization procedures). It is probably not a realistic example when reference mapping an entire dataset at once. Sometimes there are reasonable correlates (e.g. sequencing depth) and sometimes there aren't.

Regardless, that was mostly meant as an unambiguous example of gene-selection effects rather than the primary use case.

I work primarily in brain tissues, where annotating subclusters is pretty common and seems to be even more sensitive to gene panel selection.


2 participants