semi-supervised fastMNN correction #49
I'm trying to remember, but a long time ago, we might have had similar thoughts. It would be theoretically easy to implement; just restrict the MNN pair formation to cell populations with the same annotation across batches and proceed with the rest of the algorithm. I could see how this could improve correction performance by avoiding the formation of MNN pairs between the wrong populations.

In practice, this was less useful than it seemed. People don't usually come into the analysis with existing annotations for the individual batches, at least not for their own experimental data. After all, the whole point of the batch correction step is to get everything on the same coordinate system so that you only have to do clustering and annotation once; if we already had consistent labels for each batch, we would never need to compute corrected values for the rest of our analysis. Other than to generate artworks like UMAP/t-SNE, perhaps, but I don't think those have much scientific value.

I guess that this functionality might have some appeal for secondary analyses of published datasets that have already been annotated. However, this leads to another problem, which is the harmonization of labels across datasets from different authors. Some poor soul has to go through each combination of datasets and decide which labels match up between them; easy enough for the major cell types, but difficult for the more ambiguous subtypes that might have differing terminology/definitions across the community. Making a mistake here would encourage the formation of the wrong MNN pairs - and frankly, if you already know which cell types match up between datasets, you can probably just proceed with the rest of your meta-analysis without computing corrected values (artistic endeavors aside).

In the end, I must have decided against putting in this functionality. Nonetheless, batchelor still contains a vestige of this line of thought, in the form of the
Yes, I think you have fair points. Thanks for your input!
Julien could also annotate cells prior to batch effect correction by using SingleR's or scClassify's correlation approach.
(First, thanks Aaron for the development and maintenance of this awesome package!)
After reading this preprint, I was wondering whether such a semi-supervised correction would be possible with fastMNN(). For example, MNN pairs could be filtered based on the prior annotation of the different batches, on the labels inferred from a SingleR run, or on the matching clusters after a clusterMNN() run. What do you think?
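To make the filtering idea concrete, here is a minimal sketch in plain Python rather than R. It is not batchelor's implementation and does not call fastMNN(); the function names (`knn`, `mnn_pairs`, `filter_pairs_by_label`), the toy coordinates, and the labels are all hypothetical and exist only for illustration. It finds mutual nearest neighbors between two batches by brute force, then drops every pair whose two cells carry different labels.

```python
from math import dist  # Euclidean distance, Python 3.8+


def knn(queries, refs, k):
    """For each query point, return the set of indices of its k nearest refs."""
    neighbors = []
    for q in queries:
        order = sorted(range(len(refs)), key=lambda j: dist(q, refs[j]))
        neighbors.append(set(order[:k]))
    return neighbors


def mnn_pairs(batch1, batch2, k=1):
    """Return (i, j) pairs where cell i of batch1 and cell j of batch2
    are each among the other's k nearest neighbors (brute force)."""
    nn12 = knn(batch1, batch2, k)  # neighbors in batch2 of each batch1 cell
    nn21 = knn(batch2, batch1, k)  # neighbors in batch1 of each batch2 cell
    return [(i, j)
            for i in range(len(batch1))
            for j in sorted(nn12[i])
            if i in nn21[j]]


def filter_pairs_by_label(pairs, labels1, labels2):
    """Keep only the pairs whose two cells share the same label."""
    return [(i, j) for i, j in pairs if labels1[i] == labels2[j]]


# Toy data (hypothetical): two cells per batch in a 2-D space.
batch1 = [(0.0, 0.0), (5.0, 5.0)]
labels1 = ["T", "B"]
batch2 = [(0.5, 0.2), (4.0, 4.0)]
labels2 = ["T", "NK"]

pairs = mnn_pairs(batch1, batch2, k=1)
kept = filter_pairs_by_label(pairs, labels1, labels2)
```

In this toy case the unfiltered search pairs the "B" cell with the "NK" cell because each is the other's nearest neighbor, and the label filter removes that pair. The same filter would remove a correct pair whenever the labels were harmonized wrongly, which is the risk raised in the reply.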