Kind questions about BayesPrism, THX! #104
Hi Napart, Thank you for your questions. Apologies for the delay; I have been traveling recently. Here are the answers to your questions.
Assuming that you have no or incomplete representation of the tumor cells, yes: reads from the tumor cells in the bulk sample will inflate the cell type whose transcription profile is most similar to the tumor cells. One way to mitigate this is to subset on marker genes, which increases the signal-to-noise ratio of the scRNA-seq reference.
You can certainly define the finer-level CD8 subtypes as cell types. Just keep in mind that the finer the resolution, the less confident the deconvolution is; this will be reflected in theta.cv.
If you have a Tfh subtype similar to T_CD8_Tex, and there is no corresponding Tfh subtype in the reference, then yes, some reads from the Tfh subtype would inflate your T_CD8_Tex fraction. So again, the best way to address this is to subset on marker genes to minimize such "missing cell type" issues.
Downsampling would lose some information. Alternatively, you can use metacells to aggregate cells that are statistically similar.
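To illustrate why aggregation keeps information that downsampling throws away, here is a minimal Python sketch (not BayesPrism code, and not an actual metacell algorithm). It simply sums raw counts of cells that share a label, in fixed-size chunks taken in input order. A real metacell method would group cells by statistical similarity instead. The function name `aggregate_cells`, the `group_size` parameter, and the toy counts are all hypothetical.

```python
from collections import defaultdict

def aggregate_cells(counts, labels, group_size):
    """Sum raw counts of cells sharing a label into pseudo-cells.

    counts: list of per-cell count vectors (one entry per gene).
    labels: cell type/state label of each cell.
    group_size: maximum number of cells merged into one pseudo-cell.
    Assumption: cells are chunked in input order, a crude stand-in for
    the similarity-based grouping a metacell method would perform.
    """
    by_label = defaultdict(list)
    for row, label in zip(counts, labels):
        by_label[label].append(row)

    agg_counts, agg_labels = [], []
    for label, rows in by_label.items():
        for start in range(0, len(rows), group_size):
            chunk = rows[start:start + group_size]
            # Column-wise sum keeps every read from every cell in the chunk.
            agg_counts.append([sum(col) for col in zip(*chunk)])
            agg_labels.append(label)
    return agg_counts, agg_labels

# Toy data: 5 cells x 3 genes (values are made up for illustration).
counts = [[1, 0, 3], [2, 1, 0], [0, 0, 5], [4, 4, 0], [1, 1, 1]]
labels = ["T_CD8_Tex", "T_CD8_Tex", "T_CD8_Tex", "T_CD8_naive", "T_CD8_naive"]

agg_counts, agg_labels = aggregate_cells(counts, labels, group_size=2)
total_before = sum(map(sum, counts))
total_after = sum(map(sum, agg_counts))
```

The reference shrinks from five rows to three while `total_before` and `total_after` stay equal, whereas dropping cells would discard their reads.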
This shouldn't be a problem. It happens when you have a really deeply sequenced scRNA-seq reference, e.g. when you have tons of cells, so that one or more cell types have non-zero expression for all genes, and hence pseudo.min is not applied to these cell types.
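A small Python sketch may make this concrete. The thread does not show the normalization BayesPrism uses, nor the check that triggers the warning, so the formula below is an assumption chosen only to reproduce the behavior described: a floor is applied to profiles that contain zeros, and profiles without zeros are normalized by their total alone. The name `normalize_profile` and the toy counts are hypothetical.

```python
def normalize_profile(ref_row, pseudo_min):
    """Turn one cell type's summed counts into a profile that sums to one.

    Assumption (not taken from the BayesPrism source): a profile that has
    at least one zero is rescaled so that its zeros become pseudo_min,
    while a profile with no zeros is divided by its total and left alone.
    """
    total = sum(ref_row)
    if min(ref_row) > 0:
        # Every gene is expressed: no floor is needed, so none is applied.
        return [x / total for x in ref_row]
    n_genes = len(ref_row)
    return [x / total * (1 - pseudo_min * n_genes) + pseudo_min for x in ref_row]

pseudo_min = 1e-8
phi_sparse = normalize_profile([0, 30, 70], pseudo_min)  # has a zero gene
phi_dense = normalize_profile([1, 29, 70], pseudo_min)   # all genes non-zero

min_sparse = min(phi_sparse)  # equals pseudo_min
min_dense = min(phi_dense)    # much larger than pseudo_min
```

Under these assumptions, the minimum of the dense profile is set by its rarest gene rather than by `pseudo_min`, which is the kind of mismatch the warning reports.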
Typically, cell types with low fractions, samples with low sequencing depth, and cell types with similar profiles will have high uncertainty (because there is little information in the bulk to support the inference). One application of theta.cv is when computing Spearman's correlation: one can clip theta values below a certain threshold to zero (say 0.001 or lower), with the threshold determined by theta.cv (you can plot theta vs. theta.cv to determine an empirical cutoff). Hope this helps. Best, Tinyi
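The clipping step mentioned in the last answer can be sketched in Python as follows. This is an illustration only: the theta values, the covariate, and the 0.001 threshold are made up, and in practice the threshold would be read off a theta vs. theta.cv plot. The helper names are hypothetical, and Spearman's correlation is implemented by hand (Pearson correlation of average ranks) to keep the example self-contained.

```python
def clip_small(theta, threshold):
    """Set fractions below the threshold to zero."""
    return [0.0 if t < threshold else t for t in theta]

def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-sample fractions of one cell type and a covariate.
theta = [0.0002, 0.0005, 0.0001, 0.05, 0.20, 0.35]
covariate = [3.0, 1.0, 2.0, 4.0, 5.0, 6.0]

rho_raw = spearman(theta, covariate)
rho_clipped = spearman(clip_small(theta, threshold=0.001), covariate)
```

In this toy case the three smallest fractions are ordered essentially by noise, so clipping them to zero turns them into ties and keeps their arbitrary ordering from pulling the rank correlation down.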
Dear Dr. Tinyi,
Thank you very much for developing such a valuable tool. I have a few questions that I would be grateful if you could help clarify.
I came across some of your previous comments noting that missing cell types or using only non-tumor immune cell types as a reference can lead to inflation issues. For instance, in Question about calculate the immune cell fraction in bulk sequencing of tumor sample #95, you mentioned: "If only using non-tumor cells, BayesPrism will distribute reads from tumor cells to the cell type most similar to tumor cells in the reference, thereby inflating its fraction." If I understand correctly, non-immune cells, like fibroblasts that may share high transcriptional similarity with tumor cells, could inadvertently receive excess reads, potentially overestimating both their proportion and gene expression levels. Is my understanding accurate?
Additionally, I would like your advice on another example. Suppose I already have a large scRNA-seq dataset containing only T_CD8 cells. I would like to set the cell type as T_CD8 and define cell states as subsets, such as T_CD8_exhausted, T_CD8_naive, etc. Would this dataset be unsuitable for running BayesPrism? My concern regarding inflation is that reads meant for other missing cell types could end up assigned to T_CD8. For example, within the T_CD4 subset, follicular helper T cells (Tfh) frequently upregulate immune checkpoints like PDCD1 and CTLA4, similar to T_CD8_Tex. If BayesPrism were to use only the T_CD8 scRNA-seq as reference, would the reads associated with Tfh be misassigned to T_CD8_Tex in bulk samples? Such misassignment would make it challenging to focus specifically on T_CD8_Tex in bulk samples.
To reduce runtime, would you recommend downsampling the scRNA-seq reference object? For example, if I have 100,000 cells with comprehensive cell type/state representation, could I downsample within each cell type/state to build the reference? I noticed an error when running get.exp.stat() with a large number of cells: "Error: subscript contains invalid names. In addition: Warning message: In asMethod(object) : sparse->dense coercion: allocating vector of size 6.1 GiB."
When running new.prism(), I encountered a warning message: "Warning: pseudo.min does not match min(phi)." Could you explain the meaning of this warning?
Could you advise on the appropriate use of the cv value? Should it be used for filtering out samples with excessively high cv?
Thank you for your time and insights.
Napert