Kind questions about BayesPrism, THX! #104

Closed
NPTL1201 opened this issue Nov 3, 2024 · 1 comment

Comments

NPTL1201 commented Nov 3, 2024

Dear Dr. Tinyi,

Thank you very much for developing such a valuable tool. I have a few questions that I would be grateful if you could help clarify.

  1. I came across some of your previous comments noting that missing cell types, or using only non-tumor immune cell types as a reference, can lead to inflation issues. For instance, in issue #95 ("Question about calculate the immune cell fraction in bulk sequencing of tumor sample"), you mentioned: "If only using non-tumor cells, BayesPrism will distribute reads from tumor cells to the cell type most similar to tumor cells in the reference, thereby inflating its fraction." If I understand correctly, non-immune cells, like fibroblasts that may share high transcriptional similarity with tumor cells, could inadvertently receive excess reads, leading to overestimation of both their proportion and gene expression levels. Is my understanding accurate?

    Additionally, I would like your advice on another example. Suppose I already have a large scRNA-seq dataset containing only T_CD8 cells. I would like to set the cell type as T_CD8 and define cell states as subsets, such as T_CD8_exhausted, T_CD8_naive, etc. Would this dataset be unsuitable for running BayesPrism? My concern regarding inflation is that reads meant for other missing cell types could end up assigned to T_CD8. For example, within the T_CD4 subset, follicular helper T cells (Tfh) frequently upregulate immune checkpoints like PDCD1 and CTLA4, similar to T_CD8_Tex. If BayesPrism were to use only the T_CD8 scRNA-seq as reference, would the reads associated with Tfh be misassigned to T_CD8_Tex in bulk samples? Such misassignment would make it challenging to focus specifically on T_CD8_Tex in bulk samples.

  2. To reduce runtime, would you recommend downsampling the scRNA-seq reference object? For example, if I have 100,000 cells with comprehensive cell type/state representation, could I downsample within each cell type/state to build the reference? I noticed an error when running get.exp.stat() with a large number of cells: "Error: subscript contains invalid names. In addition: Warning message: In asMethod(object) : sparse->dense coercion: allocating vector of size 6.1 GiB."

  3. When running new.prism(), I encountered a warning message: "Warning: pseudo.min does not match min(phi)." Could you explain the meaning of this warning?

  4. Could you advise on the appropriate use of the cv value? Should it be used for filtering out samples with excessively high cv?

Thank you for your time and insights.
Napert

Collaborator

tinyi commented Nov 17, 2024

Hi Napert,

Thank you for your questions. Apologies for the delay; I have been traveling recently.

Here are the answers to your questions.

> If I understand correctly, non-immune cells, like fibroblasts that may share high transcriptional similarity with tumor cells, could inadvertently receive excess reads, potentially overestimating both their proportion and gene expression levels. Is my understanding accurate?

Assuming that you have no or incomplete representation of the tumor cells, yes: reads from the tumor cells in the bulk sample will inflate the cell type whose transcriptional profile is most similar to that of the tumor cells. One way to mitigate this is to subset on marker genes, which increases the signal-to-noise ratio of the scRNA-seq reference.

> I would like to set the cell type as T_CD8 and define cell states as subsets, such as T_CD8_exhausted, T_CD8_naive, etc. Would this dataset be unsuitable for running BayesPrism?

You can certainly define the finer-level CD8 subtypes as cell types. Just keep in mind that the finer the resolution, the less confident the deconvolution is; this will be reflected in theta.cv.

> For example, within the T_CD4 subset, follicular helper T cells (Tfh) frequently upregulate immune checkpoints like PDCD1 and CTLA4, similar to T_CD8_Tex. If BayesPrism were to use only the T_CD8 scRNA-seq as reference, would the reads associated with Tfh be misassigned to T_CD8_Tex in bulk samples?

**If you have a Tfh subtype similar to T_CD8_Tex and there is no corresponding Tfh subtype in the reference, then yes, some reads from the Tfh subtype will inflate your estimated fraction of T_CD8_Tex. So again, the best way to address this is to subset on marker genes to minimize such "missing cell type" issues.**

> To reduce runtime, would you recommend downsampling the scRNA-seq reference object? For example, if I have 100,000 cells with comprehensive cell type/state representation, could I downsample within each cell type/state to build the reference? I noticed an error when running get.exp.stat() with a large number of cells: "Error: subscript contains invalid names. In addition: Warning message: In asMethod(object) : sparse->dense coercion: allocating vector of size 6.1 GiB."

Downsampling would lose some information. Alternatively, you can use metacell to aggregate cells that are statistically similar.
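For readers who still want to try the per-state downsampling described in the question, here is a minimal sketch of capping the number of cells per cell state. It is plain Python for illustration only, not BayesPrism code and not a metacell method; the function name `downsample_per_state`, the cap of 100, and the example labels are all assumptions made up for this sketch. The returned indices would then be used to subset the rows of the reference count matrix and the matching label vectors.

```python
import random
from collections import defaultdict


def downsample_per_state(cell_states, max_per_state, seed=0):
    """Return sorted indices of the cells to keep, at most max_per_state per state."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_state = defaultdict(list)
    for idx, state in enumerate(cell_states):
        by_state[state].append(idx)

    keep = []
    for idxs in by_state.values():
        if len(idxs) > max_per_state:
            idxs = rng.sample(idxs, max_per_state)  # sample without replacement
        keep.extend(idxs)  # states at or under the cap are kept whole
    return sorted(keep)


# Made-up labels (hypothetical): one abundant state and one rare state.
states = ["T_CD8_exhausted"] * 500 + ["T_CD8_naive"] * 30
kept = downsample_per_state(states, max_per_state=100)
kept_states = [states[i] for i in kept]
```

Capping per state rather than sampling a fixed fraction of all cells keeps rare states intact, which matters because the discarded cells are where the information loss mentioned above comes from.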

> When running new.prism(), I encountered a warning message: "Warning: pseudo.min does not match min(phi)." Could you explain the meaning of this warning?

This shouldn't be a problem. This happens when you have a really deeply sequenced scRNA-seq reference, e.g. when you have tons of cells, so that one or more cell types have all genes with non-zero expression, and hence pseudo.min is not applied to these cell types.

> Could you advise on the appropriate use of the cv value? Should it be used for filtering out samples with excessively high cv?

Typically, cell types with low fractions, samples with low sequencing depth, and cell types with similar profiles will have high uncertainty (because there is little information in the bulk to support the inference). One application of theta.cv is when computing Spearman's correlation: you can set theta values below a certain threshold (say 0.001 or lower) to zero, with the threshold determined by theta.cv (you can plot theta vs. theta.cv to determine an empirical cutoff).
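The clipping step can be sketched as follows. This is a hedged illustration in plain Python rather than BayesPrism's R API: `clip_theta`, `average_ranks`, and `spearman` are hypothetical helper names, the theta and marker values are made up, and 0.001 is only the example threshold mentioned above, which in practice would be chosen from the theta vs. theta.cv plot.

```python
def clip_theta(theta, threshold=0.001):
    """Set fractions below the threshold to zero; leave the rest unchanged."""
    return [0.0 if t < threshold else t for t in theta]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the tie-averaged ranks.

    Undefined (division by zero) if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up example: per-sample fractions of one cell type and some covariate.
theta = [0.0002, 0.0005, 0.0001, 0.05, 0.20]
marker = [3.0, 1.0, 2.0, 10.0, 20.0]
clipped = clip_theta(theta, threshold=0.001)
rho = spearman(clipped, marker)
```

After clipping, the three near-zero samples are tied at zero and share one averaged rank, so their ordering, which is mostly noise at that level, no longer contributes to the correlation.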

Hope this helps.

Best,

Tinyi
