Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

merging Nhood with logFC of opposite sign #270

Open
gianfilippo opened this issue Mar 3, 2023 · 19 comments
Open

merging Nhood with logFC of opposite sign #270

gianfilippo opened this issue Mar 3, 2023 · 19 comments
Assignees
Labels
bug Something isn't working

Comments

@gianfilippo
Copy link

gianfilippo commented Mar 3, 2023

Hi,

I am running the groupNhoods function on the da_results data.frame.
The da_results shows the following
logFC logCPM F PValue FDR Nhood SpatialFDR
946 6.555608 7.725746 20.33434 3.864789e-04 0.2744 946 0.2281628
1221 -2.679248 8.772024 26.29051 9.286647e-05 0.2744 1221 0.2281628
2178 -3.034338 8.446574 20.82294 3.465521e-04 0.2744 2178 0.2281628
2248 -2.549574 8.700925 20.51058 3.239712e-04 0.2744 2248 0.2281628
2291 -3.060969 8.536979 22.01882 2.303425e-04 0.2744 2291 0.2281628

da_results = groupNhoods(miloobj, da_results, da.fdr=FDR)
logFC logCPM F PValue FDR Nhood SpatialFDR. NhoodGroup
946 6.555608 7.725746 20.33434 3.864789e-04 0.2744 946 0.2281628. 9
1221 -2.679248 8.772024 26.29051 9.286647e-05 0.2744 1221 0.2281628. 14
2178 -3.034338 8.446574 20.82294 3.465521e-04 0.2744 2178 0.2281628. 9
2248 -2.549574 8.700925 20.51058 3.239712e-04 0.2744 2248 0.2281628. 1
2291 -3.060969 8.536979 22.01882 2.303425e-04 0.2744 2291 0.2281628. 1

It looks like the Nhoods 946 and 2178 are placed in teh same group, even though the logFC is of opposite sign.

Note: I am using a FDR cut-off of 0.25, because in this run I only had N=2

I am running miloR version 1.5.0

Am I missing something ?

Thanks

@MikeDMorgan
Copy link
Member

Hi @gianfilippo as stated on the github repo landing page please report the output of your sessionInfo() when reporting issues.

@gianfilippo
Copy link
Author

sorry about it. Here it is
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux Server 7.9 (Maipo)

Matrix products: default
BLAS/LAPACK: /gpfs/ycga/apps/hpc/software/OpenBLAS/0.3.12-GCC-10.2.0/lib/libopenblas_haswellp-r0.3.12.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] miloR_1.5.0 edgeR_3.40.1
[3] limma_3.54.0 HGNChelper_0.8.1
[5] ProjecTILs_3.0.1 DoubletFinder_2.0.3
[7] scDblFinder_1.13.7 djvdj_0.0.0.9000
[9] scater_1.24.0 scuttle_1.6.3
[11] SingleCellExperiment_1.20.0 SummarizedExperiment_1.26.1
[13] Biobase_2.56.0 GenomicRanges_1.48.0
[15] GenomeInfoDb_1.32.4 IRanges_2.30.1
[17] S4Vectors_0.34.0 BiocGenerics_0.42.0
[19] MatrixGenerics_1.10.0 matrixStats_0.62.0
[21] muscat_1.10.1 CIPR_0.1.0
[23] ComplexHeatmap_2.12.1 gprofiler2_0.2.1
[25] ggplot2_3.4.0 cowplot_1.1.1
[27] future_1.28.0 purrr_1.0.1
[29] openxlsx_4.2.5 patchwork_1.1.1
[31] miQC_1.4.0 SeuratWrappers_0.3.1
[33] SeuratDisk_0.0.0.9020 glmGamPoi_1.8.0
[35] SeuratObject_4.1.3 Seurat_4.3.0
[37] dplyr_1.1.0

loaded via a namespace (and not attached):
[1] rsvd_1.0.5 ica_1.0-2
[3] Rsamtools_2.12.0 foreach_1.5.2
[5] lmtest_0.9-40 crayon_1.5.1
[7] rbibutils_2.2.8 MASS_7.3-57
[9] nlme_3.1-157 backports_1.4.1
[11] rlang_1.0.6 XVector_0.36.0
[13] ROCR_1.0-11 irlba_2.3.5.1
[15] nloptr_2.0.0 xgboost_1.6.0.1
[17] BiocParallel_1.32.4 rjson_0.2.21
[19] bit64_4.0.5 glue_1.6.2
[21] sctransform_0.3.5 pbkrtest_0.5.1
[23] parallel_4.2.0 vipor_0.4.5
[25] spatstat.sparse_3.0-0 AnnotationDbi_1.60.0
[27] spatstat.geom_3.0-3 tidyselect_1.2.0
[29] fitdistrplus_1.1-8 variancePartition_1.26.0
[31] XML_3.99-0.9 tidyr_1.2.0
[33] zoo_1.8-10 GenomicAlignments_1.32.1
[35] xtable_1.8-4 magrittr_2.0.3
[37] Rdpack_2.3 cli_3.6.0
[39] zlibbioc_1.42.0 dbscan_1.1-11
[41] miniUI_0.1.1.1 sp_1.5-0
[43] aod_1.3.2 shiny_1.7.1
[45] BiocSingular_1.12.0 clue_0.3-60
[47] cluster_2.1.3 caTools_1.18.2
[49] tidygraph_1.2.1 KEGGREST_1.36.3
[51] tibble_3.1.8 ggrepel_0.9.1
[53] listenv_0.8.0 Biostrings_2.64.1
[55] png_0.1-7 withr_2.5.0
[57] bitops_1.0-7 ggforce_0.4.1
[59] plyr_1.8.7 pracma_2.4.2
[61] dqrng_0.3.0 coda_0.19-4
[63] pillar_1.8.1 gplots_3.1.3
[65] GlobalOptions_0.1.2 cachem_1.0.6
[67] multcomp_1.4-18 flexmix_2.3-17
[69] hdf5r_1.3.5 GetoptLong_1.0.5
[71] DelayedMatrixStats_1.18.0 vctrs_0.5.2
[73] ellipsis_0.3.2 generics_0.1.2
[75] tools_4.2.0 beeswarm_0.4.0
[77] munsell_0.5.0 tweenr_2.0.2
[79] emmeans_1.8.1-1 DelayedArray_0.22.0
[81] fastmap_1.1.0 compiler_4.2.0
[83] abind_1.4-5 httpuv_1.6.5
[85] rtracklayer_1.56.1 plotly_4.10.0
[87] GenomeInfoDbData_1.2.8 gridExtra_2.3
[89] glmmTMB_1.1.5 lattice_0.20-45
[91] deldir_1.0-6 utf8_1.2.2
[93] later_1.3.0 jsonlite_1.8.0
[95] scales_1.2.0 ScaledMatrix_1.4.0
[97] pbapply_1.5-0 sparseMatrixStats_1.8.0
[99] estimability_1.4.1 genefilter_1.78.0
[101] lazyeval_0.2.2 promises_1.2.0.1
[103] doParallel_1.0.17 R.utils_2.11.0
[105] goftest_1.2-3 spatstat.utils_3.0-1
[107] reticulate_1.26 sandwich_3.0-1
[109] blme_1.0-5 statmod_1.4.36
[111] Rtsne_0.16 uwot_0.1.14
[113] igraph_1.3.1 survival_3.3-1
[115] numDeriv_2016.8-1.1 yaml_2.3.5
[117] htmltools_0.5.2 memoise_2.0.1
[119] modeltools_0.2-23 BiocIO_1.6.0
[121] locfit_1.5-9.5 graphlayouts_0.8.0
[123] viridisLite_0.4.0 digest_0.6.29
[125] RhpcBLASctl_0.21-247.1 mime_0.12
[127] RSQLite_2.2.12 future.apply_1.9.0
[129] remotes_2.4.2 data.table_1.14.2
[131] blob_1.2.3 R.oo_1.24.0
[133] splines_4.2.0 RCurl_1.98-1.6
[135] broom_1.0.1 hms_1.1.1
[137] colorspace_2.0-3 BiocManager_1.30.19
[139] ggbeeswarm_0.6.0 shape_1.4.6
[141] nnet_7.3-17 Rcpp_1.0.8.3
[143] RANN_2.6.1 mvtnorm_1.1-3
[145] circlize_0.4.14 fansi_1.0.3
[147] tzdb_0.3.0 parallelly_1.32.1
[149] R6_2.5.1 ggridges_0.5.3
[151] lifecycle_1.0.3 zip_2.2.0
[153] bluster_1.6.0 abdiv_0.2.0
[155] minqa_1.2.4 leiden_0.4.3
[157] Matrix_1.5-0 RcppAnnoy_0.0.19
[159] TH.data_1.1-0 RColorBrewer_1.1-3
[161] iterators_1.0.14 spatstat.explore_3.0-5
[163] TMB_1.9.1 stringr_1.4.0
[165] htmlwidgets_1.5.4 beachmat_2.12.0
[167] polyclip_1.10-4 iNEXT_3.0.0
[169] globals_0.16.1 spatstat.random_3.0-1
[171] progressr_0.11.0 codetools_0.2-18
[173] metapod_1.4.0 gtools_3.9.2
[175] prettyunits_1.1.1 R.methodsS3_1.8.1
[177] gtable_0.3.0 DBI_1.1.2
[179] tensor_1.5 httr_1.4.2
[181] KernSmooth_2.23-20 stringi_1.7.6
[183] progress_1.2.2 farver_2.1.0
[185] reshape2_1.4.4 annotate_1.74.0
[187] viridis_0.6.2 boot_1.3-28
[189] BiocNeighbors_1.14.0 lme4_1.1-29
[191] restfulr_0.0.15 readr_2.1.2
[193] geneplotter_1.74.0 scattermore_0.8
[195] DESeq2_1.36.0 scran_1.24.1
[197] bit_4.0.4 spatstat.data_3.0-0
[199] ggraph_2.1.0 pkgconfig_2.0.3
[201] lmerTest_3.1-3

@MikeDMorgan
Copy link
Member

Please up date Milo the current Bioc release first.

@gianfilippo
Copy link
Author

Hi,
I updated miloR to 1.6.0. I still get Nhoods of opposite sign within the same NhoodGroup, if I run the following

da_results = groupNhoods(miloOBJ, da_results, da.fdr=0.05)

logFC PValue FDR Nhood SpatialFDR NhoodGroup
-3.672 3.096e-06 0.002750 26 0.0015060 7
-5.267 8.613e-07 0.001912 1379 0.0007996 7
2.937 5.126e-05 0.022768 1717 0.0155335 7
7.346 7.226e-07 0.001912 1933 0.0007996 7
-7.452 1.556e-06 0.001927 3038 0.0009698 7
-4.407 4.650e-06 0.003442 3039 0.0019969 7
4.313 5.801e-06 0.003680 3806 0.0023399 3
-3.418 7.929e-05 0.032013 3873 0.0209600 7
7.148 3.307e-05 0.016318 3992 0.0106319 3
3.089 1.418e-04 0.052510 4077 0.0354637 7
-4.147 1.195e-05 0.006634 4139 0.0042020 13
-4.472 1.735e-06 0.001927 4408 0.0009698 7

What do you think ?

Thanks

@MikeDMorgan
Copy link
Member

Have you previously run buildNhoodGraph on these data? If so then it will by default use this unless you also specify compute.new=TRUE. I have no idea of course because you haven't posted your code.

@gianfilippo
Copy link
Author

Hi,

I start from a Seurat integrated object, scdata.
DefaultAssay(scdata) <- "RNA"
sce <- as.SingleCellExperiment(DietSeurat(scdata, graphs = NNgname, assays = "RNA", dimreducs = c("pca","integrated.umap")))
miloOBJ <- Milo(sce)
reducedDim(miloOBJ,type="pca") = scdata[["pca"]]@cell.embeddings
reducedDim(miloOBJ,type="umap") = scdata[["integrated_umap"]]@cell.embeddings
miloOBJ@graph = scdata@graphs
graph(miloOBJ) = graph(buildFromAdjacency(miloOBJ@graph[[NNgname]],is.binary=T,k=20))
miloOBJ <- makeNhoods(miloOBJ, prop=0.1, k=20, d=30, refined=T, reduced_dims="pca")

Then
countCells
calcNhoodDistance with reduced.dim="pca"
miloOBJ <- buildNhoodGraph(miloOBJ)

@MikeDMorgan
Copy link
Member

Ok - either don't run buildNhoodGraph, because this is agnostic to any DA testing results, or run groupNhoods(..., compute.new=TRUE).

@gianfilippo
Copy link
Author

I will give a try to both options

Thanks

@gianfilippo
Copy link
Author

Hi,

I tried both suggestions. Here are the results:

  1. the groupNhoods(..., compute.new=TRUE) does not seem to change the results. Still getting nhoodgroups with nhoods of opposite sign.

  2. I restarted from scratch, i.e. from the Seurat object. Not running buildNhoodGraph results in a different number of DA nhoods, I get 17 instead of 12. Not sure what this means.
    Also, still getting nhoodgroups with nhoods of opposite sign, after running
    da_results = groupNhoods(miloOBJ, da_results, da.fdr=FDR)
    I can attach the results if you like.

What do you think ? What am I doing wrong ?

@MikeDMorgan MikeDMorgan added the bug Something isn't working label Mar 8, 2023
@MikeDMorgan MikeDMorgan self-assigned this Mar 8, 2023
@gianfilippo
Copy link
Author

Hi,

did you have time to look into this issue ?

Thanks

@gianfilippo
Copy link
Author

I seems that if I define subset.nhoods (= to the DA Nhoods in my case), the logFC are concordant within each group, although the logFC difference can be larger than the max.lfc.delta.

@pranithavangala
Copy link

Any update on this @MikeDMorgan

@gianfilippo
Copy link
Author

my present solution is to use a combination of cell cluster and logFC sign to define groups.
I may also consider adding grouping by logFC value (range), but have not done that yet.

I am not sure about "mixed" nhoods, should I include them in marker analysis or filter them out ?

@liezeltamon
Copy link

hoping for an update on this, thank you!

@DarioS
Copy link

DarioS commented Apr 10, 2024

Isn't it expected and not an issue? The same kind of grouping is visible in the publication e.g. Figure 5d.

image

@amarinderthind
Copy link

I notice that this problematic phenomenon is still happening in version 2.2.0.

@MikeDMorgan
Copy link
Member

Do you have a minimally reproducible example so I can investigate please.

@DarioS
Copy link

DarioS commented Nov 20, 2024

Step 1: Download GSE188737_hnscc_seu.RData.gz from GSE188737. Unzip it. Then, adapt the loading path to your directory.

library(Seurat)
library(SingleCellExperiment)
load("GSE188737_hnscc_seu.RData") # Adapt to your location.

Then, simply run all commands below without modification. Do you see a mixed NhoodGroup 2 (red and blue)?

GSE188737counts <- GetAssayData(object = hnscc_seu, assay = "RNA", layer = "counts")
smoker <- rep("Yes", ncol(GSE188737counts))
smoker[hnscc_seu[[]]$patientID %in% c("HN263", "HN237", "HN272")] <- "No"
smoker <- factor(smoker)
GSE188737 <- SingleCellExperiment(assays = list(counts = GSE188737counts),
                                  colData = DataFrame(patientID = hnscc_seu[[]]$patientID,
                                                      sampleID = hnscc_seu[[]]$origin,
                                                      cellType = hnscc_seu[[]]$cell_type,
                                                      smoker = smoker))
library(scran)
GSE188737 <- logNormCounts(GSE188737)

library(scater)
GSE188737 <- runPCA(GSE188737)
GSE188737 <- runUMAP(GSE188737, dimred = "PCA")
plotUMAP(GSE188737, colour_by = "cellType")
fibroblastsOnly <- GSE188737[, GSE188737$cellType %in% c("CAFs", "Myofib")]

set.seed(4567)
library(miloR)
milo <- Milo(fibroblastsOnly)
reducedDim(milo, "UMAP") <- reducedDim(fibroblastsOnly, "UMAP")
milo <- buildGraph(milo)
milo <- makeNhoods(milo, refinement_scheme = "graph")

design <- data.frame(colData(milo))[, c("sampleID", "smoker")]
IDs <- unique(design$sampleID)
design <- design[!duplicated(design), "smoker", drop = FALSE]
rownames(design) <- IDs

milo <- countCells(milo, meta.data = colData(milo), samples = "sampleID")
## Reorder rownames to match columns of nhoodCounts(milo).
design <- design[colnames(nhoodCounts(milo)), , drop = FALSE]
smokerResults <- testNhoods(milo, design = ~ smoker, design.df = design, fdr.weighting = "graph-overlap")
smokerResults <- groupNhoods(milo, smokerResults, da.fdr = 0.05)
plotDAbeeswarm(smokerResults, "NhoodGroup")

image

@MikeDMorgan
Copy link
Member

One thing to note is that the default setting for plotDAbeeswarm is alpha=0.1, whereas you are setting DA nhoods at FDR < 0.05. The nhood grouping only considers the DA nhoods at your defined spatial FDR threshold, hence it works as expected.

@DarioS You should also note that in this dataset there is almost perfect separation of more than half of the nhoods between the smokers and non-smokers, hence it is not appropriate to use as-is for Milo. You can see this by using the checkSeparation() function:

sum(checkSeparation(milo, design, condition="smoker"))
[1] 598

Milo assumes the cells are mixed across samples - without knowing the experimental design, this could be down to strong confounding between batch and the variable of interest.

Now to the apparent bug - this is a quirk of how the nhoods adjacency is treated for discordant sign nhoods. The implementation only considers DA nhoods as noted above, and so can yield groups that have discordant signs because there are DA nhoods connected to non-DA nhoods with different LFC signs. The intuition is that no nhood groups should contain different LFC signs. The difference is subtle but important.

The short story is - this is not a bug per se, but a subtlety in how the nhood adjacency matrix is modified. The adjustments won't propagate through to Bioconductor until the next release in April. However, it is possible to re-install Milo from the devel branch: devtools::install_github("MarioniLab/miloR", ref="devel").

The default behaviour of groupNhoods is retained. However, to get the intuitive, i.e. no nhood groups with discordant LFCs then use groupNhoods(..., original.behaviour=FALSE). As an example below:

# original behaviour
smokerResults <- groupNhoods(milo, smokerResults, da.fdr = 0.1)
plotDAbeeswarm(smokerResults, "NhoodGroup", alpha=0.1)

image

# intuitive behaviour
smokerResults <- groupNhoods(milo, smokerResults, da.fdr = 0.1, original.behaviour=FALSE)
plotDAbeeswarm(smokerResults, "NhoodGroup", alpha=0.1)

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

6 participants