adjustRtime not conducted with large dataset, but working after register(SerialParam()) #766

Open
c-ze opened this issue Sep 10, 2024 · 6 comments

Comments

@c-ze

c-ze commented Sep 10, 2024

With a large dataset of >500 injections (Orbitrap Q-Exactive, full scan, not centroided), I found that the retention time adjustment did not execute. The script kept running, so later processing steps continued even though no adjusted retention times were reported.

As a workaround, the adjustRtime function worked fine when I ran register(SerialParam()) beforehand, as was proposed for a similar issue here: #358

For reference, most of the data used is in the MetaboLights repository under MTBLS3450 and MTBLS8433 (only the positive-mode acquisitions were used for the analysis, although the measurement was done with alternating polarity). The latest extension of the dataset, with which the error occurred, is not yet in the repository. For XCMS, the data was converted to mzXML via ProteoWizard.

The dataset comprises time-series (sampling) data from different groundwater probing sites ("Location" in MetaboLights). One location (H14, with the smallest number of injections) ran through XCMS without problems, while 7 locations (with >500 injections) showed the error.

I observed the issue on both Windows 10 and Ubuntu. So far, the workaround has been tested successfully on Ubuntu only, with the following sessionInfo:

R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.1 LTS

Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/Berlin
tzcode source: system (glibc)

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] xcms_4.2.3 BiocParallel_1.38.0

loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 dplyr_1.1.4
[3] lazyeval_0.2.2 MassSpecWavelet_1.70.0
[5] digest_0.6.37 XML_3.99-0.17
[7] lifecycle_1.0.4 cluster_2.1.6
[9] ProtGenerics_1.36.0 statmod_1.5.0
[11] magrittr_2.0.3 compiler_4.4.1
[13] progress_1.2.3 rlang_1.1.4
[15] tools_4.4.1 igraph_2.0.3
[17] utf8_1.2.4 prettyunits_1.2.0
[19] S4Arrays_1.4.1 DelayedArray_0.30.1
[21] plyr_1.8.9 RColorBrewer_1.1-3
[23] abind_1.4-5 purrr_1.0.2
[25] BiocGenerics_0.50.0 grid_4.4.1
[27] stats4_4.4.1 preprocessCore_1.66.0
[29] fansi_1.0.6 colorspace_2.1-1
[31] ggplot2_3.5.1 iterators_1.0.14
[33] scales_1.3.0 MASS_7.3-61
[35] MultiAssayExperiment_1.30.3 SummarizedExperiment_1.34.0
[37] cli_3.6.3 mzR_2.38.0
[39] crayon_1.5.3 generics_0.1.3
[41] httr_1.4.7 reshape2_1.4.4
[43] ncdf4_1.23 DBI_1.2.3
[45] affy_1.82.0 stringr_1.5.1
[47] zlibbioc_1.50.0 parallel_4.4.1
[49] impute_1.78.0 AnnotationFilter_1.28.0
[51] BiocManager_1.30.25 XVector_0.44.0
[53] vsn_3.72.0 matrixStats_1.3.0
[55] vctrs_0.6.5 Matrix_1.7-0
[57] jsonlite_1.8.8 hms_1.1.3
[59] IRanges_2.38.1 S4Vectors_0.42.1
[61] MsExperiment_1.6.0 MALDIquant_1.22.3
[63] clue_0.3-65 foreach_1.5.2
[65] limma_3.60.4 tidyr_1.3.1
[67] affyio_1.74.0 glue_1.7.0
[69] MSnbase_2.30.1 codetools_0.2-20
[71] QFeatures_1.14.2 Spectra_1.14.1
[73] stringi_1.8.4 gtable_0.3.5
[75] GenomeInfoDb_1.40.1 GenomicRanges_1.56.1
[77] UCSC.utils_1.0.0 mzID_1.42.0
[79] munsell_0.5.1 tibble_3.2.1
[81] pillar_1.9.0 MsFeatures_1.12.0
[83] pcaMethods_1.96.0 GenomeInfoDbData_1.2.12
[85] R6_2.5.1 doParallel_1.0.17
[87] lattice_0.22-5 Biobase_2.64.0
[89] MetaboCoreUtils_1.12.0 Rcpp_1.0.13
[91] PSMatch_1.8.0 SparseArray_1.4.8
[93] fs_1.6.4 MsCoreUtils_1.16.1
[95] MatrixGenerics_1.16.0 pkgconfig_2.0.3

@jorainer
Collaborator

Thanks for reporting. Could you also specify the method and parameters you used for the alignment? Also, did you use the newer objects that are based on MsExperiment (how did you read the data, with readMSData() or with readMsExperiment())?

@c-ze
Author

c-ze commented Oct 1, 2024

Hi @jorainer

Many thanks for your reply, and my apologies for the late response. As described below, I succeeded with the SerialParam workaround and have now completed the re-processing to look into the data.

I have been running this script for a while, so it probably does not use the newest objects, but it worked for several previous expansions of the dataset when re-running the analysis (until the latest one).

Here are the data preprocessing steps I used; the input data is .mzXML (single polarity, positive mode, converted via ProteoWizard):

# file input as dataframe via: data.frame(pd, stringsAsFactors=FALSE)

# read data
raw_data <- readMSData(files=pd[,1], pdata=new("NAnnotatedDataFrame", pd), mode="onDisk")

# find chrom peaks
cwp <- CentWaveParam(ppm=25, snthresh=20, peakwidth=c(5,20), prefilter=c(3,50000), noise = 50000,  mzCenterFun = "wMean", integrate = 1, fitgauss = FALSE, verboseColumns = FALSE)
xdata <- findChromPeaks(raw_data, param=cwp, BPPARAM = SnowParam(detectCores()-1) )

# RT adjustment
xdata <- adjustRtime(xdata, param=ObiwarpParam(binSize=0.6))

As the last step did not go through, I instead ran the following on Ubuntu 24.04.1 LTS with R version 4.4.1 (2024-06-14) and xcms_4.2.3, which worked. This workaround did not work on Windows 10, though:

register(SerialParam())
xdata <- adjustRtime(xdata, param=ObiwarpParam(binSize=0.6))

From there, I was able to complete the processing, and I can't see any further problems in the output.
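
Because the failure described in this issue was silent (the script kept running without adjusted retention times), a fail-fast check can be useful. The following is only a minimal sketch: adjust_rtime_serial is a hypothetical helper name and not part of xcms. It assumes xcms and BiocParallel are installed and that x is an object with detected chromatographic peaks, such as xdata above. It registers SerialParam() for the duration of the call, restores the previous default back-end afterwards, and stops if hasAdjustedRtime() reports no adjusted retention times.

```r
library(xcms)
library(BiocParallel)

# Hypothetical helper (not part of xcms): run adjustRtime() serially and
# stop with an error if no adjusted retention times are present afterwards.
adjust_rtime_serial <- function(x, param = ObiwarpParam(binSize = 0.6)) {
    previous <- bpparam()                    # remember the current default back-end
    on.exit(register(previous), add = TRUE)  # restore it when the function exits
    register(SerialParam())                  # workaround described in this issue
    res <- adjustRtime(x, param = param)
    if (!hasAdjustedRtime(res))
        stop("adjustRtime() returned without adjusted retention times")
    res
}
```

Usage would then be xdata <- adjust_rtime_serial(xdata), so that later processing steps are not reached when the alignment did not take place.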

@jorainer
Collaborator

jorainer commented Oct 3, 2024

Just out of curiosity: what are the dimensions of your chromPeaks() matrix? And could you also run print(object.size(xdata), unit = "GB") on your object? I realized that in some instances, when there is a very large number of chrom peaks, processing can really slow down if you get close to the memory limit of your system.

To fix that, we're working on a solution that also uses an on-disk representation of the chromPeaks matrix. With that, xcms should again scale to very large data sets.

@c-ze
Author

c-ze commented Oct 7, 2024

Thanks, sounds good!

Regarding your questions, I checked xdata for one of the datasets as outlined, at the stage just before the retention time alignment fails (i.e. only data reading and peak picking had been conducted):

  • print(object.size(xdata), unit = "GB") reports 0.1 GB
  • running it with "MB" reports 65.5 MB
  • the dimensions of chromPeaks(xdata) are [1:4149542, 1:11]

Many thanks!
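
As a rough cross-check of the figures reported above, the size of the chromPeaks matrix can be estimated from its dimensions. This is only a back-of-the-envelope sketch that assumes every cell is stored as a double-precision value (8 bytes) and ignores any overhead for dimnames or attributes.

```r
# Dimensions of chromPeaks(xdata) as reported in this thread
n_rows <- 4149542
n_cols <- 11

# Assumption: every cell is a double (8 bytes); matrix overhead is ignored
bytes_per_value <- 8
est_bytes <- n_rows * n_cols * bytes_per_value
est_mib <- est_bytes / 1024^2

cat(sprintf("Estimated size of the chromPeaks matrix: %.1f MiB\n", est_mib))
```

Under these assumptions the matrix alone comes to roughly 348 MiB, which is more than the 65.5 MB reported for the whole object. One possible explanation, not confirmed in this thread, is that object.size() does not include data held in associated environments, so measuring the matrix directly with object.size(chromPeaks(xdata)) may give a more informative number.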

@jorainer
Collaborator

This is a bit puzzling. The data is actually quite small, so I cannot really understand why parallel processing would not work. What type of parallel processing are you using? MulticoreParam? SnowParam? Are you starting/enabling the parallel processing first using e.g. bpstart(register(MulticoreParam(4))) or just calling register(MulticoreParam(4))?
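
For anyone unsure which back-end is in use, the following minimal sketch shows one way to inspect the default that BiocParallel hands to functions when no BPPARAM argument is given. It assumes only that BiocParallel is installed; the variable name default_backend is illustrative. It also shows how to switch the default to serial processing and restore the previous default afterwards.

```r
library(BiocParallel)

# Back-end used by functions that fall back to bpparam()
default_backend <- bpparam()
message("Default back-end: ", class(default_backend)[1],
        " with ", bpnworkers(default_backend), " worker(s)")

# Register serial processing as the new default ...
register(SerialParam())
stopifnot(is(bpparam(), "SerialParam"))

# ... and restore the previous default afterwards
register(default_backend)
```

The default typically differs between operating systems, which may be relevant when a workflow behaves differently on Windows and Linux.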

@c-ze
Author

c-ze commented Oct 22, 2024

Hi jorainer

I do not enable parallel processing beforehand in a separate command. So at the stage of retention time alignment, the XCMS processing functions are really just run as noted above (copied below for reference):

# file input as dataframe via: data.frame(pd, stringsAsFactors=FALSE)

# read data
raw_data <- readMSData(files=pd[,1], pdata=new("NAnnotatedDataFrame", pd), mode="onDisk")

# find chrom peaks
cwp <- CentWaveParam(ppm=25, snthresh=20, peakwidth=c(5,20), prefilter=c(3,50000), noise = 50000,  mzCenterFun = "wMean", integrate = 1, fitgauss = FALSE, verboseColumns = FALSE)
xdata <- findChromPeaks(raw_data, param=cwp, BPPARAM = SnowParam(detectCores()-1) )

# RT adjustment
xdata <- adjustRtime(xdata, param=ObiwarpParam(binSize=0.6))

As noted, serial processing allowed the retention time alignment to complete, but only on an Ubuntu computer and not on my Windows 10 computer:

register(SerialParam())
xdata <- adjustRtime(xdata, param=ObiwarpParam(binSize=0.6))

Many thanks and best wishes

Christian
