-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add new Ion-mobility peak picking algorithm #647
Draft
RogerGinBer
wants to merge
8
commits into
sneumann:devel
Choose a base branch
from
RogerGinBer:spectra-ion-mobility
base: devel
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
5147dc7
Add new Ion-mobility peak picking algorithm
RogerGinBer 92eeec1
Change CentWaveParamIM to IMCentWaveParam
RogerGinBer 857f681
Refactor .do_findChromPeaks_IM_centWave
RogerGinBer 216210f
feat: Add IM peak-picking dispatch point by chunks
RogerGinBer 56cc96c
feat: extend peak Mz range to compensate for combineSpectra
RogerGinBer ce8daa4
feat: add getters and setters for IMCentwaveParam class
RogerGinBer f8f4826
test: add unit tests and sample IM data
RogerGinBer 120627a
refactor: simplify split_mobilogram_CWT
RogerGinBer File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2220,8 +2220,218 @@ do_findPeaks_MSW <- function(mz, int, snthresh = 3, | |
peaklist | ||
} | ||
|
||
############################################################ | ||
## Ion-mobility peak picking | ||
## | ||
#' @title Core API for Centwave-based ion-mobility peak picking | ||
#' @name do_findChromPeaks_IM_centWave | ||
#' | ||
#' @description Performs an extension of CentWave peak-picking on LC-IM-MS MS1 | ||
#' data. First it joins all scans into frames and performs .centWave_orig on | ||
#' the summarized LC-MS-like data. From each peak, it calculates its mobilogram and | ||
#' performs a second peak-picking on the IM dimension, resolving the peaks. | ||
#' | ||
#' @inheritParams do_findChromPeaks_centWave | ||
#' @inheritParams findChromPeaks-centWaveIonMobility | ||
#' | ||
#' @return A matrix, each row representing an identified peak, with columns: | ||
#' \describe{ | ||
#' \item{mz}{m/z value of the peak at the apex position.} | ||
#' \item{mzmin}{Minimum m/z of the peak.} | ||
#' \item{mzmax}{Maximum m/z of the peak.} | ||
#' \item{rt}{Retention time value of the peak at the apex position.} | ||
#' \item{rtmin}{Minimum retention time of the peak.} | ||
#' \item{rtmax}{Maximum retention time of the peak.} | ||
#' \item{im}{Ion mobility value of the peak at the apex position.} | ||
#' \item{immin}{Minimum ion mobility value of the peak.} | ||
#' \item{immax}{Maximum ion mobility value of the peak.} | ||
#' \item{maxo}{Maximum intensity of the peak.} | ||
#' \item{into}{Integrated (original) intensity of the peak.} | ||
#' \item{intb}{Always \code{NA}.} | ||
#' \item{sn}{Always \code{NA}} | ||
#' } | ||
#' | ||
#' @family core peak detection functions | ||
#' | ||
#' @author Roger Gine, Johannes Rainer | ||
#' | ||
#' @importFrom Spectra peaksData rtime combineSpectra mz | ||
do_findChromPeaks_IM_centWave <- function(spec, | ||
ppm = 25, | ||
peakwidth = c(20, 50), | ||
snthresh = 10, | ||
prefilter = c(3, 100), | ||
mzCenterFun = "wMean", | ||
integrate = 1, | ||
mzdiff = -0.001, | ||
fitgauss = FALSE, | ||
noise = 0, | ||
verboseColumns = FALSE, | ||
roiList = list(), | ||
firstBaselineCheck = TRUE, | ||
roiScales = numeric(), | ||
sleep = 0, | ||
extendLengthMSW = FALSE, | ||
ppmMerging = 10, | ||
binWidthIM = 0.01 | ||
){ | ||
|
||
## Merging all scans from the same frame to summarize across IM dimension | ||
scans_summarized <- | ||
Spectra::combineSpectra( | ||
spec, | ||
f = as.factor(spec$frameId), | ||
intensityFun = base::sum, | ||
weighted = TRUE, | ||
ppm = ppmMerging | ||
) | ||
Spectra::centroided(scans_summarized) <- TRUE | ||
|
||
## 1D Peak-picking on summarized data | ||
peaks <- .mse_find_chrom_peaks_sample(scans_summarized, | ||
msLevel = 1L, | ||
param = CentWaveParam(ppm = ppm, peakwidth = peakwidth, | ||
snthresh = snthresh, prefilter = prefilter, | ||
mzCenterFun = mzCenterFun, integrate = integrate, | ||
mzdiff = mzdiff, fitgauss = fitgauss, noise = noise, | ||
verboseColumns = verboseColumns, roiList = roiList, | ||
firstBaselineCheck = firstBaselineCheck, | ||
roiScales = roiScales, | ||
extendLengthMSW = extendLengthMSW)) | ||
if (!nrow(peaks)) return() | ||
|
||
#Correcting for the fact that combineSpectra combined close mz values | ||
peaks[,"mzmin"] <- peaks[,"mzmin"] * (1 - ppmMerging * 1e-6) | ||
peaks[,"mzmax"] <- peaks[,"mzmax"] * (1 + ppmMerging * 1e-6) | ||
|
||
## 1D Peak-picking, for each individual peak, to resolve across the IM dimension | ||
.do_resolve_IM_peaks_CWT(spec, peaks, binWidthIM) | ||
|
||
} | ||
|
||
|
||
.do_resolve_IM_peaks_CWT <- function(spec, peaks, binWidthIM){ | ||
## Extract frame information | ||
pdata <- peaksData(spec, columns = c("mz", "intensity")) | ||
rt <- rtime(spec) | ||
im <- spec$inv_ion_mobility | ||
|
||
## Resolving peaks across IM dimension | ||
resolved_peaks <- vector("list", nrow(peaks)) | ||
for (i in seq_len(nrow(peaks))) { | ||
current_peak <- peaks[i,] | ||
mobilogram <- .extract_mobilogram(pdata, current_peak, rt, im, binWidthIM) | ||
if (length(mobilogram) == 0) { | ||
# warning(i, " mobilogram is empty") | ||
next | ||
} | ||
|
||
bounds <- .split_mobilogram(mobilogram) | ||
new_peaks <- data.frame( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think using |
||
mz = current_peak["mz"], | ||
mzmin = current_peak["mzmin"], | ||
mzmax = current_peak["mzmax"], | ||
rt = current_peak["rt"], | ||
rtmin = current_peak["rtmin"], | ||
rtmax = current_peak["rtmax"], | ||
im = vapply(bounds, function(x) x[[2]], numeric(1)), | ||
immin = vapply(bounds, function(x) x[[1]], numeric(1)), | ||
immax = vapply(bounds, function(x) x[[3]], numeric(1)), | ||
row.names = NULL | ||
) | ||
resolved_peaks[[i]] <- new_peaks | ||
} | ||
|
||
resolved_peaks <- do.call(rbind, resolved_peaks) | ||
if(is.null(resolved_peaks) || !nrow(resolved_peaks)) return() | ||
|
||
## Refine and calculate peak parameters | ||
vals <- vector("list", nrow(resolved_peaks)) | ||
for (i in seq(nrow(resolved_peaks))) { | ||
peak <- unlist(resolved_peaks[i, , drop = TRUE]) | ||
|
||
## Create a EIC for mz, rt and IM ranges | ||
eic <- .extract_EIC_IM(peak, pdata, rt, im) | ||
|
||
if (nrow(eic) == 0 | all(eic[, 2] == 0)) | ||
next | ||
|
||
## Refine RT bounds | ||
rts <- c(peak["rtmin"], peak["rtmax"]) | ||
apx <- which.max(eic[, 2]) | ||
apx_rt <- eic[apx, 1] | ||
range <- xcms:::descendMin(eic[, 2], apx) | ||
|
||
eic <- eic[range[1]:range[2], , drop = FALSE] | ||
|
||
## Calculate peak stats | ||
vals[[i]] <- data.frame( | ||
mz = peak["mz"], | ||
mzmin = peak["mzmin"], | ||
mzmax = peak["mzmax"], | ||
rt = apx_rt, | ||
rtmin = min(eic[, 1]), | ||
rtmax = max(eic[, 1]), | ||
im = peak["im"], | ||
immin = peak["immin"], | ||
immax = peak["immax"], | ||
maxo = max(eic[, 2]), | ||
into = sum(eic[, 2]), | ||
intb = NA, | ||
sn = NA | ||
) | ||
} | ||
resolved_peaks <- do.call(rbind, vals) | ||
resolved_peaks <- | ||
resolved_peaks[resolved_peaks$into > 0, ] #Remove empty peaks | ||
|
||
as.matrix(resolved_peaks) | ||
} | ||
|
||
#' @importFrom MsCoreUtils between bin | ||
.extract_mobilogram <- function(pdata, peak, rt, im, binWidthIM = 0.01){ | ||
rtr <- c(peak[["rtmin"]], peak[["rtmax"]]) | ||
mzr <- c(peak[["mzmin"]], peak[["mzmax"]]) | ||
keep <- MsCoreUtils::between(rt, rtr) | ||
if (length(keep) == 0) return() | ||
ims <- im[keep] | ||
ints <- vapply(pdata[keep], xcms:::.aggregate_intensities, | ||
mzr = mzr, INTFUN = sum, na.rm = TRUE, numeric(1)) | ||
if(all(ints == 0)) return() | ||
mob <- MsCoreUtils::bin(x = ints[order(ims)], y = sort(ims), | ||
size = binWidthIM, FUN = sum) | ||
mob | ||
} | ||
|
||
|
||
.split_mobilogram <- function(mob){ | ||
if(length(mob$x) == 0){return()} | ||
vec <- mob$x | ||
apex <- which(MsCoreUtils::localMaxima(vec, hws = 4)) | ||
limits <- list() | ||
for (i in seq_along(apex)){ | ||
ranges <- descendMinTol(vec, startpos = c(apex[i], apex[i]), maxDescOutlier = 2) | ||
limits[[i]] <- mob$mids[c(ranges[1], apex[i], ranges[2])] | ||
} | ||
limits <- limits[vapply(limits, function(x){!any(is.na(x))}, logical(1))] | ||
limits | ||
} | ||
|
||
|
||
#' @importFrom dplyr between | ||
.extract_EIC_IM <- function(peak, pdata, rt, im){ | ||
rtr <- c(peak["rtmin"], peak["rtmax"]) | ||
mzr <- c(peak["mzmin"], peak["mzmax"]) | ||
imr <- c(peak["immin"], peak["immax"]) | ||
|
||
keep <- dplyr::between(rt, rtr[1], rtr[2]) & dplyr::between(im, imr[1], imr[2]) | ||
rts <- rt[keep] | ||
ints <- vapply(pdata[keep], xcms:::.aggregate_intensities, | ||
mzr = mzr, INTFUN = sum, na.rm = TRUE, numeric(1)) | ||
ints <- vapply(unique(rts), function(x){sum(ints[rts == x])}, numeric(1)) | ||
|
||
cbind(unique(rts), ints) | ||
} | ||
|
||
|
||
############################################################ | ||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there other ways/algorithms to perform peak detection on IM data that do not first collapse the data and then expand it again like you do here?
If not I would suggest to split the functionality into 3 functions:
peaksData
by framepeaksData
and the detected chrom peaks matrix from 2) as input and post processes the data.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, yes, there are other ways to perform peak-picking that directly use the "full" data without collapsing (for instance, you could do a 2D-CWT where you change the scales in both RT and IM dimensions and find local maxima; or any peakpicking algorithm such as those used for GCxGC-MS, where you have a similar situation). I haven't looked deeply into such methods, but we should accomodate for them too, just in case
Still, splitting the functionality in
do_findChromPeaks_IM_centWave
seems good, since those functions would be reusable for other algorithms, etc. Specifically, if you agree, I'll do the following:.mse_find_chrom_peaks_sample
with the IM-collapsedSpectra
object and the paramas(param, CentWaveParam")
(since itIMCentWaveParam
inherits from it) -> That function will take care of extracting the peaksData, rt, valsPerSpect, etc., rundo_findChromPeaks_centWave
, and return the peak matrix.Sounds good? 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for my late reply!
splitting functionality is always good, as you say, enables reuse - and makes the code easier to read. so, yes, it sounds good.
so, if I understand:
param
inheritsIMCentWaveParam
collapse thepeaksData
by frame.do_findChromPeaks_centWave
for peak detection (sinceIMCentWaveParam
inheritsCentWaveParam
)param
inheritsIMCentWaveParam
post process the detected peaks (resolve across IM dimension) and return results.if the functions get to large, you could also consider implementing a
.im_mse_find_chrom_peaks_sample
that is called instead of.mse_find_chrom_peaks_sample
ifparam
inherits from anIM
param object... not sure if that would simplify integration of additional/other IM peak detection methods.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now I've encapsulated all this into
do_findChromPeaks_IM_centWave
, so it's called after dispatching the function corresponding to theparam
:It's basically the all steps you mentioned (collapse, CentWave and resolve), but called from a "lower" function call level, so everything upstream is more tidy. What I like about this is that all Centwave-specific checks and data extraction (mz, int, valsPerSpect, etc.) are handled by
.mse_find_chrom_peaks_sample
, so we are reusing already-existing code.I'll commit the proposed refactor so you can take a closer look by yourself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent. Let me know when you're OK/ready from your side. I would then like to try the code in action and tinker myself a bit to see if/where we could improve/optimize.
For that I would create yet another branch to play with the code and ask for your feedback on the merge.
Related to that: could you please provide a short code snipped with the example how to perform the analysis (I guess I got already a file from you on which I can test...)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely! If you agree, perhaps we should use (for testing) the
MsBackendMemory
orMsBackendDataFrame
to create a manageable toy example from the current read-onlyMsBackendTimsTof
(for instance, subsetting the RT)I'm still figuring out how to properly centroid the data (the format has some problems that makes the current
peakPicks
function inSpectra
ineffective, see below), but still, the analysis would go somewhat like this:You can use the TIMS-TOF data file I sent you a while back, just bear in mind it's in a zero-less profile mode (working on fixing that) and the chromatographic peaks are usually very short (<6-10s)