MSLibrarian is an R package that optimizes predicted spectral libraries from Prosit (https://www.proteomicsdb.org/prosit/) for use in targeted analysis of a given set of DIA-MS data.
The available optimization steps build on:
- (i) Spectrum-centric conversion and database search of the DIA data via DIA-Umpire and MSFragger
- (ii) Comparison of identified spectra vs. spectra predicted using variable Collision Energy (CE) settings in Prosit
- (iii) Retention time prediction by DeepLC, with calibration on observed retention times on this LC setup
- (iv) Subsetting to most relevant protein set (at relaxed FDR to avoid exclusion of false negatives)
- (v) Reduction of the library to the top N most intense fragment ions to reduce file size and speed up downstream processing
Refined libraries are written out in OpenSWATH or Spectronaut format, ready for use in downstream peptide-centric analysis tools such as DIA-NN 1.8.
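Step (v) can be illustrated with a minimal base R sketch. It assumes that N is applied per precursor and that the library is a table with one row per fragment ion. The column names (`precursor`, `fragment`, `intensity`) and the helper `keep_top_n()` are hypothetical and are not part of the MSLibrarian API.

```r
# Hypothetical toy library: one row per fragment ion.
# Column names are illustrative, not MSLibrarian's internal schema.
lib <- data.frame(
  precursor = c("PEPTIDEK_2", "PEPTIDEK_2", "PEPTIDEK_2", "PEPTIDEK_2",
                "ELVISR_2", "ELVISR_2", "ELVISR_2"),
  fragment  = c("y3", "y4", "y5", "b2", "y2", "y3", "b2"),
  intensity = c(0.40, 1.00, 0.75, 0.10, 1.00, 0.20, 0.55),
  stringsAsFactors = FALSE
)

# Keep the top_n most intense fragments of each precursor (hypothetical helper).
keep_top_n <- function(lib, top_n = 6) {
  # Rank fragments within each precursor, most intense first.
  rank_in_group <- ave(-lib$intensity, lib$precursor,
                       FUN = function(x) rank(x, ties.method = "first"))
  lib[rank_in_group <= top_n, , drop = FALSE]
}

reduced <- keep_top_n(lib, top_n = 2)
print(reduced)
```

With `top_n = 2`, only the two most intense fragments of each precursor remain, so the toy table shrinks from seven rows to four.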
In its current form, MSLibrarian must be installed on a computer running Windows. This requirement is mainly a consequence of the third-party software that the package currently depends on. A future aim is to make MSLibrarian a cross-platform application and to provide it as a Docker image.
We recommend a computer with at least 32 GB of RAM to avoid problems during the more memory-intensive tasks that MSLibrarian performs.
To run all features of MSLibrarian, the following software must be installed on the C:/ drive:
- R version 4.0.0 or later.
- Proteowizard suite version 3.0.20365 or later
- Trans-proteomic pipeline version 5.2.0 or later
- MSFragger version 3.2 or later
- OpenMS version 2.5.0 or later
- DeepLC GUI version 0.1.29 or later. Follow the installation guide to set up the Miniconda environment. Alternatively, the DeepLC CLI (.exe) can be installed instead.
- DIA-NN version 1.8. This is currently the most recent version, but older versions should also work.
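Before running the package, it can help to check the requirements above from within R. The sketch below is only an illustration: the R version threshold comes from the list above, but the install directories in `tool_dirs` are hypothetical placeholders and must be replaced with the locations used on your own machine.

```r
# Minimum R version taken from the requirements list above.
min_r_version <- "4.0.0"
r_ok <- getRversion() >= min_r_version
if (!r_ok) {
  warning("R ", getRversion(), " is older than the required ", min_r_version)
}

# Hypothetical install locations; adjust them to your own setup.
tool_dirs <- c(
  MSFragger = "C:/MSFragger",
  OpenMS    = "C:/OpenMS"
)

# TRUE for each directory that exists, FALSE otherwise.
found <- dir.exists(tool_dirs)
names(found) <- names(tool_dirs)
print(found)
```

Any entry reported as `FALSE` points to a tool that is missing or installed somewhere other than the assumed directory.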
The input MS data must conform to the following:
- Format: Thermo raw (other file formats should be available in the future)
- Acquisition mode: DIA (must contain both MS1 and MS2 scans)
To download and install MSLibrarian from GitHub, use the devtools package.
library(devtools)
install_github("MarcIsak/MSLibrarian")
MSLibrarian relies on Spectral Warehouse SQLite databases to make predicted spectral libraries. SQLite databases can be downloaded from Zenodo for the following common species:
- Homo sapiens (Human)
- Mus musculus (Mouse)
- Saccharomyces cerevisiae (Baker's yeast)
- Caenorhabditis elegans (Roundworm)
- Drosophila melanogaster (Fruit fly)
- Escherichia coli K12 (Bacterium)
Alternatively, these databases can be downloaded from within MSLibrarian using the function get.spectral.db().
To manually create a Spectral Warehouse SQLite database, go to the Wiki of this repository for full details.
Some of the third-party tools that MSLibrarian uses, such as Comet (or MSFragger), DIA-Umpire, or SpectraST, require parameter files. Example parameter files can be found in the folder params in this repository. Make sure to add the folder to the C:/ drive and that the path to the files does not contain any spaces.
These parameter files can be edited, but it is not recommended to do so.
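The no-spaces requirement for parameter file paths can be checked up front. This is a minimal sketch: the helper `has_no_spaces()` and the example paths are hypothetical and serve only to illustrate the check.

```r
# Return TRUE for each path that contains no whitespace (hypothetical helper).
has_no_spaces <- function(path) {
  !grepl("[[:space:]]", path)
}

# Hypothetical example paths, for illustration only.
paths <- c("C:/params/example.params", "C:/my params/example.params")
ok <- has_no_spaces(paths)
print(data.frame(path = paths, ok = ok))
```

The second path contains a space in its folder name and is flagged, so that folder would need to be renamed or moved before use.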
Go to the Wiki of this repository to learn how to create a predicted spectral library in MSLibrarian.