johnhalloran321/datum
DATUM - DBN Analysis Toolkit to Understand Mass spectra
===================================================
DATUM is a toolkit consisting of several dynamic Bayesian networks (DBNs) and associated software tools to analyze tandem mass (MS/MS) spectra. The toolkit supports effective parameter learning, database search, and post-search feature extraction for Percolator analysis.

Written in Python and ported from the DRIP Toolkit (DTK), DATUM adds several new modules for Didea, a DBN whose posterior-based scoring function may be thought of as a probabilistic analogue to the popular XCorr scoring function found in SEQUEST/Comet/CRUX. Training and searching with Didea are both faster and more accurate than with DRIP, and Didea also offers training convergence guarantees not possible with DRIP (i.e., Didea learning converges to a global optimum, while DRIP learning only converges to a local optimum).

If you use the Didea features offered by DATUM in your research, please cite:
John T. Halloran and David M. Rocke. "Learning Concave Conditional Likelihood Models for Improved Analysis of Tandem Mass Spectra." Advances in Neural Information Processing Systems (NIPS) 2018.

---------------------------------------------------
----------------- DATUM Installation
---------------------------------------------------
If you are only using the Didea modules (i.e., digest.py, dideaTrain.py, dideaSearch.py), no extra software aside from Python 2.7 is required and the modules will work after downloading. For information on using DRIP, see below.

---------------------------------------------------
----------------- Note
---------------------------------------------------
The Didea modules of DATUM are under rapid development and still in their early stages. DATUM will be updated shortly as new Didea features are added (e.g., support for high-res MS2 search, parameter learning with peptide modifications, feature extraction, and documentation).
===================================================
========== Info for DRIP Tools
===================================================
The DRIP Toolkit utilizes a dynamic Bayesian network (DBN) for Rapid Identification of Peptides (DRIP) in tandem mass spectra. Given an observed spectrum, DRIP scores a peptide by aligning the peptide's theoretical spectrum with the observed spectrum, i.e., by computing the most probable sequence of insertions (spurious observed peaks) and deletions (missing theoretical peaks). DBN inference is performed efficiently using the Graphical Models Toolkit (GMTK), which allows easy alterations to the model.

If you use the DRIP toolkit in your research, please cite:
John T. Halloran, Jeff A. Bilmes, and William S. Noble. "Dynamic Bayesian network for accurate detection of peptides from tandem mass spectra." Journal of Proteome Research 15.8 (2016): 2749-2759.

Written by John T. Halloran ([email protected]).

---------------------------------------------------
----------------- Installation
---------------------------------------------------
The toolkit requires the following be installed:
    Cygwin (if using Windows)
    g++
    the Graphical Models Toolkit (GMTK) - https://melodi.ee.washington.edu/gmtk/
    Python 2.7
    argparse and numpy python packages
    SWIG

After installing the above, perform the following in the unzipped toolkit directory:
    cd pyFiles/pfile
    swig -c++ -python libpfile.i
    CC=g++ python setup.py build_ext -i

Assuming no errors were output, the DRIP Toolkit is now ready for use! To test that the above was compiled and linked correctly, run:
    ./test.py

---------------------------------------------------
----------------- Searching ms2 files
---------------------------------------------------
For convenience, sample data is included in the directory data, and example DRIP Toolkit search commands are provided in test.sh. To search an MS2 dataset data/test.ms2 given FASTA file data/yeast.fasta, perform the following steps:

1.) Digest the FASTA file using dripDigest.py. Example:
    python dripDigest.py \
        --digest-dir dripDigest-output \
        --min-length 6 \
        --fasta data/yeast.fasta \
        --enzyme 'trypsin/p' \
        --monoisotopic-precursor true \
        --missed-cleavages 0 \
        --digestion 'full-digest'
The digested peptide database will be written to directory dripDigest-output.

2.) Search using dripSearch.py. Example:
    python dripSearch.py \
        --digest-dir 'dripDigest-output' \
        --spectra data/test.ms2 \
        --precursor-window 3.0 \
        --learned-means dripLearned.means \
        --learned-covars dripLearned.covars \
        --num-threads 8 \
        --top-match 1 \
        --high-res-ms2 F \
        --output dripSearch-test-output
If the data was collected utilizing high-resolution fragment ions, set --high-res-ms2 T. The output PSMs will be written to file dripSearch-test-output.txt.

For a detailed explanation of using the toolkit, including training DRIP, preparing data for cluster usage, and speeding up a search using approximate inference, please consult:
    https://jthalloran.bitbucket.io/dripToolkit/documentation.html

For a full list of allowable toolkit options, please consult:
    https://jthalloran.bitbucket.io/dripToolkit/dripDigest.html
    https://jthalloran.bitbucket.io/dripToolkit/dripSearch.html
    https://jthalloran.bitbucket.io/dripToolkit/dripTrain.html

---------------------------------------------------
----------------- Contact
---------------------------------------------------
Please send all questions and bug reports to: [email protected]
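
The digest and search steps from "Searching ms2 files" can also be chained from a small wrapper script. The following is only a sketch, not part of the toolkit: the function names (digest_command, search_command, run_pipeline) are hypothetical, the flags and values are copied from the example commands above, and it assumes the script is run from the toolkit directory with `python` resolving to a Python 2.7 interpreter.

```python
import subprocess


def digest_command(fasta, digest_dir):
    """Build the dripDigest.py command line from step 1 (hypothetical helper)."""
    return [
        "python", "dripDigest.py",
        "--digest-dir", digest_dir,
        "--min-length", "6",
        "--fasta", fasta,
        "--enzyme", "trypsin/p",
        "--monoisotopic-precursor", "true",
        "--missed-cleavages", "0",
        "--digestion", "full-digest",
    ]


def search_command(spectra, digest_dir, output, high_res_ms2=False, num_threads=8):
    """Build the dripSearch.py command line from step 2 (hypothetical helper)."""
    return [
        "python", "dripSearch.py",
        "--digest-dir", digest_dir,
        "--spectra", spectra,
        "--precursor-window", "3.0",
        "--learned-means", "dripLearned.means",
        "--learned-covars", "dripLearned.covars",
        "--num-threads", str(num_threads),
        "--top-match", "1",
        # T for high-resolution fragment ions, F otherwise (as in the README).
        "--high-res-ms2", "T" if high_res_ms2 else "F",
        "--output", output,
    ]


def run_pipeline(fasta, spectra, digest_dir="dripDigest-output",
                 output="dripSearch-test-output", high_res_ms2=False,
                 dry_run=True):
    """Return the two commands in order; execute them unless dry_run is set."""
    commands = [
        digest_command(fasta, digest_dir),
        search_command(spectra, digest_dir, output, high_res_ms2=high_res_ms2),
    ]
    if not dry_run:
        for cmd in commands:
            # check_call raises CalledProcessError if a step exits non-zero,
            # so the search never runs against a failed digest.
            subprocess.check_call(cmd)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands for the bundled sample data.
    for cmd in run_pipeline("data/yeast.fasta", "data/test.ms2"):
        print(" ".join(cmd))
```

Calling run_pipeline(..., dry_run=False) executes both steps in sequence. Passing the arguments as a list avoids shell quoting, which is why 'trypsin/p' and 'full-digest' appear without the quotes used in the shell examples.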