Add rule-based fragmentation #17

Open
wants to merge 126 commits into
base: dev
def547e
Compatibility with conda version of geng; remove geng tool from package
jackgisby May 30, 2020
9988906
Incorporate pkl files into connectivity database
jackgisby May 30, 2020
21025ff
Add nauty as dependency
jackgisby May 30, 2020
b79ecf1
Add pickle as test dependency
jackgisby May 30, 2020
f180962
Switch from strings to pickles for connectivity graphs
jackgisby May 30, 2020
00c6f8f
Use blob instead of text to store pickled dictionary
jackgisby May 31, 2020
bad61eb
No longer write substructures to .smi
jackgisby Jun 10, 2020
24ff1e3
Add option to build to select only frequent substructures
jackgisby Jun 10, 2020
8e425a5
Add connectivity filter to k_configs
jackgisby Jul 16, 2020
4d3a57b
Incorporate connectivity filter into MSn build method
jackgisby Jul 16, 2020
8cf9419
Build substructures for each set of masses independently
jackgisby Jul 16, 2020
fdde9ba
Call itertools.product on substructures within multiprocessing portio…
jackgisby Jul 22, 2020
83c1885
Configure run script for current create_isomorphism_database inputs
jackgisby Jul 22, 2020
48d1d17
Built subsets should be empty list, not None
jackgisby Jul 22, 2020
7539896
Update variable names, remove debug options, update docstrings
jackgisby Jul 29, 2020
92ffc9d
Add annotate_msn and generate_structures user functions
jackgisby Jul 29, 2020
fa9e6d7
Move stage at which multiprocessing step is performed
jackgisby Jul 29, 2020
0ab5349
Allow for multiple output options in build
jackgisby Jul 29, 2020
59989e5
Remove ppm option for retrieving elemental composition from substruct…
jackgisby Jul 29, 2020
5d30eeb
Allow list of mc/exact_mass to be passed to generate_structures
jackgisby Jul 29, 2020
602d7ba
Use TemporaryDirectory to store unittest results
jackgisby Aug 1, 2020
9024b19
Let generate_structures return/yield smiles
jackgisby Aug 1, 2020
5123b77
Implement build_msn to incorporate considerations for building struct…
jackgisby Aug 1, 2020
f581da5
Implement annotate_msn to provide an interface to build_msn
jackgisby Aug 1, 2020
dbd5ea9
Add/update build docstrings
jackgisby Aug 1, 2020
fdcc286
Remove unnecessary build parameters
jackgisby Aug 1, 2020
e8ccd9d
Pass data dictionary to user-facing build functions rather than separ…
jackgisby Aug 1, 2020
e906068
Update variable naming conventions
jackgisby Aug 1, 2020
80efc58
Add newline between smiles in out file
jackgisby Aug 1, 2020
7c72240
Update SubstructureDb for removal of .pkl files
jackgisby Aug 1, 2020
df79c5c
Add function create_substructure_database
jackgisby Aug 1, 2020
01a4d21
Bring tests up to date with variable renaming
jackgisby Aug 1, 2020
2293f8e
Bring scripts up to date with variable renaming
jackgisby Aug 1, 2020
c6f5a6a
Simplify loading of test data and remove teardown
jackgisby Aug 1, 2020
5d1b95f
Remove unused class ConnectivityDb and update SubstructureDb parameters
jackgisby Aug 1, 2020
d7d5f1d
Implement additional non-msn build tests
jackgisby Aug 2, 2020
6028cc9
Improve temporary table cleaning logic
jackgisby Aug 2, 2020
30df67c
Fix issues with new build functions
jackgisby Aug 2, 2020
5635716
Allow tests to load auxiliary test data
jackgisby Aug 2, 2020
5ef79ea
Implement msn tests and update k_config test for new parameter
jackgisby Aug 2, 2020
4a9805b
Correctly specify ppm in generate_structures
jackgisby Aug 2, 2020
6a0f9a5
Minor docstring and code reformatting
jackgisby Aug 2, 2020
ec7de92
Add binder dir
jackgisby Aug 2, 2020
14a1e02
Add example notebook
jackgisby Aug 2, 2020
9938d31
Remove scripts
jackgisby Aug 2, 2020
356891b
Implement basic notebook
jackgisby Aug 2, 2020
1a4a081
Add small substructures to database prior to msn annotation
jackgisby Aug 2, 2020
9447756
Merge branch 'feat-NL_Combinations' into feat-notebooks
jackgisby Aug 2, 2020
a0be12e
Complete notebook example
jackgisby Aug 2, 2020
19fd6b1
Fix logic for when smi_out_dir is None
jackgisby Aug 2, 2020
602a4ce
Rename example_msms.ipynb to workflow.ipynb
jackgisby Aug 2, 2020
c7d5fed
Add pip to install metaboblend
RJMW Aug 3, 2020
6a648e4
Add data dir, remove databases dir, move test data to data dir
jackgisby Aug 4, 2020
11578e8
Write notebook databases to notebook_data
jackgisby Aug 4, 2020
a8ca306
Unzip test data
jackgisby Aug 4, 2020
f1079ba
Simplify test paths
jackgisby Aug 4, 2020
a91ffa2
Remove databases from gitignore
jackgisby Aug 4, 2020
3e2aacc
Use test databases for notebook
jackgisby Aug 4, 2020
66faac3
Implement simple hydrogenation rules
jackgisby Dec 18, 2020
c9642d6
Get bond types rather than number of available atoms for hydrogen rul…
jackgisby Dec 23, 2020
1986b2e
Don't count dummy atoms for bond type calculations
jackgisby Dec 23, 2020
ca904e9
Remove dummy atom mass
jackgisby Dec 23, 2020
8fbd8c5
Use max_degree of 6 and 2 available_atoms by default for create_subst…
jackgisby Dec 23, 2020
03a229f
Account for the fact we use neutral peaks (i.e. have removed adduct ion)
jackgisby Dec 23, 2020
1b256de
Modify hydrogen re-arrangement rules for double bonds
jackgisby Dec 23, 2020
b995c8c
Update databases tests
jackgisby Dec 23, 2020
cfb0481
Implement test for calculate_possible_hydrogenations using reference …
jackgisby Dec 23, 2020
12db527
Add test for calculate_hydrogen_rearrangements
jackgisby Dec 23, 2020
d7f8979
Update hydrogen re-arrangement calculation function documentation
jackgisby Dec 23, 2020
5f2ce8c
Update remaining unit tests
jackgisby Dec 23, 2020
eb1a9b4
Add hydrogen re-arrangement compound HMDB XMLs
jackgisby Dec 23, 2020
7cee007
Record even substructures
jackgisby Jan 9, 2021
8f60ab5
Record even substructures in results DB
jackgisby Jan 9, 2021
d1e2127
Add indexes to improve combine_ecs function performance
jackgisby Feb 12, 2021
5c4576b
Improve results DB hierarchy and implement aggregation of scoring met…
jackgisby Feb 13, 2021
f119f7f
Define SQLite functions to calculate scores via queries alone
jackgisby Feb 13, 2021
7f913a0
Record max BDE in spectra results table
jackgisby Feb 15, 2021
40971a6
Calculate frequency in the absence of scores (for non-MSn method)
jackgisby Feb 15, 2021
1b160fa
Retain substructures does not cause substructures not to be initially…
jackgisby Feb 15, 2021
a615b65
Add additional scoring metrics
jackgisby Feb 15, 2021
5a0dede
Update results db test data
jackgisby Feb 15, 2021
d1ec1b2
Define ppm error and valence of fragment prior to re-ordering
jackgisby Feb 15, 2021
0b22e78
Configure checks on recording of putative structure information
jackgisby Feb 15, 2021
846c922
Calculate scores at substructure combination level
jackgisby Feb 15, 2021
47c5080
Convert True to 1 and False to 0 for conversion to SQLite boolean type
jackgisby Feb 15, 2021
2d6c5f2
Index results DB
jackgisby Feb 16, 2021
70e4e73
Use a loop in place of pool.map
jackgisby Feb 18, 2021
a78a3b7
Minor performance improvements
jackgisby Feb 18, 2021
55e4761
Merge minor performance improvements
jackgisby Feb 18, 2021
2934754
Use the minimum absolute error for getting possible fragment ions
jackgisby Feb 18, 2021
91b8b80
Add separate absolute error options for MSn peak and full structure
jackgisby Feb 18, 2021
e2ce55f
Use 0.005 for abs_error_precursor
jackgisby Feb 18, 2021
9aee6a6
Drop indexes before inserting into results DB
jackgisby Mar 3, 2021
1e59e75
Add results table index on ms_id_num and structure_smiles
jackgisby Mar 3, 2021
bd81baf
Update results DB tests
jackgisby Mar 3, 2021
8eefc7d
Add table for generating unique structure smiles IDs
jackgisby Mar 8, 2021
2076aaa
Calculate cosine spectrum similarity
jackgisby Mar 23, 2021
b024d17
Allow for the specification of weights for the results database scori…
jackgisby Mar 23, 2021
a06c45e
Aggregate structure scores but force floating point division
jackgisby Mar 25, 2021
15906f2
Select fragment and substructure id when calculating results scores f…
jackgisby Mar 25, 2021
017603a
Update results DB tests with updated scores
jackgisby Mar 25, 2021
f932cce
Don't create indexes until structure scoring
jackgisby Mar 30, 2021
504a573
Don't include valence=0 substructures in the substructure database
jackgisby Apr 7, 2021
760d039
Add max BDE parameter for building
jackgisby Apr 7, 2021
4c57c0e
Remove redundant connectivity graphs
jackgisby May 19, 2021
4a886a5
Update data to test filter records function
jackgisby May 19, 2021
135b6d7
Update dictionary pickle with Python 3.7
jackgisby May 19, 2021
85508f6
Add notebook
jackgisby Jul 13, 2022
2e9957f
Update file header
jackgisby Jul 13, 2022
fd8a8e1
Update contact information
jackgisby Jul 13, 2022
d62fa8d
Update setup.py
jackgisby Jul 13, 2022
54b60ad
Update tests for RDKit changes
jackgisby Jul 13, 2022
197c5fa
Update README
jackgisby Jul 13, 2022
010c0f1
Keep functioning buttons
jackgisby Jul 13, 2022
022f0c4
Update testing workflow
jackgisby Jul 13, 2022
c1d1bcd
Use python 3.7
jackgisby Jul 13, 2022
a727907
Remove unused dependencies
jackgisby Jul 13, 2022
a121159
Use only the channel conda-forge
jackgisby Jul 13, 2022
5145126
Add pillow and pyqt dependencies
jackgisby Jul 14, 2022
9718142
Remove list definition in function arguments
jackgisby Jul 14, 2022
3b3e4df
Add algorithms test
jackgisby Jul 14, 2022
c78a039
Merge database tests into single file
jackgisby Jul 14, 2022
6c1329a
Restructure modules
jackgisby Jul 14, 2022
64ee9dc
Restructure tests
jackgisby Jul 14, 2022
2bcbf7d
Update outdated imports
jackgisby Jul 14, 2022
a52ef36
Omit notebooks from coverage
jackgisby Jul 14, 2022
2 changes: 1 addition & 1 deletion .coveragerc
@@ -1,2 +1,2 @@
 [run]
-omit = tests/*,setup.py,metaboblend/__main__.py
+omit = tests/*,setup.py,metaboblend/__main__.py,notebooks/
35 changes: 17 additions & 18 deletions .github/workflows/build-test.yml
@@ -8,8 +8,8 @@ jobs:
 
     strategy:
       matrix:
-        os: [ubuntu-latest, windows-latest, macos-latest]
-        python-version: [3.6, 3.7, 3.8]
+        os: [ ubuntu-latest, windows-latest, macos-latest ]
+        python-version: [ 3.7, 3.8, 3.9 ]
 
     env:
       OS: ${{ matrix.os }}
@@ -19,21 +19,29 @@ jobs:
       - uses: actions/checkout@v2
 
       - name: Setup conda - Python ${{ matrix.python-version }}
-        uses: s-weigand/setup-conda@v1
+        uses: conda-incubator/setup-miniconda@v2
         with:
-          update-conda: true
+          auto-update-conda: true
+          activate-environment: metaboblend
           python-version: ${{ matrix.python-version }}
-          conda-channels: anaconda, conda-forge
+          environment-file: environment.yml
+          channels: anaconda, conda-forge
 
-      - name: Install dependencies
+      - name: Build MetaboBlend
+        shell: bash -l {0}
         run: |
+          python setup.py install
+          metaboblend --help
 
-          python --version
-          conda env update --file environment.yml --name base
+      - name: Test with pytest-cov
+        shell: bash -l {0}
+        run: |
+          conda install pytest codecov pytest-cov -c conda-forge
+          pytest --cov ./ --cov-config=.coveragerc --cov-report=xml
 
       - name: Lint with flake8
+        shell: bash -l {0}
         run: |
-
           conda install flake8
 
           # stop build if there are Python syntax errors or undefined names
@@ -42,15 +50,6 @@ jobs:
           # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
           flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
 
-      - name: Test with pytest-cov
-        run: |
-
-          python setup.py install
-          metaboblend --help
-
-          conda install pytest codecov pytest-cov -c conda-forge
-          pytest --cov ./ --cov-config=.coveragerc --cov-report=xml
-
       - name: Upload code coverage to codecov
         uses: codecov/codecov-action@v1
         with:
4 changes: 3 additions & 1 deletion .gitignore
@@ -70,6 +70,8 @@ target/
 
 # Jupyter Notebook
 .ipynb_checkpoints
+notebooks/notebook_data
+notebooks/notebook_data/*
 
 # pyenv
 .python-version
@@ -105,4 +107,4 @@ ENV/
 # ignore test files
 */libgcc_s_dw2-1.dll
 */libstdc++-6.dll
-tests/test*
+tests/tmp*
10 changes: 6 additions & 4 deletions README.rst
@@ -1,6 +1,9 @@
 MetaboBlend
 ===========
-|Version| |Py versions| |Git| |Bioconda| |Build Status| |License| |RTD doc| |codecov| |binder|
+..
+|Version| |Py versions| |Bioconda| |RTD doc| |License| |binder|
+
+|Git| |Build Status| |codecov|
 
 Python package for *de novo* structural elucidation of small molecules in mass spectrometry-based Metabolomics
 
@@ -32,12 +35,11 @@ will help you to make the PR if you are new to `git`.
 Developers & Contributors
 -------------------------
 - Ralf J. M. Weber ([email protected]) - `University of Birmingham (UK) <https://www.birmingham.ac.uk/staff/profiles/biosciences/weber-ralf.aspx>`_
-- Jack Gisby ([email protected]) - `University of Birmingham (UK) <http://www.birmingham.ac.uk/index.aspx>`_
-
+- Jack Gisby ([email protected]) - `University of Birmingham (UK) <http://www.birmingham.ac.uk/index.aspx>`_, `Imperial College London (UK) <https://www.imperial.ac.uk/>`_
 
 Licenses
 --------
-MetaboBlend is licensed under the GNU General Public License v3.0 (see `LICENSE file <https://github.com/computational-metabolomics/metaboblend/blob/master/LICENSE>`_ for licensing information). Copyright © 2019 - 2020 Ralf Weber
+MetaboBlend is licensed under the GNU General Public License v3.0 (see `LICENSE file <https://github.com/computational-metabolomics/metaboblend/blob/master/LICENSE>`_ for licensing information). Copyright © 2019 - 2020 Jack Gisby, Ralf Weber
 
 
 .. |Build Status| image:: https://github.com/computational-metabolomics/metaboblend/workflows/metaboblend/badge.svg
17 changes: 17 additions & 0 deletions binder/environment.yml
@@ -0,0 +1,17 @@
+name: metaboblend
+channels:
+  - conda-forge
+  - bioconda
+dependencies:
+  - python=3.7
+  - numpy
+  - scipy
+  - pandas
+  - networkx
+  - rdkit
+  - biopython
+  - matplotlib
+  - nauty
+  - pip
+  - pip:
+    - -e ../
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -20,7 +20,7 @@
 # -- Project information -----------------------------------------------------
 
 project = 'MetaboBlend'
-copyright = '2020, Ralf Weber'
+copyright = '2020, Jack Gisby, Ralf Weber'
 author = 'Jack Gisby, Ralf Weber'
 
 # -- General configuration ---------------------------------------------------
2 changes: 1 addition & 1 deletion docs/source/license.rst
@@ -1,4 +1,4 @@
 License
 -------
 TODO: change package name
-*MetaboBlend* is licensed under the GNU General Public License v3.0 (see `LICENSE file <https://github.com/computational-metabolomics/metaboblend/blob/master/LICENSE>`_ for licensing information). Copyright © 2019 - 2020 Ralf Weber
+*MetaboBlend* is licensed under the GNU General Public License v3.0 (see `LICENSE file <https://github.com/computational-metabolomics/metaboblend/blob/master/LICENSE>`_ for licensing information). Copyright © 2019 - 2020 Jack Gisby, Ralf Weber
10 changes: 4 additions & 6 deletions environment.yml
@@ -1,14 +1,12 @@
name: metaboblend
channels:
- conda-forge
- bioconda
dependencies:
- python>=3.6
- python>=3.7
- pillow!=9.2.0
- pyqt
- matplotlib
- numpy
- scipy
- pandas
- networkx
- rdkit
- biopython
- matplotlib
- nauty
6 changes: 3 additions & 3 deletions metaboblend/__init__.py
@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 #
-# Copyright © 2019-2020 Ralf Weber
+# Copyright © 2019-2020 Jack Gisby, Ralf Weber
 #
 # This file is part of MetaboBlend.
 #
@@ -19,7 +19,7 @@
 # along with MetaboBlend. If not, see <https://www.gnu.org/licenses/>.
 #
 
-__author__ = 'Ralf Weber ([email protected])'
-__credits__ = 'Ralf Weber ([email protected])'
+__authors__ = ['Ralf Weber ([email protected])', 'Jack Gisby ([email protected])']
+__credits__ = ['Ralf Weber ([email protected])', 'Jack Gisby ([email protected])']
 __version__ = '0.1.0'
 __license__ = 'GPLv3'
2 changes: 1 addition & 1 deletion metaboblend/__main__.py
@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 #
-# Copyright © 2019-2020 Ralf Weber
+# Copyright © 2019-2020 Jack Gisby, Ralf Weber
 #
 # This file is part of MetaboBlend.
 #
34 changes: 32 additions & 2 deletions metaboblend/algorithms.py
@@ -1,7 +1,7 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 #
-# Copyright © 2019-2020 Ralf Weber
+# Copyright © 2019-2020 Jack Gisby, Ralf Weber
 #
 # This file is part of MetaboBlend.
 #
@@ -20,9 +20,10 @@
 #
 
 import numpy
+from math import sqrt
 
 
-def find_path(mass_list, sum_matrix, n, mass, max_subset_length, path=[]):
+def find_path(mass_list, sum_matrix, n, mass, max_subset_length, path=None):
     """
     Recursive solution for backtracking through the dynamic programming boolean matrix. All possible subsets are found
 
@@ -42,6 +43,9 @@ def find_path(mass_list, sum_matrix, n, mass, max_subset_length, path=[]):
     :return: Generates of lists containing the masses of valid subsets.
     """
 
+    if path is None:
+        path = []
+
     # base case - the path has generated a correct solution
     if mass == 0:
         yield sorted(path)
@@ -103,3 +107,29 @@ def subset_sum(mass_list, mass, max_subset_length=3):
 
     # backtrack through the matrix recursively to obtain all solutions
     return find_path(mass_list, sum_matrix, n, mass, max_subset_length)
+
+
+def cosine_spectrum_similarity(real_mzs, candidate_mzs):
+    """
+    Database fragmentation scoring based on the cosine similarity method. Adapted for the lack of intensities
+    available for the candidate compound.
+
+    :param real_mzs: The mz values for the original MSn spectrum.
+
+    :param candidate_mzs: The theoretical mz values for a candidate compound. Should have the same order as `real_mzs`
+        and should have a value of `0` when there is no match for the candidate for a peak in the original spectrum.
+
+    :return: Similarity metric for the two spectra.
+    """
+
+    # get weighted peaks
+    real_weighted = [(mz ** 2) for mz in real_mzs]
+    candidate_weighted = [(mz ** 2) for mz in candidate_mzs]
+
+    def dot(E, D):
+        return sum(e * d for e, d in zip(E, D))
+
+    def cosine_similarity(E, D):
+        return dot(E, D) / (sqrt(dot(E, E)) * sqrt(dot(D, D)))
+
+    return cosine_similarity(real_weighted, candidate_weighted)
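The `cosine_spectrum_similarity` function added in this diff divides by the product of the two vector norms, so a candidate with no matched peaks (all zeros) would raise `ZeroDivisionError`. The sketch below is a standalone illustration of one way to guard that case. It is not part of the PR: the name `cosine_similarity_safe` and the choice to return `0.0` for an empty match are assumptions made for this example.

```python
from math import sqrt


def cosine_similarity_safe(real_mzs, candidate_mzs):
    """Cosine similarity of squared m/z vectors (hypothetical helper, not in the PR)."""
    # Weight peaks by squaring the m/z values, as the function in the diff does.
    real_weighted = [mz ** 2 for mz in real_mzs]
    candidate_weighted = [mz ** 2 for mz in candidate_mzs]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    denominator = sqrt(dot(real_weighted, real_weighted)) * sqrt(dot(candidate_weighted, candidate_weighted))
    if denominator == 0:
        # Assumption: a spectrum with no matched peaks scores zero similarity.
        return 0.0
    return dot(real_weighted, candidate_weighted) / denominator


real = [100.0, 150.0, 200.0]
full_match = cosine_similarity_safe(real, [100.0, 150.0, 200.0])
partial_match = cosine_similarity_safe(real, [100.0, 0, 200.0])  # middle peak unmatched
no_match = cosine_similarity_safe(real, [0, 0, 0])
```

A full match scores approximately 1, a partial match scores strictly between 0 and 1, and the all-zero candidate scores 0 instead of raising. Whether zero is the right score for that case depends on how the results database aggregates scores, which this diff does not show.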
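The change to `find_path` above replaces the default `path=[]` with `path=None` (commit "Remove list definition in function arguments"). A minimal sketch of why a mutable default is risky follows. The functions `append_shared` and `append_fresh` are hypothetical names for this illustration only. Whether the original `find_path` ever mutated its default is not visible in this diff.

```python
def append_shared(item, path=[]):
    # Anti-pattern: the default list is created once, at definition time,
    # and is shared by every call that omits `path`.
    path.append(item)
    return path


def append_fresh(item, path=None):
    # Pattern adopted in the diff: build a new list on each call.
    if path is None:
        path = []
    path.append(item)
    return path


shared_first = append_shared(1)
shared_second = append_shared(2)  # same list object as shared_first
fresh_first = append_fresh(1)
fresh_second = append_fresh(2)    # independent list
```

Here `shared_first` and `shared_second` are the same object and both read `[1, 2]`, while `fresh_first` and `fresh_second` stay `[1]` and `[2]`. The `None` sentinel avoids this kind of state leaking between calls.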