Add rule-based fragmentation #17

Open
wants to merge 126 commits into
base: dev

Conversation

jackgisby
Collaborator

  • Add rule-based fragmentation
  • Restructure modules
  • Switch to conda-incubator for setting up Actions CI
  • Remove unused dependencies

jackgisby added 30 commits May 30, 2020 17:59

codecov bot commented Jul 14, 2022

Codecov Report

Merging #17 (a52ef36) into dev (88fd297) will decrease coverage by 0.24%.
The diff coverage is 94.01%.

@@            Coverage Diff             @@
##              dev      #17      +/-   ##
==========================================
- Coverage   94.86%   94.62%   -0.25%     
==========================================
  Files           7        8       +1     
  Lines         955     1190     +235     
==========================================
+ Hits          906     1126     +220     
- Misses         49       64      +15     
| Flag | Coverage Δ |
|---|---|
| unittests | 94.62% <94.01%> (-0.25%) ⬇️ |

Flags with carried forward coverage won't be shown. Click here to find out more.

| Impacted Files | Coverage Δ |
|---|---|
| metaboblend/databases/results.py | 91.36% <91.36%> (ø) |
| metaboblend/build_structures/build.py | 92.80% <92.80%> (ø) |
| metaboblend/build_structures/annotate.py | 92.98% <92.98%> (ø) |
| metaboblend/databases/connectivity.py | 96.03% <94.73%> (ø) |
| metaboblend/databases/substructures.py | 95.53% <96.28%> (ø) |
| metaboblend/__init__.py | 100.00% <100.00%> (ø) |
| metaboblend/algorithms.py | 100.00% <100.00%> (ø) |
| metaboblend/parse.py | 96.96% <100.00%> (+0.07%) ⬆️ |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 88fd297...a52ef36. Read the comment docs.

@jackgisby jackgisby requested a review from RJMW August 8, 2022 13:01
jackgisby added a commit to jackgisby/metaboblend that referenced this pull request Nov 7, 2022
* Compatibility with conda version of geng; remove geng tool from package

* Incorporate pkl files into connectivity database

* Add nauty as dependency

* Add pickle as test dependency

* Switch from strings to pickles for connectivity graphs

* Use blob instead of text to store pickled dictionary

* No longer write substructures to .smi

* Add option to build to select only frequent substructures

* Add connectivity filter to k_configs

* Incorporate connectivity filter into MSn build method

* Build substructures for each set of masses independently

* Call itertools.product on substructures within multiprocessing portion of build

* Configure run script for current create_isomorphism_database inputs

* Built subsets should be empty list, not None

* Update variable names, remove debug options, update docstrings

* Add annotate_msn and generate_structures user functions

* Move stage at which multiprocessing step is performed

* Allow for multiple output options in build

* Remove ppm option for retrieving elemental composition from substructure db

* Allow list of mc/exact_mass to be passed to generate_structures

* Use TemporaryDirectory to store unittest results

* Let generate_structures return/yield smiles

* Implement build_msn to incorporate considerations for building structures from MS/MS

* Implement annotate_msn to provide an interface to build_msn

* Add/update build docstrings

* Remove unnecessary build parameters

* Pass data dictionary to user-facing build functions rather than separate mc, exact_mass, MSn masses

* Update variable naming conventions

* Add newline between smiles in out file

* Update SubstructureDb for removal of .pkl files

* Add function create_substructure_database

* Bring tests up to date with variable renaming

* Bring scripts up to date with variable renaming

* Simplify loading of test data and remove teardown

* Remove unused class ConnectivityDb and update SubstructureDb parameters

* Implement additional non-msn build tests

* Improve temporary table cleaning logic

* Fix issues with new build functions

* Allow tests to load auxiliary test data

* Implement msn tests and update k_config test for new parameter

* Correctly specify ppm in generate_structures

* Minor docstring and code reformatting

* Add binder dir

* Add example notebook

* Remove scripts

* Implement basic notebook

* Add small substructures to database prior to msn annotation

* Complete notebook example

* Fix logic for when smi_out_dir is None

* Rename example_msms.ipynb to workflow.ipynb

* Add pip to install metaboblend

* Add data dir, remove databases dir, move test data to data dir

* Write notebook databases to notebook_data

* Unzip test data

* Simplify test paths

* Remove databases from gitignore

* Use test databases for notebook

* Implement simple hydrogenation rules

* Get bond types rather than number of available atoms for hydrogen rule calculations

* Don't count dummy atoms for bond type calculations

* Remove dummy atom mass

* Use max_degree of 6 and 2 available_atoms by default for create_substructure_database

* Account for the fact we use neutral peaks (i.e. have removed adduct ion)

* Modify hydrogen re-arrangement rules for double bonds

* Update databases tests

* Implement test for calculate_possible_hydrogenations using reference numbers

* Add test for calculate_hydrogen_rearrangements

* Update hydrogen re-arrangement calculation function documentation

* Update remaining unit tests

* Add hydrogen re-arrangement compound HMDB XMLs

* Record even substructures

* Record even substructures in results DB

* Add indexes to improve combine_ecs function performance

* Improve results DB hierarchy and implement aggregation of scoring metrics

* Define SQLite functions to calculate scores via queries alone

* Record max BDE in spectra results table

* Calculate frequency in the absence of scores (for non-MSn method)

* Retain substructures does not cause substructures not to be initially recorded

* Add additional scoring metrics

* Update results db test data

* Define ppm error and valence of fragment prior to re-ordering

* Configure checks on recording of putative structure information

* Calculate scores at substructure combination level

* Convert True to 1 and False to 0 for conversion to SQLite boolean type

* Index results DB

* Use a loop in place of pool.map

* Minor performance improvements

* Merge minor performance improvements

* Use the minimum absolute error for getting possible fragment ions

* Add separate absolute error options for MSn peak and full structure

* Use 0.005 for abs_error_precursor

* Drop indexes before inserting into results DB

* Add results table index on ms_id_num and structure_smiles

* Update results DB tests

* Add table for generating unique structure smiles IDs

* Calculate cosine spectrum similarity

* Allow for the specification of weights for the results database scoring calculations

* Aggregate structure scores but force floating point division

* Select fragment and substructure id when calculating results scores for the correlated query

* Update results DB tests with updated scores

* Don't create indexes until structure scoring

* Don't include valence=0 substructures in the substructure database

* Add max BDE parameter for building

* Remove redundant connectivity graphs

* Update data to test filter records function

* Update dictionary pickle with Python 3.7

* Update file header

* Update contact information

* Update setup.py

* Update tests for RDKit changes

* Update README

* Keep functioning buttons

* Update testing workflow

* Use python 3.7

* Remove unused dependencies

* Use only the channel conda-forge

* Add pillow and pyqt dependencies

* Remove list definition in function arguments

* Add algorithms test

* Merge database tests into single file

* Restructure modules

* Restructure tests

* Update outdated imports

* Omit notebooks from coverage

Co-authored-by: Ralf Weber <[email protected]>
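
Several of the commits above concern how data is held in SQLite: pickled dictionaries stored as blobs, booleans converted to 1/0, scores computed by SQL functions, and aggregation that forces floating point division. The following is a minimal sketch of those ideas using only the Python standard library. The table names (`graphs`, `peaks`), column names and the `ppm_error` function are hypothetical and chosen for illustration; they are not taken from metaboblend's actual schema or API.

```python
import pickle
import sqlite3


def ppm_error(observed_mass, theoretical_mass):
    """Relative mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def build_demo_db():
    """Create an in-memory database with hypothetical tables."""
    conn = sqlite3.connect(":memory:")

    # Register the Python function so SQL queries can call it directly.
    conn.create_function("ppm_error", 2, ppm_error)

    conn.execute("CREATE TABLE graphs (id INTEGER PRIMARY KEY, edges BLOB)")
    conn.execute(
        "CREATE TABLE peaks (observed REAL, theoretical REAL, matched INTEGER)"
    )

    # A pickled dictionary is bytes, so it is stored as a BLOB rather than TEXT.
    edges = {(0, 1): 1, (1, 2): 2}
    conn.execute("INSERT INTO graphs (edges) VALUES (?)", (pickle.dumps(edges),))

    # SQLite has no dedicated boolean type; store True/False as 1/0.
    rows = [(100.0001, 100.0, int(True)), (200.0, 200.0, int(False))]
    conn.executemany("INSERT INTO peaks VALUES (?, ?, ?)", rows)
    return conn


conn = build_demo_db()

# Round-trip the pickled dictionary.
stored_edges = pickle.loads(
    conn.execute("SELECT edges FROM graphs").fetchone()[0]
)

# Compute the score inside the query using the registered function.
errors = [
    row[0]
    for row in conn.execute(
        "SELECT ppm_error(observed, theoretical) FROM peaks ORDER BY observed"
    )
]

# Integer / integer truncates in SQLite; CAST one operand to force a real result.
int_ratio, float_ratio = conn.execute(
    "SELECT SUM(matched) / COUNT(*), "
    "CAST(SUM(matched) AS REAL) / COUNT(*) FROM peaks"
).fetchone()
```

Only unpickle blobs from a database you trust, since `pickle.loads` can execute arbitrary code.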

2 participants