Improvement of gap-filling in refineGEMs #52

GwennyGit · 2023-01-19T09:40:52Z

In this issue all current gap-filling tools implemented in refineGEMs are summarised and possible enhancements explored.

Current gap-filling modules:

genecomp (now: kegg_analysis):
⇾ Extracts KEGG gene identifiers from model
⇾ Compares KEGG genes in model with the strain-specific ones in KEGG
⇾ Extracts RefSeq IDs (GPR) from the .gff file
⇾ Maps BiGG to KEGG IDs
⇒ Returns a table containing missing reactions with locus tag, EC number, KEGG ID, BiGG ID and RefSeq ID (GPR)
curate:
1. With option gapfill
  ⇾ Adds reactions with the corresponding IDs, stoichiometric coefficients, reactants, products, and upper & lower bounds to the model from a manually obtained table
2. With option metabs
  ⇾ Adds metabolites with the corresponding IDs, formulae, and names to the model from a manually obtained table
  ⇾ Synchronises the metabolite information over all compartments
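
The comparison step in genecomp comes down to a set difference between the gene identifiers the database lists for the strain and those already in the model. A minimal sketch, assuming both sides are already available as plain collections of KEGG gene IDs (the function name `find_missing_genes` and the example identifiers are made up for illustration and are not part of refineGEMs):

```python
def find_missing_genes(model_genes, reference_genes):
    """Return the reference gene IDs that are absent from the model, sorted."""
    return sorted(set(reference_genes) - set(model_genes))


# Hypothetical KEGG-style gene identifiers (organism code + locus tag)
model_genes = ["abc:LT_0001", "abc:LT_0002"]
kegg_genes = ["abc:LT_0001", "abc:LT_0002", "abc:LT_0003", "abc:LT_0004"]

missing = find_missing_genes(model_genes, kegg_genes)
print(missing)
```

The remaining steps (RefSeq IDs from the GFF file, BiGG to KEGG mapping) would then only need to be run for the identifiers in `missing`.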

Creation of gapfill module for BioCyc (& Adjustment of genecomp to gapfill):

Further improvements:

Retrieve missing metabolites via 'bigg_reaction' column from reactions table instead of from the 'Reactants' and 'Products' columns
Add functionality to apply the BioCyc comparison to models whose organism does not occur in any database:
- ~~From the GFF & FASTA files with DIAMOND & BioCyc SmartTables (→ lab_strain) obtain missing genes~~ → Already fulfilled now by GeneGapFiller
Adjust kegg_analysis to also return tables with missing genes & metabolites
→ Then the result from kegg_analysis can also be added to a model. → Already fulfilled now by KEGGapFiller
Do a complete merge on all BioCyc & KEGG tables on user request
Make update_annotations_from_table part of GapFiller.fill_model
Add a check similar to verifyGapfilledReactions to GapFiller.fill_model

Further ideas:

A type of template gap filling that uses a genome and a template model to extract information on homologous genes and reactions to improve the current model
A ManualGapFiller that is initialised with tables containing information on missing genes and reactions
- Tables should be similar to already used ones in all GapFiller classes.
  → Maybe add getter & setter for missing_reactions & missing_genes? 🤔
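
One way the getter & setter idea could look is sketched below with Python properties. This is only an illustration under assumptions: the class body, the required column names (`locus_tag`, `id`) and the use of lists of dicts instead of the real tables are all hypothetical and not taken from refineGEMs.

```python
class ManualGapFiller:
    """Hypothetical sketch: a gap filler initialised from user-supplied tables."""

    def __init__(self, missing_genes=None, missing_reactions=None):
        # Go through the setters so that validation also runs on construction
        self.missing_genes = missing_genes or []
        self.missing_reactions = missing_reactions or []

    @staticmethod
    def _validate(rows, required):
        """Check that every row (a dict) carries the required columns."""
        rows = list(rows)
        for row in rows:
            absent = required - set(row)
            if absent:
                raise ValueError(f"missing columns: {sorted(absent)}")
        return rows

    @property
    def missing_genes(self):
        return self._missing_genes

    @missing_genes.setter
    def missing_genes(self, rows):
        self._missing_genes = self._validate(rows, {"locus_tag"})

    @property
    def missing_reactions(self):
        return self._missing_reactions

    @missing_reactions.setter
    def missing_reactions(self, rows):
        self._missing_reactions = self._validate(rows, {"id"})


# Example values are placeholders
filler = ManualGapFiller(missing_genes=[{"locus_tag": "LT_0003", "ec": "1.1.1.1"}])
filler.missing_reactions = [{"id": "R00001"}]
print(len(filler.missing_genes), len(filler.missing_reactions))
```

Routing the constructor through the setters keeps the format check in one place, so tables handed over later are held to the same layout as the initial ones.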

famosab · 2023-01-31T15:21:59Z

Regarding Function I: we already have a parser for gff files integrated (it is in the function get_locus_gpr from genecomp). Maybe we can expand from there - the only obstacle would be to make sure we have similar IDs that can be compared to each other. At the moment I have the problem that the GPRs in my models cannot be found in my gff file because the naming is completely different.
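
To make the ID problem concrete, here is a rough sketch of how a locus tag to protein ID mapping could be pulled from the attribute column of a GFF3 file, so that model GPRs can be translated before comparison. It is not the code of get_locus_gpr; the function names and the example lines are invented, and it assumes the CDS features carry both `locus_tag` and `protein_id` attributes.

```python
from urllib.parse import unquote


def parse_gff_attributes(column9):
    """Split the GFF3 attribute column (key=value;key=value) into a dict."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def locus_to_protein(gff_lines):
    """Map locus_tag -> protein_id for CDS features that carry both."""
    mapping = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and empty lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = parse_gff_attributes(cols[8])
        if "locus_tag" in attrs and "protein_id" in attrs:
            mapping[attrs["locus_tag"]] = attrs["protein_id"]
    return mapping


# Invented example lines in GFF3 layout
example = [
    "##gff-version 3",
    "contig1\tRefSeq\tgene\t1\t300\t.\t+\t.\tID=gene-LT_0001;locus_tag=LT_0001",
    "contig1\tRefSeq\tCDS\t1\t300\t.\t+\t0\t"
    "ID=cds-WP_000000001.1;locus_tag=LT_0001;protein_id=WP_000000001.1",
]
print(locus_to_protein(example))
```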

GwennyGit · 2023-02-01T14:49:09Z

Regarding Function I: For strains that are not in KEGG but in BioCyc, I think it will be better to use the BioCyc SmartTables as reference. However, for lab strains this function could still be useful. 🤔 For the BioCyc option I will add a comment to Function I. Maybe the script from Reihaneh (@Biomathsys) could be used (or adjusted) for that, see the code here: https://github.com/draeger-lab/py4gems/blob/main/Reihaneh/1.%20BioCyc_Comparison.ipynb.

GwennyGit · 2023-02-02T09:45:52Z

The current module curate only adds genes and reactions to the model. However, for gapfill the metabolites should be added too. Reihaneh's script uses COBRA, and adding reactions and metabolites is already properly implemented there. Thus, I will use her approach for that. In addition, I will use the function to add the missing genes from curate and extend the tables from KEGG/BioCyc/GFF file with the BiGG identifiers for each reaction and metabolite, respectively.

As the function to add reactions will not be taken from curate, this module will be kept as is. However, the following three new modules will be created in addition to the gapfill module: entities, analysis_kegg and analysis_biocyc.

GwennyGit · 2023-02-02T16:38:01Z

For the comparison with already existing metabolites & reactions, I realised that the checks from Reihaneh's script are not necessary if I add the BiGG identifiers to the table. Thus, I will extend the functionality of entities.

In analysis_kegg, the function get_locus_gpr (line 167) can be adjusted to get the protein IDs from the GenBank GFF/FASTA (CarveMe input) file. The currently retrieved GPRs are basically the RefSeq identifiers from the RefSeq GFF file. Additionally, the function should be moved to gapfill, as it can be used for both analysis_kegg and analysis_biocyc.
Note to myself: Are there maybe more functions that can be used for both modules? Potentially the function retrieving the BiGG IDs?

GwennyGit · 2023-02-03T08:01:39Z

Moving the functions required by both analysis_kegg and analysis_biocyc from analysis_kegg into gapfill created an import cycle: these functions were called in each of the other two modules, and those modules were in turn called within the gapfill module. To still reduce redundancy, another module, analysis_db, was created that now contains the functions shared by analysis_kegg and analysis_biocyc.
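
The cycle and its resolution can be illustrated with a small dependency-graph check. The import edges below are only my reading of the comment above, not extracted from the code base, and `has_cycle` is a generic helper written for this illustration.

```python
def has_cycle(deps):
    """Depth-first search for a cycle in a {module: [imported modules]} graph."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state = {}

    def visit(node):
        state[node] = GREY
        for nxt in deps.get(node, []):
            colour = state.get(nxt, WHITE)
            if colour == GREY:
                return True  # back edge: nxt is already on the current path
            if colour == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in deps)


# Shared functions live in gapfill, which itself calls the analysis modules
before = {
    "gapfill": ["analysis_kegg", "analysis_biocyc"],
    "analysis_kegg": ["gapfill"],
    "analysis_biocyc": ["gapfill"],
}
# Shared functions moved to a separate module without further imports
after = {
    "gapfill": ["analysis_kegg", "analysis_biocyc"],
    "analysis_kegg": ["analysis_db"],
    "analysis_biocyc": ["analysis_db"],
    "analysis_db": [],
}
print(has_cycle(before), has_cycle(after))
```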

famosab · 2023-02-03T13:18:10Z

Maybe the following publication / code is of interest: NICEgame. They mention in their manuscript that they also worked with Python; however, in the GitHub repo I only found MATLAB scripts.

GwennyGit · 2023-02-03T14:56:12Z

From the paper I understand that the authors fill gaps in the model using media on which the bacterium is known to grow. This approach would be similar to the one from the CarveMe documentation or the gap-filling approach from COBRApy. I think this would be a nice addition to the gap filling via the genes. I already considered adding a call to the CarveMe gap filling after the gene-based gap filling. However, as far as I understand these programs, the user needs to know exactly on which media the bacterium grows. Thus, I find it rather difficult to use any of the tools, as we have strain-specific models, and I suppose that not every strain of e.g. Staphylococcus haemolyticus grows on the same media, especially if microbiome media like SNM3 are used. 🤔

famosab · 2023-02-07T11:10:50Z

We can use requests to access the BiGG database.

Here is an example of how to use it with BiGG:

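import requests
import refinegems as rg

# Base URLs of the BiGG web API for universal reactions and metabolites
reac_url = 'http://bigg.ucsd.edu/api/v2/universal/reactions/'
metab_url = 'http://bigg.ucsd.edu/api/v2/universal/metabolites/'

mod = rg.load.load_model_cobra('../../models/Cstr_14.xml')

# Example for a single field: requests.get(metab_url + 'o2').json()['charges']

for metab in mod.metabolites:
    # Drop the compartment suffix (e.g. '_c'), assuming it is one character long
    bigg_id = metab.id[:-2]
    response = requests.get(metab_url + bigg_id, timeout=30)
    if response.ok:  # skip IDs that BiGG does not know
        print(bigg_id, response.json()['name'])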

For metabolites, these fields can be accessed: ['database_links', 'bigg_id', 'formulae', 'old_identifiers', 'charges', 'name', 'compartments_in_models']. Metabolites need to be entered without the leading M_ and without the compartment suffix, so instead of M_o2_c use o2.

For reactions, the available fields are ['models_containing_reaction', 'reaction_string', 'metabolites', 'database_links', 'bigg_id', 'old_identifiers', 'name', 'pseudoreaction'].
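
The ID rule described above (no leading M_, no compartment suffix) could be wrapped in a small helper so that the slicing does not depend on the suffix length. This is a sketch under assumptions: the helper name and the default compartment list are illustrative only and would need to match the compartments actually used in the model.

```python
BIGG_METAB_URL = "http://bigg.ucsd.edu/api/v2/universal/metabolites/"


def to_universal_bigg_id(model_id, compartments=("c", "e", "p")):
    """Strip a leading 'M_' and a trailing '_<compartment>' from a metabolite ID."""
    bare = model_id[2:] if model_id.startswith("M_") else model_id
    for comp in compartments:
        suffix = "_" + comp
        if bare.endswith(suffix):
            return bare[: -len(suffix)]
    return bare  # no known compartment suffix found


for raw in ("M_o2_c", "glc__D_e", "o2"):
    print(raw, "->", BIGG_METAB_URL + to_universal_bigg_id(raw))
```

Passing the compartments explicitly avoids cutting into identifiers that have no suffix, or a suffix longer than one character.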

GwennyGit · 2023-02-08T20:56:28Z

To have all parsing functions combined the module parse was created. However, not all functions that would potentially fit into this module have been added yet.

The function add_charges_chemical_formulae_to_metabs in the module analysis_biocyc currently causes a KeyError which should be solved in the next commit.

GwennyGit added the enhancement New feature or request label Jan 19, 2023

famosab added this to the New functions towards a version 1.1 milestone Jan 19, 2023

famosab mentioned this issue Feb 1, 2023

Add functionality to add more identifiers to GeneProducts in polish #53

Closed

5 tasks

GwennyGit added a commit that referenced this issue Feb 2, 2023

Created analysis_biocyc.py #52

7a75107

GwennyGit added a commit that referenced this issue Feb 2, 2023

Created entities.py #52

9c4d9f5

GwennyGit added a commit that referenced this issue Feb 2, 2023

Created gapfill.py #52

875e4eb

GwennyGit added a commit that referenced this issue Feb 2, 2023

Renamed genecomp to analysis_kegg.py #52

96eda8f

GwennyGit added a commit that referenced this issue Feb 2, 2023

Updated __init__.py due to module changes #52

0326be3

GwennyGit added a commit that referenced this issue Feb 2, 2023

Adjusted code in curate #52

0c58e02

GwennyGit added a commit that referenced this issue Feb 2, 2023

Adjusted code in analysis_kegg #52

810f9d9

Removed a function that was generalised to work with KEGG and BioCyc and is now in gapfill.

GwennyGit added a commit that referenced this issue Feb 8, 2023

Added code for function gapfill_analysis #52

1f95746

GwennyGit added a commit that referenced this issue Feb 8, 2023

Updated analysis_biocyc #52

890f2fa

GwennyGit added a commit that referenced this issue Feb 8, 2023

Updated analysis_kegg due to refactoring #52

e94fe4d

GwennyGit added a commit that referenced this issue Feb 8, 2023

Updated entities due to refactoring #52

e947d42

GwennyGit added a commit that referenced this issue Feb 8, 2023

Added InChI-Key to metabol_db_dict in cvterms #52 #59

8eeb1ed

GwennyGit added a commit that referenced this issue Feb 8, 2023

Created analysis_db due to refactoring #52

624131b

GwennyGit added a commit that referenced this issue Feb 8, 2023

Created parse due to refactoring #52

78c4fba

GwennyGit added a commit that referenced this issue Feb 8, 2023

Updated __init__ due to new modules #52

8f65d0d

GwennyGit added a commit that referenced this issue Feb 8, 2023

Updated main due to new function gapfill_analysis #52

fac0df6

GwennyGit added a commit that referenced this issue Feb 8, 2023

Added bigg_models_metabolites.txt for gapfill #52

f7238e7

GwennyGit added a commit that referenced this issue Feb 8, 2023

Updated config due to new function gapfill_analysis #52

0b5c6f4

cb-Hades added a commit that referenced this issue Jul 31, 2024

added a url download function #126 #52

848794b

cb-Hades added a commit that referenced this issue Jul 31, 2024

Update #126 #52

78f7408

cb-Hades added a commit that referenced this issue Aug 1, 2024

Update gapfill #126 #52

61843e2

cb-Hades added a commit that referenced this issue Aug 7, 2024

Update gapfill #126 #52

1ea778f

cb-Hades added a commit that referenced this issue Aug 7, 2024

Added KEGG reaction builder #126 , #52

b787a5e

cb-Hades added a commit that referenced this issue Aug 8, 2024

Update gapfill-testing #126 #52

ad9f3e6

first version of KEGG,MNX,BiGG reconstruction of metabolites and reactions

cb-Hades added a commit that referenced this issue Aug 9, 2024

Update gapfill #126 #52

1bbfdd3

Added functions for adding reac / metabs per database id

cb-Hades added a commit that referenced this issue Aug 12, 2024

update gapfill mainly #126 #52

ef46885

some clean up, some new docstrings, started func for fill_model

GwennyGit added a commit that referenced this issue Aug 14, 2024

Updated gapfill.py #52 #126

7324828

GwennyGit added a commit that referenced this issue Aug 15, 2024

Trial run of new BioCycGapFill set-up #52 #126

d6b9d12

GwennyGit added a commit that referenced this issue Aug 15, 2024

Updated BioCycGapFiller #52 #126

7fe29ff

GwennyGit added a commit that referenced this issue Aug 15, 2024

Started with build_reaction_biocyc #52 #126

eb20f71

GwennyGit added a commit that referenced this issue Aug 16, 2024

Started finalising BioCycGapFiller #52 #126

1c1420c

GwennyGit added a commit that referenced this issue Aug 16, 2024

Tested BioCycGapFiller filling part #52 #126

671a577

GwennyGit added a commit that referenced this issue Aug 16, 2024

Updated parse_reac_str for BioCyc #52 #126

6c01b5f

GwennyGit added a commit that referenced this issue Aug 16, 2024

Trial of updated parse_reac_str for BioCyc #52 #126

6284292

cb-Hades added a commit that referenced this issue Aug 19, 2024

Update gapfill #126 #52

a50e03a

changed missing genes and reacs to attributes instead of return values

GwennyGit added a commit that referenced this issue Aug 20, 2024

Update of the gapfill set up #52 #126

dfc11b3

GwennyGit added a commit that referenced this issue Aug 21, 2024

Adjusted gapfill.py & entities.py #52 #126

a476f06

- Adjusted due to new db_access set up - Started writing code for mapping from BioCyc IDs to other databases for the BioCycGapFiller

GwennyGit added a commit that referenced this issue Aug 21, 2024

Tested BioCycGapFiller #52 #126

cd4e354

cb-Hades added a commit that referenced this issue Aug 22, 2024

Fixed issue with gene duplicates #52 #121

182719b

GwennyGit added a commit that referenced this issue Aug 22, 2024

Update BioCycGapFiller #52 #126

74c65ab

GwennyGit added a commit that referenced this issue Aug 22, 2024

Updated BioCycGapFiller #52 #126

75988af

GwennyGit added a commit that referenced this issue Aug 23, 2024

Completed BioCycGapFiller analysis part #52 #126

35bab49

GwennyGit added a commit that referenced this issue Oct 28, 2024

Added TODOs for gap-filling #52 #126

6df8991

GwennyGit added a commit that referenced this issue Oct 28, 2024

Update gapfill.rst #52 #126

afcc770

niinina added a commit that referenced this issue Nov 11, 2024

small changes in gapfill #52

15d0de7

GwennyGit added a commit that referenced this issue Nov 15, 2024

Opened discussion in gapfill.py #52 #126

084666d

Should labels be added automatically if None are in model for the GapFillers to work?

GwennyGit added a commit that referenced this issue Nov 20, 2024

Update of the gapfill.rst #52 #121 #126

5470523

niinina added a commit that referenced this issue Nov 20, 2024

Update of gapfill-docs #52 #126

56a05f6
