Improvement of gap-filling in refineGEMs #52
Comments
Regarding Function I: we already have a parser for GFF files integrated (it is in the function get_locus_gpr from genecomp). Maybe we can expand from there; the only obstacle would be making sure we have similar IDs that can be compared to each other. At the moment, the GPRs in my models cannot be found in my GFF file because the naming is completely different.
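A minimal sketch of the ID-matching idea, not the existing get_locus_gpr: assuming an NCBI-style GFF3 file whose CDS rows carry both a locus_tag and a protein_id attribute, one could build a lookup from locus tag to RefSeq protein ID and then compare the model's GPRs against either side of that lookup. The function names and the example records below are hypothetical.

```python
from urllib.parse import unquote


def parse_gff_attributes(column9):
    """Split the ninth GFF3 column ('key=value;key=value') into a dict."""
    attributes = {}
    for field in column9.strip().split(';'):
        if '=' in field:
            key, value = field.split('=', 1)
            # GFF3 percent-encodes reserved characters in attribute values
            attributes[key.strip()] = unquote(value.strip())
    return attributes


def locus_tag_to_protein_id(gff_lines):
    """Map locus_tag -> protein_id for every CDS row that has both."""
    mapping = {}
    for line in gff_lines:
        if line.startswith('#') or not line.strip():
            continue  # skip directives, comments and blank lines
        columns = line.rstrip('\n').split('\t')
        if len(columns) != 9 or columns[2] != 'CDS':
            continue
        attrs = parse_gff_attributes(columns[8])
        if 'locus_tag' in attrs and 'protein_id' in attrs:
            mapping[attrs['locus_tag']] = attrs['protein_id']
    return mapping


# Made-up records, only to show the expected shape of the input
example_gff = [
    "##gff-version 3",
    "chr1\tRefSeq\tgene\t1\t900\t.\t+\t.\tID=gene-ABC_0001;locus_tag=ABC_0001",
    "chr1\tRefSeq\tCDS\t1\t900\t.\t+\t0\tID=cds-1;locus_tag=ABC_0001;protein_id=WP_000000001.1",
]
lookup = locus_tag_to_protein_id(example_gff)
```

With such a lookup, a GPR that uses locus tags and one that uses RefSeq IDs can be brought to the same namespace before comparing them.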
Regarding function I: For strains that are not in KEGG but are in BioCyc, I think it will be better to use the BioCyc SmartTables as reference. However, for lab strains this function could still be useful. 🤔 For the BioCyc option I will add a comment to Function I. Maybe the script from Reihaneh (@Biomathsys) could be used (or adjusted) for that, see the code here: https://github.com/draeger-lab/py4gems/blob/main/Reihaneh/1.%20BioCyc_Comparison.ipynb.
The current module As the function to add reactions will not be used from
Removed a function that was generalised to work with KEGG and BioCyc and is now in gapfill.
For the comparison between already existing metabolites & reactions, I realised that if I add the BiGG identifiers to the table, the checks from Reihaneh's script are not necessary. Thus, I will extend the functionality of In
Extracting the functions required for
Maybe the following publication / code is of interest: NICEgame. They mention in their manuscript that they also worked with Python; however, in the GitHub repo I only found MATLAB scripts.
From the paper I understand that the authors fill gaps in the model using media on which the bacterium is known to grow. This approach would be similar to the one from the CarveMe documentation or the gap-filling approach from COBRApy. I think this would be a nice addition to the gap filling via the genes. I already considered calling the CarveMe gap filling after the gene-based gap filling. However, as far as I understand these programs, the user needs to know exactly on which media the bacterium grows. Thus, I find it rather difficult to use any of these tools, as we have strain-specific models. I suppose that not every strain of e.g. Staphylococcus haemolyticus grows on the same media, especially if microbiome media like SNM3 are used. 🤔
We can use requests to access the BiGG database. Here is an example of how to use it with BiGG:

```python
import requests
import refinegems as rg

reac_url = 'http://bigg.ucsd.edu/api/v2/universal/reactions/'
metab_url = 'http://bigg.ucsd.edu/api/v2/universal/metabolites/'

mod = rg.load.load_model_cobra('../../models/Cstr_14.xml')
# requests.get(metab_url+'o2').json()['charges']
for metab in mod.metabolites:
    # drop the two-character compartment suffix (e.g. '_c') to get the universal ID
    bigg_id = metab.id[:-2]
    print(bigg_id, requests.get(metab_url + bigg_id).json()['name'])
```

For metabolites these fields can be accessed
For reactions
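The lookup in the snippet above could be made slightly more defensive. The following is only a sketch under assumptions: the helper names are hypothetical, the set of compartment suffixes ('c', 'e', 'p') is an assumed default that would need adjusting to the model at hand, and the request is given a timeout and a status check so that unknown IDs fail loudly instead of raising a KeyError on 'name'.

```python
METAB_URL = 'http://bigg.ucsd.edu/api/v2/universal/metabolites/'


def strip_compartment(metabolite_id, compartments=('c', 'e', 'p')):
    """Remove a trailing '_<compartment>' only if it is a known compartment."""
    base, sep, suffix = metabolite_id.rpartition('_')
    if sep and suffix in compartments:
        return base
    return metabolite_id


def metabolite_url(metabolite_id, base_url=METAB_URL):
    """Build the universal-metabolite URL for a (possibly compartmented) ID."""
    return base_url + strip_compartment(metabolite_id)


def fetch_bigg_name(metabolite_id, timeout=10):
    """Query BiGG for the metabolite name; needs network access."""
    import requests  # imported here so the pure helpers work without it

    response = requests.get(metabolite_url(metabolite_id), timeout=timeout)
    response.raise_for_status()  # raise on 404 and other HTTP errors
    return response.json()['name']
```

Compared to slicing with [:-2], checking the suffix against a known set leaves IDs without a compartment (or with a longer compartment label) untouched rather than silently truncating them.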
To have all parsing functions combined the module The function
first version of KEGG,MNX,BiGG reconstruction of metabolites and reactions
Added functions for adding reac / metabs per database id
some clean up, some new docstrings, started func for fill_model
changed missing genes and reacs to attributes instead of return values
- Adjusted due to new db_access set up
- Started writing code for mapping from BioCyc IDs to other databases for the BioCycGapFiller
Should labels be added automatically if None are in model for the GapFillers to work?
In this issue all current gap-filling tools implemented in refineGEMs are summarised and possible enhancements explored.
Current gap-filling modules:
genecomp (now: kegg_analysis):
⇾ Extracts KEGG gene identifiers from model
⇾ Compares KEGG genes in model with the strain-specific ones in KEGG
⇾ Extracts RefSeq IDs (GPR) from the .gff file
⇾ Maps BiGG to KEGG IDs
⇒ Returns a table containing missing reactions with locus tag, EC number, KEGG ID, BiGG ID and RefSeq ID (GPR)
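The comparison step listed above boils down to a set difference followed by a lookup. The sketch below is not the kegg_analysis implementation; the function names, the annotation dictionary, and all identifiers in the example are hypothetical placeholders that only illustrate the shape of the resulting table.

```python
def find_missing_genes(model_genes, strain_genes):
    """Genes annotated for the strain in the database but absent from the model."""
    return sorted(set(strain_genes) - set(model_genes))


def build_missing_table(missing_genes, gene_annotations):
    """One row per missing gene; unknown fields stay None."""
    rows = []
    for gene in missing_genes:
        info = gene_annotations.get(gene, {})
        rows.append({
            'locus_tag': gene,
            'ec_number': info.get('ec_number'),
            'kegg_reaction': info.get('kegg_reaction'),
            'bigg_reaction': info.get('bigg_reaction'),
        })
    return rows


# Placeholder data, not real database entries
model_genes = ['ABC_0001', 'ABC_0002']
strain_genes = ['ABC_0001', 'ABC_0002', 'ABC_0003']
annotations = {
    'ABC_0003': {'ec_number': 'EC_X', 'kegg_reaction': 'KEGG_X', 'bigg_reaction': 'BIGG_X'},
}
table = build_missing_table(find_missing_genes(model_genes, strain_genes), annotations)
```

Keeping the set comparison separate from the annotation lookup means the same first step can be reused regardless of which database supplies the reference gene list.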
curate:
gapfill
⇾ Adds reactions with the corresponding IDs, stoichiometric coefficients, educts, products, upper & lower bound to the model from a manually obtained table
metabs
⇾ Adds metabolites with the corresponding IDs, formulae, and name to the model from a manually obtained table
⇾ Synchronises the metabolite information over all compartments
Creation of gapfill module for BioCyc (& adjustment of genecomp to gapfill):
gapfill, entities & analysis_biocyc
genecomp module to analysis_kegg
analysis_kegg
curate
analysis_kegg & analysis_biocyc into analysis_db
gapfill:
gapfill_analysis)
analysis_biocyc)
analysis_kegg, only obtains reactions & genes/proteins)
entities, See script: https://github.com/draeger-lab/py4gems/blob/main/Reihaneh/1.%20BioCyc_Comparison.ipynb)
Further improvements:
From the GFF & FASTA files with DIAMOND & BioCyc SmartTables (→ lab_strain) obtain missing genes
→ Already fulfilled now by GeneGapFiller
Adjust kegg_analysis to also return tables with missing genes & metabolites
→ Then the result from kegg_analysis can also be added to a model.
→ Already fulfilled now by KEGGapFiller
update_annottions_from_table part of GapFiller.fill_model
verifyGapfilledReactions to GapFiller.fill_model
Further ideas:
missing_reactions & missing_genes? 🤔