Generation of consensus metabolic reconstructions and community-dependent gap-filling using COMMIT for communites sampled from Arabidopsis thaliana.
This repository contains all files used for community-dependent gap-filling and subsequent analysis of consensus genome-scale metabolic reconstructions for two soil communities (Bulgarelli et al. 2012, Schlaeppi et al. 2014).
The consensus reconstructions were generated by merging draft reconstructions from four different approaches:
- KBase (Arkin et al. 2018)
- RAVEN 2.0 (Wang et al. 2018)
- CarveMe (Machado et al. 2018)
- AuReMe (Aite et al. 2018) / Pathway Tools (Karp et al. 2016)
- Matlab (tested with versions R2017b and R2020b)
- COBRA toolbox v3.0
- LP solver for the COBRA toolbox, tested with
- IBM CPLEX v12.9
- Gurobi v9.1.1
- not recommended: glpk solver
- HMMER (tested with v3.2.1)
- R packages: pheatmap, wesanderson, scales, plotrix, igraph, ape
Clone the GitHub repository using
git clone https://github.com/pwendering/COMMIT
Adjust settings and paths in the configuration script at
code/matlab/options.m
The configuration script will also add all COMMIT functions to the Matlab search path. For this run the following commands within Matlab:
run('[path_to_commit]/code/matlab/options.m')
Finally, add bash scripts at code/bash/
to your PATH variable.
The reaction database for gap-filling and the tables used for reaction and metabolite ID translation depend on the MetaNetX database.
If you want to update the gap-filling database run (might have to be adjusted for more current version of MNXref) the following steps:
-
download the data files of the MNXref database from here
reac_prop.tsv
reac_xref.tsv
chem_prop.tsv
chem_xref.tsv
-
Within Matlab run:
% update the reaction_MNXref_balanced.lst file
mnxref_dir = 'data/tables/MNXref';
createMnxRefReactionDbFile(mnxref_dir)
database_file = 'data/tables/MNXref/reaction_MNXref_balanced.lst';
compartments = {'c', 'e'};
% list of IDs that should not enter the gap-filling database
black_list = {...
'BIOMASS',... % Biomass
'MNXM517',... % Light/hnu/photon
'MNXM639',... % peptide
};
dbModel_MNXref_balanced = prepareFastGapFilling(database_file, compartments, black_list)
dbModel_MNXref_balanced = addNamesPermeabilityToGfDb(dbModel_MNXref_balanced, mnxref_dir);
save('data/gap-filling/database/Universal-model-MNXref-balanced.mat',...
'dbModel_MNXref_balanced')
If you want to update the tables for ID translation run within Matlab (this step takes some time):
mnxref_dir = 'data/tables/MNXref';
createMnxrefTranslationTables(mnxref_dir)
-
create draft reconstructions with different approaches for different habitats
-
save the draft reconstructions as Matlab workspaces ('.mat') under data/models/habitat/method for each habitat and reconstruction approach
- for this, read individual models using the COBRA toolbox function
readCbModel
, collect them in a cell array and save it as a Matlab workspace - The following code may serve as a blueprint
% specify model file names file_names = {'model_1.xml', 'model_2.xml', 'model_3.xml'}; % initialize cell array models = cell(1, numel(file_names)); % read sbml files and add them to the cell array for i = 1:numel(models) model = readCbModel(file_names{i}); models{i} = model; end % save cell array as workspace save('models', 'models')
- for this, read individual models using the COBRA toolbox function
To generate consensus models from multiple reconstruction approaches use the script
code/model_generation/merge_metabolic_models.m
Adjust the variables to specify the location of reconstructions from single approaches (see point Preparation, above):
habitats
: habitats where samples were obtained
methods
: reconstruction approaches used
The consensus models will be saved under data/models/consensus
.
Some reconstruction approaches use their own annotation pipeline, starting from the genome sequence (e.g. KBase), while others require a multi-fasta file (i.e. structural annotation). As the structural annotation can differ and different pipelines assign different gene IDs, the resulting reconstructions from different approaches can contain different gene IDs. For this publication, I mapped gene IDs using BLAST to be able to compare the gene sets that are included in the reconstructions obtained from the different approaches.
- Create a BLAST database for reference gene identifiers
- use your own structural annotation or a structural annotation obtained from one of the reconstruction approaches as a reference
- use the script
code/bash/batch-makeblastdb.sh
to generate BLAST databases for each amino acid multi-fasta file - use
code/bash/batch-makeblastdb-nucl.sh
if you have nucleic acid sequences
- If not already available, create multi-fasta files from .gff files
- e.g. in KBase, you can export .gff files and use them together with your genome sequence to create a multi-fasta file
- use the script
code/bash/convert-gff-to-fasta.sh
- this script required bedtools to be installed
- Run blastp or blastx to find corresponding genes or proteins
- The scripts
code/bash/batch-blastp.sh
andcode/bash/batch-blastx.sh
loop after all multi-fasta files in a directory and perform BLAST searches for all proteins/genes in a fasta file - only one hit per protein is returned to obtain a one-to-one mapping
- this is not ideal because sequences might be split or lumped between structural annotations
- The scripts
- The Matlab function
code/matlab/translateGeneIDs.m
can then be used to map gene IDs in the reconstructions, such that GPR rules can be combined and compared
- the script
COMMIT/code/matlab/commit.m
contains the workflow for the publication - the script
COMMIT/code/matlab/gap_filling/run_iterative_gap_filling.m
performs the conditional gap filling procedure of COMMIT - the script
COMMIT/code/matlab/gap_filling/gap_fill_individual_models.m
performs gap-filling on the individual reconstructions without considering the community composition
Wendering P, Nikoloski Z (2022) COMMIT: Consideration of metabolite leakage and community composition improves microbial community reconstructions. PLOS Computational Biology 18(3): e1009906. https://doi.org/10.1371/journal.pcbi.1009906