Skip to content

COMMIT: Community-dependent gap-filling considering metabolite leakage

License

Notifications You must be signed in to change notification settings

pwendering/COMMIT

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Generation of consensus metabolic reconstructions and community-dependent gap-filling using COMMIT for communites sampled from Arabidopsis thaliana.

DOI

Description

This repository contains all files used for community-dependent gap-filling and subsequent analysis of consensus genome-scale metabolic reconstructions for two soil communities (Bulgarelli et al. 2012, Schlaeppi et al. 2014).

The consensus reconstructions were generated by merging draft reconstructions from four different approaches:

Requirements for COMMIT

  • Matlab (tested with versions R2017b and R2020b)
  • COBRA toolbox v3.0
  • LP solver for the COBRA toolbox, tested with
  • HMMER (tested with v3.2.1)
  • R packages: pheatmap, wesanderson, scales, plotrix, igraph, ape

Installation

Clone the GitHub repository using

git clone https://github.com/pwendering/COMMIT

Adjust settings and paths in the configuration script at

code/matlab/options.m

The configuration script will also add all COMMIT functions to the Matlab search path. For this run the following commands within Matlab:

run('[path_to_commit]/code/matlab/options.m')

Finally, add bash scripts at code/bash/ to your PATH variable.

(optional) update reaction database

The reaction database for gap-filling and the tables used for reaction and metabolite ID translation depend on the MetaNetX database.

If you want to update the gap-filling database run (might have to be adjusted for more current version of MNXref) the following steps:

  • download the data files of the MNXref database from here

    • reac_prop.tsv
    • reac_xref.tsv
    • chem_prop.tsv
    • chem_xref.tsv
  • Within Matlab run:

% update the reaction_MNXref_balanced.lst file
mnxref_dir = 'data/tables/MNXref';
createMnxRefReactionDbFile(mnxref_dir)

database_file = 'data/tables/MNXref/reaction_MNXref_balanced.lst';
compartments = {'c', 'e'};
% list of IDs that should not enter the gap-filling database
black_list = {...
	'BIOMASS',...  % Biomass
    'MNXM517',...  % Light/hnu/photon
    'MNXM639',...  % peptide
    };
dbModel_MNXref_balanced = prepareFastGapFilling(database_file, compartments, black_list)
dbModel_MNXref_balanced = addNamesPermeabilityToGfDb(dbModel_MNXref_balanced, mnxref_dir);
save('data/gap-filling/database/Universal-model-MNXref-balanced.mat',...
    'dbModel_MNXref_balanced')

If you want to update the tables for ID translation run within Matlab (this step takes some time):

mnxref_dir = 'data/tables/MNXref';
createMnxrefTranslationTables(mnxref_dir)

Generation of consensus models

Preparation

  • create draft reconstructions with different approaches for different habitats

  • save the draft reconstructions as Matlab workspaces ('.mat') under data/models/habitat/method for each habitat and reconstruction approach

    • for this, read individual models using the COBRA toolbox function readCbModel, collect them in a cell array and save it as a Matlab workspace
    • The following code may serve as a blueprint
     % specify model file names
     file_names = {'model_1.xml', 'model_2.xml', 'model_3.xml'};
     
     % initialize cell array
     models = cell(1, numel(file_names));
     
     % read sbml files and add them to the cell array
     for i = 1:numel(models)
     	model = readCbModel(file_names{i});
     	models{i} = model;
     end
     
     % save cell array as workspace
     save('models', 'models')
     
    

Merge metabolic reconstructions

To generate consensus models from multiple reconstruction approaches use the script

code/model_generation/merge_metabolic_models.m

Adjust the variables to specify the location of reconstructions from single approaches (see point Preparation, above):

habitats: habitats where samples were obtained

methods: reconstruction approaches used

The consensus models will be saved under data/models/consensus.

(optional) Reconcile gene IDs

Some reconstruction approaches use their own annotation pipeline, starting from the genome sequence (e.g. KBase), while others require a multi-fasta file (i.e. structural annotation). As the structural annotation can differ and different pipelines assign different gene IDs, the resulting reconstructions from different approaches can contain different gene IDs. For this publication, I mapped gene IDs using BLAST to be able to compare the gene sets that are included in the reconstructions obtained from the different approaches.

  1. Create a BLAST database for reference gene identifiers
    • use your own structural annotation or a structural annotation obtained from one of the reconstruction approaches as a reference
    • use the script code/bash/batch-makeblastdb.sh to generate BLAST databases for each amino acid multi-fasta file
    • use code/bash/batch-makeblastdb-nucl.sh if you have nucleic acid sequences
  2. If not already available, create multi-fasta files from .gff files
    • e.g. in KBase, you can export .gff files and use them together with your genome sequence to create a multi-fasta file
    • use the script code/bash/convert-gff-to-fasta.sh
    • this script required bedtools to be installed
  3. Run blastp or blastx to find corresponding genes or proteins
    • The scripts code/bash/batch-blastp.sh and code/bash/batch-blastx.sh loop after all multi-fasta files in a directory and perform BLAST searches for all proteins/genes in a fasta file
    • only one hit per protein is returned to obtain a one-to-one mapping
    • this is not ideal because sequences might be split or lumped between structural annotations
  4. The Matlab function code/matlab/translateGeneIDs.m can then be used to map gene IDs in the reconstructions, such that GPR rules can be combined and compared

Workflow for the publication results

  • the script COMMIT/code/matlab/commit.m contains the workflow for the publication
  • the script COMMIT/code/matlab/gap_filling/run_iterative_gap_filling.m performs the conditional gap filling procedure of COMMIT
  • the script COMMIT/code/matlab/gap_filling/gap_fill_individual_models.m performs gap-filling on the individual reconstructions without considering the community composition

Reference

Wendering P, Nikoloski Z (2022) COMMIT: Consideration of metabolite leakage and community composition improves microbial community reconstructions. PLOS Computational Biology 18(3): e1009906. https://doi.org/10.1371/journal.pcbi.1009906