diff --git a/GECKOInstaller.m b/GECKOInstaller.m index 434b96952..2811abc84 100644 --- a/GECKOInstaller.m +++ b/GECKOInstaller.m @@ -28,6 +28,7 @@ end addpath(paths); savepath; + GECKOInstaller.checkGECKOversion; end function uninstall sourceDir = fileparts(which(mfilename)); @@ -59,6 +60,7 @@ pathsLeft = splitPaths(1,okPaths); newPaths = char(join(pathsLeft, pathSep)); end + function checkRAVENversion(minmVer) try currVer = checkInstallation('versionOnly'); @@ -81,6 +83,39 @@ function checkRAVENversion(minmVer) end end + + function checkGECKOversion + sourceDir = fileparts(which(mfilename)); + hasGit=isfolder(fullfile(sourceDir,'.git')); + + if exist(fullfile(sourceDir,'version.txt'), 'file') == 2 + currVer = fgetl(fopen(fullfile(sourceDir,'version.txt'))); + fclose('all'); + fprintf('GECKO version %s installed',currVer) + try + newVer=strtrim(webread('https://raw.githubusercontent.com/SysBioChalmers/GECKO/main/version.txt')); + newVerNum=str2double(strsplit(newVer,'.')); + currVerNum=str2double(strsplit(currVer,'.')); + for i=1:3 + if currVerNum(i)here\n'); + end + break + elseif i==3 + fprintf('\n'); + end + end + catch + fprintf('\n'); + end + else + fprintf('GECKO installed, unknown version (cannot find version.txt)\n') + end + end end properties (Constant) diff --git a/README.md b/README.md index c0fee9c1f..18cf48537 100644 --- a/README.md +++ b/README.md @@ -1,72 +1,26 @@ ![Current version](https://badge.fury.io/gh/sysbiochalmers%2Fgecko.svg) -[![Gitter](https://badges.gitter.im/SysBioChalmers/GECKO.svg)](https://gitter.im/SysBioChalmers/GECKO) +[![GitHub Discussions](https://img.shields.io/github/discussions-search?query=repo%3Asysbiochalmers%2Fgecko&label=GitHub%20Discussions)](https://github.com/SysBioChalmers/GECKO/discussions) [![Zenodo](https://zenodo.org/badge/DOI/10.5281/zenodo.7699818.svg)](https://doi.org/10.5281/zenodo.7699818) ## About GECKO 3 The **GECKO** toolbox enhances a **G**enome-scale model to account for **E**nzyme **C**onstraints, using **K**inetics and **O**mics. The resulting enzyme-constrained model (**ecModel**) can be used to perform simulations where enzyme allocation is either drawn from a total protein pool, or constrained by measured protein levels from proteomics data. -💡 In the GECKO folder, `protocol.m` contains instructions on how to reconstruct and analyze an ecModel for _S. cerevisiae_. This demonstrates how many of GECKO's functions can be used. -💡 In the [`GECKO/tutorials`](https://github.com/SysBioChalmers/GECKO/tree/main/tutorials) folder there are examples of how GECKO can be applied to GEMs, in either of its _full_ or _light_ forms. Each `protocol.m` contains instructions on how to reconstruct and analyze an ecModel, demonstrating how different fuctions in GECKO can be used. The source code documentation is also available -[online](http://sysbiochalmers.github.io/GECKO/doc/). +💡 In the [`GECKO/tutorials`](https://github.com/SysBioChalmers/GECKO/tree/main/tutorials) folder there are examples of how GECKO can be applied to GEMs, in either of its _full_ or _light_ forms. Each `protocol.m` contains instructions on how to reconstruct and analyze an ecModel, demonstrating how different fuctions in GECKO can be used. These two scripts complement the [protocols paper](#citation). -_**Note:** Regarding code and model compatibility with earlier GECKO versions, see [Previous versions: GECKO 1 and 2](#previous-versions-gecko-1-and-2)_. +_**Note:** Regarding code and model compatibility with earlier GECKO versions, see [Previous versions: GECKO 1 and 2](https://github.com/SysBioChalmers/GECKO/wiki/Previous-versions:-GECKO-1-and-2)_. -### Citation -- A GECKO 3 publication is currently under consideration, citation information will appear here in due course. -- For GECKO release 2, please cite [Domenzain et al. (2022) doi:10.1038/s41467-022-31421-1](https://doi.org/10.1038/s41467-022-31421-1). -- For GECKO release 1, please cite [Sánchez et al. (2017) doi:10.15252/msb.20167411](https://doi.org/10.15252/msb.20167411). +## Documentation +**Installation instructions** and other information is available in the **[Wiki](https://github.com/SysBioChalmers/GECKO/wiki)**. The source code documentation is also available +[online](http://sysbiochalmers.github.io/GECKO/doc/). Use [GitHub Discussions](https://github.com/SysBioChalmers/GECKO/discussions) for support, to ask questions or leave comments. -### Getting started +## Citation -#### Required software - -- MATLAB version 2019b or later, no additional MathWorks toolboxes are required. -- [RAVEN Toolbox](https://github.com/SysBioChalmers/RAVEN). The RAVEN Toolbox Wiki contains [installation instructions for both RAVEN](https://github.com/SysBioChalmers/RAVEN/wiki/Installation) and [Gurobi](https://github.com/SysBioChalmers/RAVEN/wiki/Installation#solvers). Briefly, RAVEN is either downloaded via `git clone`, as ZIP-archive from GitHub, or installed as a [MATLAB AddOn](https://se.mathworks.com/matlabcentral/fileexchange/112330-raven-toolbox). After finishing all installation instructions, the user should run installation checks in MATLAB with: `checkInstallation`. -- [Gurobi Optimizer](https://www.gurobi.com/solutions/gurobi-optimizer/) is recommended for simulations (free academic license available). Alternatively, the open-source [GNU Linear Programming Kit](https://www.gnu.org/software/glpk/) (distributed with RAVEN) or SoPlex as part of the [SCIP Optimization Suite](https://scipopt.org/) can be used. -- [Docker](https://www.docker.com/) for running DLKcat. Installation instructions are available at https://docs.docker.com/get-docker . - -#### Installation - -- The preferred way to download GECKO is via git clone: - -``` -git clone --depth=1 https://github.com/SysBioChalmers/GECKO -``` - -- Alternatively, [a ZIP-archive can be directly downloaded from GitHub](https://github.com/SysBioChalmers/GECKO/releases). The ZIP-archive should be extracted to a disk location where the user has read- and write-access rights. - -- After `git clone` or extracting the ZIP-archive, the user should navigate in MATLAB to the GECKO folder. GECKO can then be installed with the command that adds GECKO (sub-)folders to the MATLAB path:: - -``` - cd('C:\path\to\GECKO') % Modify to match GECKO folder and operating system - GECKOInstaller.install -``` - -- If desired, a removal command is available as:: - -``` - GECKOInstaller.uninstall -``` - -All set! 🚀 - -## Previous versions: GECKO 1 and 2 - -Due to significant refactoring of the code, GECKO version 3 is largely not backwards compatible with earlier GECKO versions. - -- Most notably, GECKO 3 ecModels have an `.ec` structure containing all enzyme and kcat information. -- In addition, in GECKO 3 enzymes are incorporated in the S-matrix as MW/kcat, while in GECKO 1 and 2 this was 1/kcat (where the MW was instead considered in the protein exchange reactions). -- GECKO 3 ecModels can be stored in YAML file format that retains all model content. -- Most functions in GECKO 3 do not work on ecModels generated with GECKO versions 1 or 2. -- ecModels generated in GECKO 3 do not work with functions from GECKO versions 1 or 2. -- At this moment, there are no Python functions to work with GECKO 3 formatted ecModels. -- The last GECKO 2 release (2.0.3) is available [here](https://github.com/SysBioChalmers/GECKO/releases/tag/v2.0.3). -- The `gecko2` branch remains available for any potential fixes. +- The GECKO 3 publication is in press at Nature Protocols. Once published, citation information will appear here. ## Contributing diff --git a/doc/index.html b/doc/index.html index b73088c93..8a709fdb6 100644 --- a/doc/index.html +++ b/doc/index.html @@ -19,23 +19,22 @@

Matlab Directories

Matlab Files found in these Directories

- - - - - - - - - - - - - - - - -
ModelAdapter ecFSEOF getReactionsFromEnzyme runDLKcat
ModelAdapterManager ecFVA getStandardKcat saveEcModel
abc_max enzymeUsage getSubsetEcModel selectKcatValue
adapterTemplate fillEnzConcs getcarbonnum sensitivityTuning
addCarbonNum findECInDB getrSample setKcatForReactions
addNewRxnsToEC findGECKOroot loadBRENDAdata setProtPoolSize
anaerobicModel findMaxValue loadConventionalGEM sigmaFitter
applyComplexData findMetSmiles loadDatabases startGECKOproject
applyCustomKcats flexibilizeEnzConcs loadEcModel truncateValues
applyKcatConstraints fuzzyKcatMatching loadFluxData updateGECKOdoc
bayesianSensitivityTuning getComplexData loadProtData updateProtPool
calculateFfactor getConcControlCoeffs makeEcModel updateprior
calculateMW getECfromDatabase mapRxnsToConv writeDLKcatInput
changeMedia getECfromGEM mergeDLKcatAndFuzzyKcats
constrainEnzConcs getECstring plotEcFVA
constrainFluxData getFluxTarget readDLKcatOutput
copyECtoGEM getKcatAcrossIsozymes reportEnzymeUsage
+ ModelAdapter copyECtoGEM getKcatAcrossIsozymes readDLKcatOutput + ModelAdapterManager ecFSEOF getReactionsFromEnzyme reportEnzymeUsage + abc_max ecFVA getStandardKcat runDLKcat + adapterTemplate enzymeUsage getSubsetEcModel saveEcModel + addCarbonNum fillEnzConcs getcarbonnum selectKcatValue + addNewRxnsToEC findECInDB getrSample sensitivityTuning + anaerobicModel findGECKOroot loadBRENDAdata setKcatForReactions + applyComplexData findMaxValue loadConventionalGEM setProtPoolSize + applyCustomKcats findMetSmiles loadDatabases sigmaFitter + applyKcatConstraints flexibilizeEnzConcs loadEcModel startGECKOproject + bayesianSensitivityTuning fuzzyKcatMatching loadFluxData truncateValues + calculateFfactor getComplexData loadProtData updateGECKOdoc + calculateMW getConcControlCoeffs makeEcModel updateProtPool + changeMedia getECfromDatabase mapRxnsToConv updateprior + constrainEnzConcs getECfromGEM mergeDLKcatAndFuzzyKcats writeDLKcatInput + constrainFluxData getECstring plotEcFVA
Generated by m2html © 2005
diff --git a/doc/src/geckomat/change_model/applyKcatConstraints.html b/doc/src/geckomat/change_model/applyKcatConstraints.html index 2fb03c352..b854c2483 100644 --- a/doc/src/geckomat/change_model/applyKcatConstraints.html +++ b/doc/src/geckomat/change_model/applyKcatConstraints.html @@ -157,23 +157,25 @@

SOURCE CODE ^% Map ec-reactions to model.rxns 0097 [hasEc,~] = ismember(model.rxns,modRxns); -0098 [~,rxnIdx] = ismember(modRxns,model.rxns); -0099 hasEc = find(hasEc & updateRxns); +0098 hasEc = find(hasEc & updateRxns); +0099 [~,rxnIdx] = ismember(modRxns,model.rxns); 0100 for i = 1:numel(hasEc) 0101 % Get all isozymes per reaction 0102 ecIdx = find(rxnIdx == hasEc(i)); -0103 % Multiply enzymes with their MW (they are then automatically -0104 % summed per reaction), and divide by their kcat, to get a vector -0105 % of MW/kcat values. -0106 MWkcat = (model.ec.rxnEnzMat(ecIdx,:) * model.ec.mw) ./ model.ec.kcat(ecIdx); -0107 % If kcat was zero, MWkcat is Inf. Correct back to zero -0108 MWkcat(abs(MWkcat)==Inf)=0; -0109 % Select the lowest MW/kcat (= most efficient), and convert to /hour -0110 model.S(prot_pool_idx, hasEc(i)) = -min(MWkcat/3600); -0111 end -0112 end -0113 end -0114 +0103 % ecIdx = strcmp(model.rxns(hasEc(i)),modRxns); +0104 % Multiply enzymes with their MW (they are then automatically +0105 % summed per reaction), and divide by their kcat, to get a vector +0106 % of MW/kcat values. +0107 MWkcat = (model.ec.rxnEnzMat(ecIdx,:) * model.ec.mw) ./ model.ec.kcat(ecIdx); +0108 % If kcat was zero, MWkcat is Inf. If no enzyme info was found, +0109 % MWkcat is NaN. Correct both back to zero +0110 MWkcat(isinf(MWkcat) | isnan(MWkcat)) = 0; +0111 % Select the lowest MW/kcat (= most efficient), and convert to /hour +0112 model.S(prot_pool_idx, hasEc(i)) = -min(MWkcat/3600); +0113 end +0114 end +0115 end +0116
Generated by m2html © 2005
\ No newline at end of file diff --git a/doc/src/geckomat/change_model/makeEcModel.html b/doc/src/geckomat/change_model/makeEcModel.html index cf51d3bcb..18aeab8e1 100644 --- a/doc/src/geckomat/change_model/makeEcModel.html +++ b/doc/src/geckomat/change_model/makeEcModel.html @@ -463,7 +463,7 @@

SOURCE CODE ^%9: Add proteins as pseudometabolites 0328 if ~geckoLight -0329 [proteinMets.mets, uniprotSortId] = sort(ec.enzymes); +0329 [proteinMets.mets, uniprotSortId] = unique(ec.enzymes); 0330 proteinMets.mets = strcat('prot_',proteinMets.mets); 0331 proteinMets.metNames = proteinMets.mets; 0332 proteinMets.compartments = compartmentID; diff --git a/doc/src/geckomat/gather_kcats/runDLKcat.html b/doc/src/geckomat/gather_kcats/runDLKcat.html index b2ad3f248..1aeba1750 100644 --- a/doc/src/geckomat/gather_kcats/runDLKcat.html +++ b/doc/src/geckomat/gather_kcats/runDLKcat.html @@ -90,7 +90,7 @@

SOURCE CODE ^end 0036 0037 disp('Running DLKcat prediction, this may take many minutes, especially the first time.') -0038 status = system(['docker run --rm -v ' fullfile(params.path,'/data') ':/data ghcr.io/sysbiochalmers/dlkcat-gecko:0.1 /bin/bash -c "python DLKcat.py /data/DLKcat.tsv /data/DLKcatOutput.tsv"']); +0038 status = system(['docker run --rm -v "' fullfile(params.path,'/data') '":/data ghcr.io/sysbiochalmers/dlkcat-gecko:0.1 /bin/bash -c "python DLKcat.py /data/DLKcat.tsv /data/DLKcatOutput.tsv"']); 0039 0040 if status == 0 && exist(fullfile(params.path,'data/DLKcatOutput.tsv')) 0041 delete(fullfile(params.path,'/data/DLKcat.tsv')); diff --git a/doc/src/geckomat/get_enzyme_data/loadDatabases.html b/doc/src/geckomat/get_enzyme_data/loadDatabases.html index 69c4a25f4..80e12d51a 100644 --- a/doc/src/geckomat/get_enzyme_data/loadDatabases.html +++ b/doc/src/geckomat/get_enzyme_data/loadDatabases.html @@ -141,133 +141,142 @@

SOURCE CODE ^else 0080 databases.uniprot = []; 0081 end -0082 end -0083 -0084 %% KEGG -0085 if any(strcmp(selectDatabase,{'kegg','both'})) -0086 keggPath = fullfile(filePath,'kegg.tsv'); -0087 if ~exist(keggPath,'file') -0088 if isempty(kegg.ID) -0089 printOrange('WARNING: No kegg.ID is specified, unable to download KEGG DB.\n') -0090 else -0091 downloadKEGG(kegg.ID,keggPath,kegg.geneID); -0092 end -0093 end -0094 if exist(keggPath,'file') -0095 fid = fopen(keggPath,'r'); -0096 fileContent = textscan(fid,'%q %q %q %q %q %q %q','Delimiter',',','HeaderLines',0); -0097 fclose(fid); -0098 databases.kegg.uniprot = fileContent{1}; -0099 databases.kegg.genes = fileContent{2}; -0100 databases.kegg.keggGene = fileContent{3}; -0101 databases.kegg.eccodes = fileContent{4}; -0102 databases.kegg.MW = str2double(fileContent{5}); -0103 databases.kegg.pathway = fileContent{6}; -0104 databases.kegg.seq = fileContent{7}; -0105 else -0106 databases.kegg = []; -0107 end -0108 end -0109 end -0110 -0111 function downloadKEGG(keggID, filePath, keggGeneID) -0112 %% Download gene information -0113 progressbar(['Downloading KEGG data for organism code ' keggID]) -0114 webOptions = weboptions('Timeout',30); -0115 try -0116 gene_list = webread(['http://rest.kegg.jp/list/' keggID],webOptions); -0117 catch ME -0118 switch ME.identifier -0119 case 'MATLAB:webservices:HTTP400StatusCodeError' -0120 error(['Unable to download data form KEGG with a potentially invalid ID: ' keggID ]) -0121 end -0122 end -0123 gene_list = regexpi(gene_list, '[^\n]+','match')'; -0124 gene_id = regexpi(gene_list,['(?<=' keggID ':)\S+'],'match'); -0125 -0126 % Retrieve information for every gene in the list, 10 genes per query -0127 genesPerQuery = 10; -0128 queries = ceil(numel(gene_id)/genesPerQuery); -0129 keggData = cell(numel(gene_id),1); -0130 for i = 1:queries -0131 % Download batches of genes -0132 firstIdx = i*genesPerQuery-(genesPerQuery-1); -0133 lastIdx = i*genesPerQuery; -0134 if lastIdx > numel(gene_id) % Last query has probably less genes -0135 lastIdx = numel(gene_id); -0136 end -0137 url = ['http://rest.kegg.jp/get/' keggID ':' strjoin([gene_id{firstIdx:lastIdx}],['+' keggID ':'])]; -0138 -0139 retry = true; -0140 while retry -0141 try -0142 retry = false; -0143 out = webread(url,webOptions); -0144 catch -0145 retry = true; -0146 end -0147 end -0148 outSplit = strsplit(out,['///' 10]); %10 is new line character -0149 if numel(outSplit) < lastIdx-firstIdx+2 -0150 error('KEGG returns less genes per query') %Reduce genesPerQuery -0151 end -0152 keggData(firstIdx:lastIdx) = outSplit(1:end-1); -0153 progressbar(i/queries) -0154 end -0155 -0156 %% Parsing of info to keggDB format -0157 sequence = regexprep(keggData,'.*AASEQ\s+\d+\s+([A-Z\s])+?\s+NTSEQ.*','$1'); -0158 %No AASEQ -> no protein -> not of interest -0159 noProt = startsWith(sequence,'ENTRY '); -0160 uni = regexprep(keggData,'.*UniProt: (\S+?)\s.*','$1'); -0161 noUni = startsWith(uni,'ENTRY '); -0162 uni(noProt | noUni) = []; -0163 keggData(noProt | noUni) = []; -0164 sequence(noProt | noUni) = []; -0165 sequence = regexprep(sequence,'\s*',''); -0166 keggGene = regexprep(keggData,'ENTRY\s+(\S+?)\s.+','$1'); -0167 -0168 switch keggGeneID -0169 case {'kegg',''} -0170 gene_name = keggGene; -0171 otherwise -0172 % In case there are special characters: -0173 keggGeneIDT = regexptranslate('escape',keggGeneID); -0174 gene_name = regexprep(keggData,['.+' keggGeneIDT ': (\S+?)\n.+'],'$1'); -0175 noID = ~contains(keggData,keggGeneIDT); -0176 if all(noID) -0177 error(['None of the KEGG entries are annotated with the gene identifier ' keggGeneID]) -0178 else -0179 gene_name(noID)= []; -0180 keggData(noID) = []; -0181 keggGene(noID) = []; -0182 sequence(noID) = []; -0183 uni(noID) = []; -0184 end -0185 end -0186 -0187 EC_names = regexprep(keggData,'.*ORTHOLOGY.*\[EC:(.*?)\].*','$1'); -0188 EC_names(startsWith(EC_names,'ENTRY ')) = {''}; -0189 -0190 MW = cell(numel(sequence),1); -0191 for i=1:numel(sequence) -0192 if ~isempty(sequence{i}) -0193 MW{i} = num2str(round(calculateMW(sequence{i}))); -0194 end -0195 end -0196 -0197 pathway = regexprep(keggData,'.*PATHWAY\s+(.*?)(BRITE|MODULE).*','$1'); -0198 pathway(startsWith(pathway,'ENTRY ')) = {''}; -0199 pathway = strrep(pathway,[keggID '01100 Metabolic pathways'],''); -0200 pathway = regexprep(pathway,'\n',''); -0201 pathway = regexprep(pathway,' ',''); -0202 -0203 out = [uni, gene_name, keggGene, EC_names, MW, pathway, sequence]; -0204 out = cell2table(out); +0082 if ~isempty(databases.uniprot) +0083 [uniqueIDs,uniqueIdx] = unique(databases.uniprot.ID,'stable'); +0084 if numel(uniqueIDs) < numel(databases.uniprot.ID) +0085 duplID = setdiff(1:numel(databases.uniprot.ID),uniqueIdx); +0086 dispEM(['Duplicate entries are found for the following proteins. '... +0087 'Manually curate the ''uniprot.tsv'' file, or adjust the uniprot parameters '... +0088 'in the model adapter:'],true,databases.uniprot.ID(duplID)); +0089 end +0090 end +0091 end +0092 +0093 %% KEGG +0094 if any(strcmp(selectDatabase,{'kegg','both'})) +0095 keggPath = fullfile(filePath,'kegg.tsv'); +0096 if ~exist(keggPath,'file') +0097 if isempty(kegg.ID) +0098 printOrange('WARNING: No kegg.ID is specified, unable to download KEGG DB.\n') +0099 else +0100 downloadKEGG(kegg.ID,keggPath,kegg.geneID); +0101 end +0102 end +0103 if exist(keggPath,'file') +0104 fid = fopen(keggPath,'r'); +0105 fileContent = textscan(fid,'%q %q %q %q %q %q %q','Delimiter',',','HeaderLines',0); +0106 fclose(fid); +0107 databases.kegg.uniprot = fileContent{1}; +0108 databases.kegg.genes = fileContent{2}; +0109 databases.kegg.keggGene = fileContent{3}; +0110 databases.kegg.eccodes = fileContent{4}; +0111 databases.kegg.MW = str2double(fileContent{5}); +0112 databases.kegg.pathway = fileContent{6}; +0113 databases.kegg.seq = fileContent{7}; +0114 else +0115 databases.kegg = []; +0116 end +0117 end +0118 end +0119 +0120 function downloadKEGG(keggID, filePath, keggGeneID) +0121 %% Download gene information +0122 progressbar(['Downloading KEGG data for organism code ' keggID]) +0123 webOptions = weboptions('Timeout',30); +0124 try +0125 gene_list = webread(['http://rest.kegg.jp/list/' keggID],webOptions); +0126 catch ME +0127 switch ME.identifier +0128 case 'MATLAB:webservices:HTTP400StatusCodeError' +0129 error(['Unable to download data form KEGG with a potentially invalid ID: ' keggID ]) +0130 end +0131 end +0132 gene_list = regexpi(gene_list, '[^\n]+','match')'; +0133 gene_id = regexpi(gene_list,['(?<=' keggID ':)\S+'],'match'); +0134 +0135 % Retrieve information for every gene in the list, 10 genes per query +0136 genesPerQuery = 10; +0137 queries = ceil(numel(gene_id)/genesPerQuery); +0138 keggData = cell(numel(gene_id),1); +0139 for i = 1:queries +0140 % Download batches of genes +0141 firstIdx = i*genesPerQuery-(genesPerQuery-1); +0142 lastIdx = i*genesPerQuery; +0143 if lastIdx > numel(gene_id) % Last query has probably less genes +0144 lastIdx = numel(gene_id); +0145 end +0146 url = ['http://rest.kegg.jp/get/' keggID ':' strjoin([gene_id{firstIdx:lastIdx}],['+' keggID ':'])]; +0147 +0148 retry = true; +0149 while retry +0150 try +0151 retry = false; +0152 out = webread(url,webOptions); +0153 catch +0154 retry = true; +0155 end +0156 end +0157 outSplit = strsplit(out,['///' 10]); %10 is new line character +0158 if numel(outSplit) < lastIdx-firstIdx+2 +0159 error('KEGG returns less genes per query') %Reduce genesPerQuery +0160 end +0161 keggData(firstIdx:lastIdx) = outSplit(1:end-1); +0162 progressbar(i/queries) +0163 end +0164 +0165 %% Parsing of info to keggDB format +0166 sequence = regexprep(keggData,'.*AASEQ\s+\d+\s+([A-Z\s])+?\s+NTSEQ.*','$1'); +0167 %No AASEQ -> no protein -> not of interest +0168 noProt = startsWith(sequence,'ENTRY '); +0169 uni = regexprep(keggData,'.*UniProt: (\S+?)\s.*','$1'); +0170 noUni = startsWith(uni,'ENTRY '); +0171 uni(noProt | noUni) = []; +0172 keggData(noProt | noUni) = []; +0173 sequence(noProt | noUni) = []; +0174 sequence = regexprep(sequence,'\s*',''); +0175 keggGene = regexprep(keggData,'ENTRY\s+(\S+?)\s.+','$1'); +0176 +0177 switch keggGeneID +0178 case {'kegg',''} +0179 gene_name = keggGene; +0180 otherwise +0181 % In case there are special characters: +0182 keggGeneIDT = regexptranslate('escape',keggGeneID); +0183 gene_name = regexprep(keggData,['.+' keggGeneIDT ': (\S+?)\n.+'],'$1'); +0184 noID = ~contains(keggData,keggGeneIDT); +0185 if all(noID) +0186 error(['None of the KEGG entries are annotated with the gene identifier ' keggGeneID]) +0187 else +0188 gene_name(noID)= []; +0189 keggData(noID) = []; +0190 keggGene(noID) = []; +0191 sequence(noID) = []; +0192 uni(noID) = []; +0193 end +0194 end +0195 +0196 EC_names = regexprep(keggData,'.*ORTHOLOGY.*\[EC:(.*?)\].*','$1'); +0197 EC_names(startsWith(EC_names,'ENTRY ')) = {''}; +0198 +0199 MW = cell(numel(sequence),1); +0200 for i=1:numel(sequence) +0201 if ~isempty(sequence{i}) +0202 MW{i} = num2str(round(calculateMW(sequence{i}))); +0203 end +0204 end 0205 -0206 writetable(out, filePath, 'FileType', 'text', 'WriteVariableNames',false); -0207 fprintf('Model-specific KEGG database stored at %s\n',filePath); -0208 end +0206 pathway = regexprep(keggData,'.*PATHWAY\s+(.*?)(BRITE|MODULE).*','$1'); +0207 pathway(startsWith(pathway,'ENTRY ')) = {''}; +0208 pathway = strrep(pathway,[keggID '01100 Metabolic pathways'],''); +0209 pathway = regexprep(pathway,'\n',''); +0210 pathway = regexprep(pathway,' ',''); +0211 +0212 out = [uni, gene_name, keggGene, EC_names, MW, pathway, sequence]; +0213 out = cell2table(out); +0214 +0215 writetable(out, filePath, 'FileType', 'text', 'WriteVariableNames',false); +0216 fprintf('Model-specific KEGG database stored at %s\n',filePath); +0217 end
Generated by m2html © 2005
\ No newline at end of file diff --git a/doc/src/geckomat/limit_proteins/calculateFfactor.html b/doc/src/geckomat/limit_proteins/calculateFfactor.html index 4264df9fe..495bb5d3f 100644 --- a/doc/src/geckomat/limit_proteins/calculateFfactor.html +++ b/doc/src/geckomat/limit_proteins/calculateFfactor.html @@ -114,7 +114,7 @@

SOURCE CODE ^'%s %s %f','delimiter','\t','HeaderLines',headerLines); 0055 genes = fileContent{2}; 0056 %Remove internal geneIDs modifiers -0057 genes = regexprep(genes,'(\d{4}).',''); +0057 genes = regexprep(genes,'^\d+\.',''); 0058 level = fileContent{3}; 0059 fclose(fID); 0060 [a,b] = ismember(genes,uniprotDB.genes); diff --git a/doc/src/geckomat/utilities/ecFSEOF.html b/doc/src/geckomat/utilities/ecFSEOF.html index eb327cf71..c6aaab129 100644 --- a/doc/src/geckomat/utilities/ecFSEOF.html +++ b/doc/src/geckomat/utilities/ecFSEOF.html @@ -24,7 +24,7 @@

PURPOSE ^ecFSEOF

SYNOPSIS ^

-
function FC = ecFSEOF(ecModel,targetRxn,csRxn,alphaLims,nSteps,filePath,filterG,modelAdapter)
+
function fseof = ecFSEOF(model,prodTargetRxn,csRxn,nSteps,outputFile,filePath,modelAdapter)

DESCRIPTION ^

 ecFSEOF
@@ -32,52 +32,52 @@ 

DESCRIPTION ^CROSS-REFERENCE INFORMATION ^

This function calls: + This function is called by:
@@ -86,54 +86,54 @@

CROSS-REFERENCE INFORMATION ^
 
 
 <h2><a name=SOURCE CODE ^

-
0001 function FC = ecFSEOF(ecModel,targetRxn,csRxn,alphaLims,nSteps,filePath,filterG,modelAdapter)
+
0001 function fseof = ecFSEOF(model,prodTargetRxn,csRxn,nSteps,outputFile,filePath,modelAdapter)
 0002 % ecFSEOF
 0003 %   Function that runs Flux-Scanning with Enforced Objective Function (FSEOF)
 0004 %   for a specified production target.
 0005 %
 0006 % Input:
-0007 %   ecModel         an ecModel in GECKO 3 format (with ecModel.ec structure).
-0008 %   targetRxn       rxn ID for the production target reaction, a exchange
-0009 %                   reaction is recommended.
-0010 %   csRxn           rxn ID for the main carbon source uptake reaction.
-0011 %   alphaLims       vector of Minimum and maximum biomass scalling factors for
-0012 %                   enforced objective limits (e.g. [0.5 0.75]). Max value: 1.
-0013 %                   Max value recomended 0.9 (Optional, default [0.5 0.75])
-0014 %   nSteps          number of steps for suboptimal objective in FSEOF.
-0015 %                   (Optional, default 16)
-0016 %   filePath        file path for results output. It will store two files:
-0017 %                   - at the genes level, ecFSEOF_genes.tsv
-0018 %                   - at the reactions level, ecFSEOF_rxns.tsv
-0019 %                   (Optional, default in the 'output' sub-folder taken from
-0020 %                   modelAdapter, e.g. output/ecFSEOF_rxns.tsv)
-0021 %   filterG         logical value. TRUE if genes K_scores results should be
-0022 %                   filtered according to the alpha vector distribution
-0023 %                   (Optional, default true)
-0024 %   modelAdapter    a loaded model adapter. (Optional, will otherwise use
-0025 %                   the default model adapter)
-0026 %
-0027 % Output:
-0028 %   FC           structure with all results. Contains the following fields:
-0029 %                - flux_WT:   flux distribution for the WT strain (100% of
-0030 %                             carbon towards growth).
-0031 %                - alpha:     scalling factors used for enforced objetive
-0032 %                             limits
-0033 %                - v_matrix:  fluxes for each reaction in rxnsTable.rxns
-0034 %                             and each alpha.
-0035 %                - k_matrix:  fold changes for each reaction in
-0036 %                             rxnsTable.rxns and each alpha.
-0037 %                - rxnsTable: a list with all reactions with fluxes that
-0038 %                             change consistently as target production
-0039 %                             increases.
-0040 %                             Contains: ID, name, k_score, gene rule, equation
-0041 %                - geneTable: a list with all selected targets that
-0042 %                             increase production.
-0043 %                             Contains: gene, shortName, k_score
+0007 %   model          an ecModel in GECKO 3 format (with ecModel.ec structure).
+0008 %   prodTargetRxn  rxn ID for the production target reaction, a exchange
+0009 %                  reaction is recommended.
+0010 %   csRxn          rxn ID for the main carbon source uptake reaction.
+0011 %   nSteps         number of steps for suboptimal objective in FSEOF.
+0012 %                  (Optional, default 16)
+0013 %   outputFile     bolean option to save results in a file. (Optional,
+0014 %                  default false)
+0015 %   filePath       file path for results output. It will store two files:
+0016 %                  - at the genes level, ecFSEOF_genes.tsv
+0017 %                  - at the reactions level, ecFSEOF_rxns.tsv
+0018 %                  (Optional, default in the 'output' sub-folder taken from
+0019 %                  modelAdapter, e.g. output/ecFSEOF_rxns.tsv)
+0020 %   modelAdapter   a loaded model adapter. (Optional, will otherwise use
+0021 %                  the default model adapter)
+0022 %
+0023 % Output:
+0024 %   fseof   an structure with all results. Contains the following fields:
+0025 %           - alpha:             target production used for enforced
+0026 %                                objetive limits (from minimum to maximum
+0027 %                                production)
+0028 %           - v_matrix:          fluxes for each target reaction predicted
+0029 %                                and each alpha.
+0030 %           - rxnTargets:        a list with all reactions with fluxes that
+0031 %                                change consistently as target production
+0032 %                                increases.
+0033 %                                Contains: ID, name, slope, gene rule, and
+0034 %                                equation
+0035 %           - transportTargets:  a list with all transport reactions with
+0036 %                                fluxes that change consistently as target
+0037 %                                production increases.
+0038 %                                Contains: ID, name, slope, gene rule, and
+0039 %                                equation
+0040 %           - geneTargets:       a list with all selected targets that
+0041 %                                increase production.
+0042 %                                Contains: gene, shortName, slope, action,
+0043 %                                and essentiality
 0044 %
 0045 % Usage:
-0046 %   FC = ecFSEOF(ecModel,targetRxn,csRxn,alphaLims,nSteps,filePath,filterG,modelAdapter)
+0046 %   fseof = ecFSEOF(model,prodTargetRxn,csRxn,nSteps,outputFile,filePath,filterG,modelAdapter)
 0047 
-0048 if nargin < 8 || isempty(modelAdapter)
+0048 if nargin < 7 || isempty(modelAdapter)
 0049     modelAdapter = ModelAdapterManager.getDefault();
 0050     if isempty(modelAdapter)
 0051         error('Either send in a modelAdapter or set the default model adapter in the ModelAdapterManager.')
@@ -141,199 +141,270 @@ 

SOURCE CODE ^end 0054 params = modelAdapter.getParameters(); 0055 -0056 if nargin < 7 || isempty(filterG) -0057 filterG = true; +0056 if nargin < 5 || isempty(outputFile) +0057 outputFile = false; 0058 end 0059 0060 if nargin < 6 || isempty(filePath) 0061 filePath = fullfile(params.path,'output'); 0062 end 0063 -0064 if nargin < 5 || isempty(nSteps) +0064 if nargin < 4 || isempty(nSteps) 0065 nSteps = 16; 0066 end 0067 -0068 if nargin < 4 || isempty(alphaLims) -0069 alphaLims = [0.5 0.75]; -0070 end -0071 -0072 if numel(alphaLims) ~= 2 -0073 error('alphaLims parameter should be a vector of two values') -0074 end -0075 -0076 % Standardize grRules and rxnGeneMat in model -0077 [grRules,rxnGeneMat] = standardizeGrRules(ecModel,true); -0078 ecModel.grRules = grRules; -0079 ecModel.rxnGeneMat = rxnGeneMat; -0080 -0081 % Check carbon source uptake rate and LB -0082 ecModel = setParam(ecModel, 'obj', params.bioRxn, 1); -0083 sol = solveLP(ecModel, 1); -0084 csRxnIdx = strcmpi(ecModel.rxns,csRxn); +0068 % Get relevant rxn indexes +0069 prodTargetRxnIdx = getIndexes(model, prodTargetRxn,'rxns'); +0070 csRxnIdx = getIndexes(model, csRxn,'rxns'); +0071 bioRxnIdx = getIndexes(model, params.bioRxn,'rxns'); +0072 +0073 % Standardize grRules and rxnGeneMat in model +0074 [grRules,rxnGeneMat] = standardizeGrRules(model,true); +0075 model.grRules = grRules; +0076 model.rxnGeneMat = rxnGeneMat; +0077 +0078 % Check carbon source uptake rate and LB defined +0079 model = setParam(model, 'obj', params.bioRxn, 1); +0080 sol = solveLP(model); +0081 if model.lb(csRxnIdx) < sol.x(csRxnIdx) +0082 printOrange(['WARNING: Carbon source lower bound was set to ' num2str(model.lb(csRxnIdx)) ... +0083 ', but the uptake rate after model optimization is ' num2str(sol.x(csRxnIdx)) '.\n']) +0084 end 0085 -0086 if sol.x(csRxnIdx) < ecModel.lb(csRxnIdx) -0087 printOrange('WARNING: Carbon source LB and uptake rate are not equal.') -0088 end -0089 -0090 % run FSEOF analysis -0091 -0092 % Define alpha vector for suboptimal enforced objective values and -0093 % initialize fluxes and K_scores matrices -0094 alpha = alphaLims(1):((alphaLims(2)-alphaLims(1))/(nSteps-1)):alphaLims(2); -0095 v_matrix = zeros(length(ecModel.rxns),length(alpha)); -0096 k_matrix = zeros(length(ecModel.rxns),length(alpha)); -0097 enzRxns = contains(ecModel.rxns,'usage_prot_'); -0098 tolerance = 1e-4; -0099 -0100 % Simulate WT (100% growth) and forced (X% growth and the rest towards product): -0101 flux_WT = getFluxTarget(ecModel,params.bioRxn,targetRxn,1); +0086 % run FSEOF analysis +0087 +0088 % Find out the initial target production. +0089 iniTarget = sol.x(prodTargetRxnIdx); +0090 +0091 % Find out the maximum theoretical yield of target reaction. +0092 model = setParam(model,'obj',prodTargetRxn,1); +0093 sol = solveLP(model,1); +0094 % Set to 90%% based on https://doi.org/10.1128/AEM.00115-10 +0095 maxTarget = sol.x(prodTargetRxnIdx) * 0.9; +0096 +0097 % Define alpha vector for suboptimal enforced objective values between +0098 % minimal production and 90%% of the maximum theoretical yield, initialize +0099 % fluxes matriz +0100 alpha = iniTarget:((maxTarget-iniTarget)/(nSteps-1)):maxTarget; +0101 v_matrix = zeros(length(model.rxns),length(alpha)); 0102 -0103 % Set values which are under solver tolerance -0104 flux_WT(flux_WT < 0 & ~enzRxns) = 0; -0105 flux_WT(flux_WT > 0 & enzRxns) = 0; -0106 -0107 progressbar('Flux Scanning with Enforced Objective Function') -0108 for i = 1:length(alpha) -0109 flux_MAX = getFluxTarget(ecModel,params.bioRxn,targetRxn,alpha(i)); -0110 % Set values which are under solver tolerance -0111 flux_MAX(flux_MAX < 0 & ~enzRxns) = 0; -0112 flux_MAX(flux_MAX > 0 & enzRxns) = 0; -0113 v_matrix(:,i) = flux_MAX; -0114 k_matrix(:,i) = flux_MAX./flux_WT; -0115 progressbar(i/length(alpha)) -0116 end -0117 progressbar(1) % Make sure it closes -0118 -0119 % take out rxns with no grRule and standard gene -0120 withGR = ~cellfun(@isempty,ecModel.grRules); -0121 stdPos = contains(ecModel.grRules,'standard'); -0122 withGR(stdPos) = 0; -0123 rxns = ecModel.rxns(withGR); -0124 v_matrix = v_matrix(withGR,:); -0125 k_matrix = k_matrix(withGR,:); -0126 rxnGeneM = ecModel.rxnGeneMat(withGR,:); -0127 -0128 % Filter out rxns that are always zero -> k=0/0=NaN: -0129 non_nan = sum(~isnan(k_matrix),2) > 0; -0130 rxns = rxns(non_nan,:); -0131 v_matrix = v_matrix(non_nan,:); -0132 k_matrix = k_matrix(non_nan,:); -0133 rxnGeneM = rxnGeneM(non_nan,:); +0103 % Enforce objective flux iteratively +0104 progressbar('Flux Scanning with Enforced Objective Function') +0105 for i = 1:nSteps +0106 % Enforce the objetive flux of product formation +0107 model = setParam(model,'eq',prodTargetRxnIdx,alpha(i)); +0108 +0109 % Restore minimum biomass lb to zero and set it as objective +0110 model.lb(bioRxnIdx) = 0; +0111 model = setParam(model,'obj',params.bioRxn,1); +0112 sol = solveLP(model,1); +0113 +0114 % Store flux distribution +0115 v_matrix(:,i) = sol.x; +0116 +0117 progressbar(i/nSteps) +0118 end +0119 progressbar(1) % Make sure it closes +0120 +0121 % Take out rxns with no grRule and standard gene +0122 withGR = ~cellfun(@isempty,model.grRules); +0123 stdIdx = contains(model.grRules,'standard'); +0124 withGR(stdIdx) = 0; +0125 target_rxns = model.rxns(withGR); +0126 v_matrix = v_matrix(withGR,:); +0127 rxnGeneM = model.rxnGeneMat(withGR,:); +0128 +0129 % Filter out rxns that are always zero +0130 zero_flux = ~all(abs(v_matrix(:,1:nSteps))==0,2); +0131 target_rxns = target_rxns(zero_flux,:); +0132 v_matrix = v_matrix(zero_flux,:); +0133 rxnGeneM = rxnGeneM(zero_flux,:); 0134 -0135 % Replace remaining NaNs with 1s: -0136 k_matrix(isnan(k_matrix)) = 1; -0137 -0138 % Replace any Inf value with 1000 (maximum value is ~700): -0139 k_matrix(k_matrix > 1000) = 1000; -0140 % k_matrix(k_matrix < -1000) = 2; -0141 -0142 % Filter out values that are inconsistent at different alphas: -0143 always_down = sum(k_matrix <= 1,2) == length(alpha); -0144 always_up = sum(k_matrix >= 1,2) == length(alpha); -0145 -0146 % Identify those reactions with mixed patterns -0147 incons_rxns = always_down + always_up == 0; -0148 -0149 % Identify genes that are linked to "inconsistent rxns" -0150 incons_genes = sum(rxnGeneM(incons_rxns,:),1) > 0; -0151 -0152 % Finally, inconsistent reactions are those that are not conected -0153 % to "inconsistent genes" from the original "inconsistent rxns" set -0154 incons_rxns = sum(rxnGeneM(:,incons_genes),2) > 0; -0155 -0156 % Keep results for the consistent rxns exclusively -0157 rxns = rxns(~incons_rxns,:); -0158 v_matrix = v_matrix(~incons_rxns,:); -0159 k_matrix = k_matrix(~incons_rxns,:); -0160 rxnGeneM = rxnGeneM(~incons_rxns,:); -0161 -0162 % Get median k-score across steps and order from highest to lowest -0163 k_rxns = mean(k_matrix,2); -0164 [~,order] = sort(k_rxns,'descend'); -0165 rxns = rxns(order,:); -0166 v_matrix = v_matrix(order,:); -0167 k_matrix = k_matrix(order,:); -0168 rxnGeneM = rxnGeneM(order,:); -0169 k_rxns = k_rxns(order,:); -0170 -0171 % Create list of remaining genes and filter out any inconsistent score: -0172 % Just those genes that are connected to the remaining rxns are -0173 genes = ecModel.genes(sum(rxnGeneM,1) > 0); -0174 k_genes = zeros(size(genes)); -0175 cons_genes = false(size(genes)); -0176 rxnGeneM = rxnGeneM(:,sum(rxnGeneM,1) > 0); -0177 -0178 for i = 1:length(genes) -0179 % Extract all the K_scores (from rxns across alphas) conected to -0180 % each remaining gene -0181 k_set = k_rxns(rxnGeneM(:,i) > 0); -0182 % Check the kind of control that gene i-th exerts over its reactions -0183 always_down = sum(k_set <= (1-tolerance)) == length(k_set); -0184 always_up = sum(k_set >= (1+tolerance)) == length(k_set); -0185 % Evaluate if gene is always exerting either a positive or negative -0186 % control -0187 cons_genes(i) = always_down + always_up == 1; -0188 k_genes(i) = mean(k_set); -0189 end -0190 -0191 % Keep "consistent genes" -0192 genes = genes(cons_genes); -0193 k_genes = k_genes(cons_genes); -0194 rxnGeneM = rxnGeneM(:,cons_genes); +0135 % Identify those rxns that always increase or decrease, and calculate the +0136 % slope as the difference in the flux when enforce objetive target +0137 % production is set to 90%% of the maximum teorethical yield +0138 % << v_matrix(i,nSteps-1) >> and the flux when the enforce objetive target +0139 % production is set to the minimum << v_matrix(i,1) >> for the reaction i, +0140 % divided by maxTarget-maxTarget/nSteps-1. +0141 slope_rxns = zeros(size(target_rxns)); +0142 targets = logical(size(target_rxns)); +0143 target_type = cell(size(target_rxns)); +0144 for i = 1:length(target_rxns) +0145 if issorted(abs(v_matrix(i,1:nSteps)),'strictascend') +0146 % Those reactions that always increase while enforcing target +0147 % production are suggested for Over Expression +0148 targets(i) = true; +0149 slope_rxns(i) = abs(v_matrix(i,nSteps-1)-v_matrix(i,1))/abs(maxTarget-maxTarget/nSteps-1); +0150 target_type(i) = {'OE'}; +0151 elseif issorted(abs(v_matrix(i,1:nSteps)),'strictdescend') +0152 % Those reactions that always decrease while enforcing target +0153 % production are suggested for KnockDown or KnockOut. KO are those +0154 % reactions which have zero flux when enforcing target production +0155 % to 90%% of the maximum theoretical yield. +0156 targets(i) = true; +0157 slope_rxns(i) = abs(v_matrix(i,nSteps-1)-v_matrix(i,1))/abs(maxTarget-maxTarget/nSteps-1); +0158 if v_matrix(i,nSteps) == 0 +0159 target_type(i) = {'KO'}; +0160 else +0161 target_type(i) = {'KD'}; +0162 end +0163 end +0164 end +0165 +0166 % Only keep those reaction that shows an increase or decrease pattern. +0167 target_rxns = target_rxns(targets); +0168 v_matrix = v_matrix(targets,:); +0169 rxnGeneM = rxnGeneM(targets,:); +0170 slope_rxns = slope_rxns(targets); +0171 target_type = target_type(targets); +0172 +0173 % Order from highest to lowest slope +0174 [~,order] = sort(slope_rxns,'descend'); +0175 target_rxns = target_rxns(order); +0176 v_matrix = v_matrix(order,:); +0177 rxnGeneM = rxnGeneM(order,:); +0178 slope_rxns = slope_rxns(order); +0179 target_type = target_type(order); +0180 +0181 % Filter out reactions with slope < quantile +0182 non_zero_slope = slope_rxns > quantile(slope_rxns,0.75); +0183 target_rxns = target_rxns(non_zero_slope); +0184 v_matrix = v_matrix(non_zero_slope,:); +0185 rxnGeneM = rxnGeneM(non_zero_slope,:); +0186 slope_rxns = slope_rxns(non_zero_slope); +0187 target_type = target_type(non_zero_slope); +0188 +0189 % Create gene list of those connected to the remaining rxns +0190 genes = model.genes(sum(rxnGeneM,1) > 0); +0191 slope_genes = zeros(size(genes)); +0192 rxnGeneM = rxnGeneM(:,sum(rxnGeneM,1) > 0); +0193 target_type_genes = cell(size(genes)); +0194 essentiality = cell(size(genes)); 0195 -0196 if filterG -0197 % Filter any value between mean(alpha) and 1: -0198 unchanged = (k_genes >= mean(alpha) - tolerance) + (k_genes <= 1 + tolerance) == 2; -0199 genes = genes(~unchanged); -0200 k_genes = k_genes(~unchanged); -0201 rxnGeneM = rxnGeneM(:,~unchanged); -0202 % Update results for rxns-related fields (remove remaining reactions -0203 % without any associated gene in rxnGeneM) -0204 toKeep = (sum(rxnGeneM,2) > 0); -0205 rxns = rxns(toKeep,:); -0206 v_matrix = v_matrix(toKeep,:); -0207 k_matrix = k_matrix(toKeep,:); -0208 k_rxns = k_rxns(toKeep,:); -0209 end -0210 -0211 % Order genes from highest to lowest k: -0212 [~,order] = sort(k_genes,'descend'); -0213 genes = genes(order,:); -0214 k_genes = k_genes(order,:); -0215 -0216 % Create output (exclude enzyme usage reactions): -0217 toKeep = ~startsWith(rxns,'usage_prot_'); -0218 rxnIdx = getIndexes(ecModel,rxns(toKeep),'rxns'); -0219 geneIdx = cellfun(@(x) find(strcmpi(ecModel.genes,x)),genes); -0220 FC.flux_WT = flux_WT; -0221 FC.alpha = alpha; -0222 FC.v_matrix = v_matrix(toKeep,:); -0223 FC.k_matrix = k_matrix(toKeep,:); -0224 FC.rxnsTable(:,1) = ecModel.rxns(rxnIdx); -0225 FC.rxnsTable(:,2) = ecModel.rxnNames(rxnIdx); -0226 FC.rxnsTable(:,3) = num2cell(k_rxns(toKeep)); -0227 FC.rxnsTable(:,4) = ecModel.grRules(rxnIdx); -0228 FC.rxnsTable(:,5) = constructEquations(ecModel,rxnIdx); -0229 FC.geneTable(:,1) = genes; -0230 FC.geneTable(:,2) = ecModel.geneShortNames(geneIdx); -0231 FC.geneTable(:,3) = num2cell(k_genes); -0232 -0233 writetable(cell2table(FC.geneTable, ... -0234 'VariableNames', {'gene_IDs' 'gene_names' 'K_score'}), ... -0235 fullfile(filePath, 'ecFSEOF_genes.tsv'), ... -0236 'FileType', 'text', ... -0237 'Delimiter', '\t', ... -0238 'QuoteStrings', false); -0239 -0240 writetable(cell2table(FC.rxnsTable, ... -0241 'VariableNames', {'rxn_IDs' 'rxnNames' 'K_score' 'grRules' 'rxn_formula'}), ... -0242 fullfile(filePath, 'ecFSEOF_rxns.tsv'), ... -0243 'FileType', 'text', ... -0244 'Delimiter', '\t', ... -0245 'QuoteStrings', false); -0246 -0247 disp(['ecFSEOF results stored at: ' newline filePath]); -0248 end

+0196 % Validate for gene essentiality +0197 progressbar('Checking for gene essentiality') +0198 for i = 1:numel(genes) +0199 +0200 % Block protein usage to KO al the reactions associated to it. +0201 usage_rxn_idx = strcmpi(model.ec.genes, genes{i}); +0202 usage_rxn = strcat('usage_prot_', model.ec.enzymes(usage_rxn_idx)); +0203 tempModel = setParam(model, 'eq', usage_rxn, 0); +0204 solKO = solveLP(tempModel); +0205 % Check if no feasible solution was found +0206 if solKO.stat == -1 || solKO.x(bioRxnIdx) < 1e-8 +0207 essentiality(i) = {'essential'}; +0208 end +0209 +0210 % Since a gene can be involved in multiple reactions, multiple +0211 % engineering manipulations can be suggested for the same gene +0212 % e.g. (OE and KD). So, reconcilie them, and report only one. +0213 +0214 % Get reaction index, in targets set, for each gene +0215 rxns_for_gene = find(rxnGeneM(:,i) > 0); +0216 actions = unique(target_type(rxns_for_gene)); +0217 if numel(actions) > 1 +0218 % If any OE is suggested give the highest priority over KD or KO +0219 if any(startsWith(actions, 'OE')) +0220 target_type_genes(i) = {'OE'}; +0221 % Otherwise change to KO if not essential +0222 elseif ~any(startsWith(actions, 'OE')) && ~isequal(essentiality(i),{'essential'}) +0223 target_type_genes(i) = {'KO'}; +0224 else +0225 target_type_genes(i) = {'KD'}; +0226 end +0227 else +0228 target_type_genes(i) = actions; +0229 end +0230 % Extract all the slope (from rxns across alphas) conected to +0231 % each remaining gene +0232 slope_set = slope_rxns(rxns_for_gene); +0233 slope_genes(i) = mean(slope_set); +0234 +0235 progressbar(i/numel(genes)) +0236 end +0237 progressbar(1) % Make sure it closes +0238 +0239 % Order genes from highest to lowest slope: +0240 [~,order] = sort(slope_genes,'descend'); +0241 genes = genes(order,:); +0242 slope_genes = slope_genes(order,:); +0243 target_type_genes = target_type_genes(order,:); +0244 essentiality = essentiality(order,:); +0245 +0246 % Create output: +0247 +0248 % Report values used to enforce objective target production +0249 fseof.alpha = alpha; +0250 +0251 % Exclude enzyme usage reactions +0252 toKeep = ~startsWith(target_rxns,'usage_prot_'); +0253 rxnIdx = getIndexes(model,target_rxns(toKeep),'rxns'); +0254 +0255 % Report the fluxes at each alpha +0256 fseof.v_matrix = array2table(v_matrix(toKeep,:), ... +0257 'VariableNames', string(alpha), 'RowNames', model.rxns(rxnIdx)); +0258 +0259 % Report reaction targets +0260 fseof.rxnTargets(:,1) = model.rxns(rxnIdx); +0261 fseof.rxnTargets(:,2) = model.rxnNames(rxnIdx); +0262 fseof.rxnTargets(:,3) = num2cell(slope_rxns(toKeep)); +0263 fseof.rxnTargets(:,4) = model.grRules(rxnIdx); +0264 fseof.rxnTargets(:,5) = constructEquations(model,rxnIdx); +0265 +0266 % Identify transport reactions; defined as reactions involving (at least) +0267 % one metabolite name in more than one compartment. +0268 transportRxns = false(numel(rxnIdx),1); +0269 for i = 1:numel(rxnIdx) +0270 % Get the involved metabolites in each reaction +0271 mets = model.metNames(model.S(:,rxnIdx(i))~=0); +0272 % Remove enzyme metabolite +0273 mets(startsWith(mets,'prot_')) = []; +0274 % Validate if it is a transport reaciton +0275 transportRxns(i) = numel(mets) ~= numel(unique(mets)); +0276 end +0277 +0278 % Create separate table for transport reactions +0279 fseof.transportTargets = fseof.rxnTargets(transportRxns,:); +0280 fseof.rxnTargets(transportRxns,:) = []; +0281 +0282 % Report gene targets +0283 geneIdx = cellfun(@(x) find(strcmpi(model.genes,x)),genes); +0284 fseof.geneTargets(:,1) = genes; +0285 fseof.geneTargets(:,2) = model.geneShortNames(geneIdx); +0286 fseof.geneTargets(:,3) = num2cell(slope_genes); +0287 fseof.geneTargets(:,4) = target_type_genes; +0288 fseof.geneTargets(:,5) = essentiality; +0289 +0290 % Save results in a file if defined +0291 if outputFile +0292 % Write file with gene targets +0293 writetable(cell2table(fseof.geneTargets, ... +0294 'VariableNames', {'gene_IDs' 'gene_names' 'slope' 'action' 'essentiality'}), ... +0295 fullfile(filePath, 'ecFSEOF_genes.tsv'), ... +0296 'FileType', 'text', ... +0297 'Delimiter', '\t', ... +0298 'QuoteStrings', false); +0299 +0300 % Write file with rxn targets +0301 writetable(cell2table(fseof.rxnTargets, ... +0302 'VariableNames', {'rxn_IDs' 'rxnNames' 'slope' 'grRules' 'rxn_formula'}), ... +0303 fullfile(filePath, 'ecFSEOF_rxns.tsv'), ... +0304 'FileType', 'text', ... +0305 'Delimiter', '\t', ... +0306 'QuoteStrings', false); +0307 +0308 % Write file with transport targets +0309 writetable(cell2table(fseof.transportTargets, ... +0310 'VariableNames', {'rxn_IDs' 'rxnNames' 'slope' 'grRules' 'rxn_formula'}), ... +0311 fullfile(filePath, 'ecFSEOF_transporter.tsv'), ... +0312 'FileType', 'text', ... +0313 'Delimiter', '\t', ... +0314 'QuoteStrings', false); +0315 +0316 disp(['ecFSEOF results stored at: ' newline filePath]); +0317 end +0318 +0319 end

Generated by m2html © 2005
\ No newline at end of file diff --git a/doc/src/geckomat/utilities/getFluxTarget.html b/doc/src/geckomat/utilities/getFluxTarget.html deleted file mode 100644 index f790eec11..000000000 --- a/doc/src/geckomat/utilities/getFluxTarget.html +++ /dev/null @@ -1,133 +0,0 @@ - - - - Description of getFluxTarget - - - - - - - - - -
Home > src > geckomat > utilities > getFluxTarget.m
- - - -

getFluxTarget -

- -

PURPOSE ^

-
getFluxTarget
- -

SYNOPSIS ^

-
function [maxFlux, minFlux] = getFluxTarget(model,bioRxn,targetRxn,alpha)
- -

DESCRIPTION ^

-
 getFluxTarget
-
-   Function that performs a series of LP optimizations on an ecModel,
-   by first maximizing biomass, then fixing a suboptimal value and 
-   proceeding to protein pool minimization, last a minimization and
-   maximization of a given production target reaction is performed
-   subject to the optimal production level.
-
- Input:
-   model           an ecModel in GECKO 3 format (with ecModel.ec structure).
-   bioRxn          rxn ID for the biomass reaction.
-   targetRxn       rxn ID for the production target reaction, a exchange
-                   reaction is recommended.
-   alpha           scalling factor for desired suboptimal growth.
-                   (Optional, default 0.95)
-
- Output:
-   minFlux         vector of minimum flux rates at minimum target production,
-                   corresponding to model.rxns
-   maxFlux         vector of maximum flux rates at maximum target production,
-                   corresponding to model.rxns
-
- Usage:
-   [maxFlux, minFlux] = getFluxTarget(ecModel,bioRxn,targetRxn,alpha)
- - -

CROSS-REFERENCE INFORMATION ^

-This function calls: -
    -
-This function is called by: - - - - - -

SOURCE CODE ^

-
0001 function [maxFlux, minFlux] = getFluxTarget(model,bioRxn,targetRxn,alpha)
-0002 % getFluxTarget
-0003 %
-0004 %   Function that performs a series of LP optimizations on an ecModel,
-0005 %   by first maximizing biomass, then fixing a suboptimal value and
-0006 %   proceeding to protein pool minimization, last a minimization and
-0007 %   maximization of a given production target reaction is performed
-0008 %   subject to the optimal production level.
-0009 %
-0010 % Input:
-0011 %   model           an ecModel in GECKO 3 format (with ecModel.ec structure).
-0012 %   bioRxn          rxn ID for the biomass reaction.
-0013 %   targetRxn       rxn ID for the production target reaction, a exchange
-0014 %                   reaction is recommended.
-0015 %   alpha           scalling factor for desired suboptimal growth.
-0016 %                   (Optional, default 0.95)
-0017 %
-0018 % Output:
-0019 %   minFlux         vector of minimum flux rates at minimum target production,
-0020 %                   corresponding to model.rxns
-0021 %   maxFlux         vector of maximum flux rates at maximum target production,
-0022 %                   corresponding to model.rxns
-0023 %
-0024 % Usage:
-0025 %   [maxFlux, minFlux] = getFluxTarget(ecModel,bioRxn,targetRxn,alpha)
-0026 
-0027 if nargin < 4 || isempty(alpha)
-0028     alpha = 0.95;
-0029 end
-0030 
-0031 % Fix a suboptimal alpha if equal to 1
-0032 if alpha == 1
-0033     alpha = 0.99;
-0034 end
-0035 
-0036 % Get relevant rxn indexes
-0037 bioRxnIdx    = getIndexes(model, bioRxn,'rxns');
-0038 poolIdx      = strcmpi(model.rxns, 'prot_pool_exchange');
-0039 
-0040 % Maximize growth and fix carbon source and suboptimal growth
-0041 model = setParam(model, 'obj', bioRxn, 1);
-0042 sol   = solveLP(model);
-0043 model = setParam(model, 'lb', bioRxn, sol.x(bioRxnIdx) * alpha);
-0044 
-0045 % Minimize prot_pool_exchange and fix
-0046 model = setParam(model, 'obj', 'prot_pool_exchange', 1);
-0047 sol   = solveLP(model);
-0048 model = setParam(model, 'lb', 'prot_pool_exchange', sol.x(poolIdx) * 1.01);
-0049 
-0050 % If minimum fluxes are required get them
-0051 minFlux = [];
-0052 if nargout == 2
-0053     % Minimize target
-0054     model = setParam(model, 'obj', targetRxn, -1);
-0055     minSol = solveLP(model);
-0056     minFlux = minSol.x;
-0057 end
-0058 
-0059 % Maximize target
-0060 model = setParam(model, 'obj', targetRxn, 1);
-0061 maxSol = solveLP(model);
-0062 maxFlux = maxSol.x;
-0063 end
-
Generated by m2html © 2005
- - \ No newline at end of file diff --git a/doc/src/geckomat/utilities/getSubsetEcModel.html b/doc/src/geckomat/utilities/getSubsetEcModel.html index dd8e6d7c2..49359f00a 100644 --- a/doc/src/geckomat/utilities/getSubsetEcModel.html +++ b/doc/src/geckomat/utilities/getSubsetEcModel.html @@ -31,6 +31,12 @@

DESCRIPTION ^SOURCE CODE ^% mapping reactions and genes from a conventional context-specific GEM to 0005 % a general ecModel (bigEcModel), to yield a context-specific ecModel. 0006 % -0007 % Input: -0008 % bigEcModel enzyme-constrained version of conventional model of -0009 % metabolism for a given species -0010 % smallGEM Reduced model (subset of the general model) for a -0011 % specific strain (microbes) or cell-line/tissues (mammals) +0007 % Both models (bigEcModel and the smallGEM) should have derived from the +0008 % same starting model. If an ecModel with human-GEM 1.15.0 is made +0009 % (yielding the bigEcModel), then human-GEM 1.15.0 should also be the +0010 % basis from which the context-specific GEM (smallGEM) was generated by +0011 % for instance tINIT. 0012 % -0013 % Output: -0014 % smallEcModel Enzyme-constrained version of the context-specific model -0015 % -0016 % Usage: smallEcModel = getSubsetEcModel(smallGEM,bigEcModel) -0017 -0018 % Check if original bigEcModel contains context-dependent protein constraints -0019 if any(bigEcModel.lb(startsWith(bigEcModel.rxns,'usage_prot_')) ~= -1000) -0020 printOrange(['WARNING: The bigEcModel is constraint by protein concentrations that are\n' ... -0021 'likely not relevant in the constructed smallEcModel.\n']) -0022 end +0013 % Input: +0014 % bigEcModel enzyme-constrained version of conventional model of +0015 % metabolism for a given species +0016 % smallGEM Reduced model (subset of the general model) for a +0017 % specific strain (microbes) or cell-line/tissues (mammals) +0018 % +0019 % Output: +0020 % smallEcModel Enzyme-constrained version of the context-specific model +0021 % +0022 % Usage: smallEcModel = getSubsetEcModel(smallGEM,bigEcModel) 0023 -0024 % Remove genes (and associated reactions) that are absent in smallGEM -0025 genesToRemove = setdiff(bigEcModel.genes,smallGEM.genes); -0026 smallEcModel = removeGenes(bigEcModel,genesToRemove,true,true,false); -0027 -0028 % Remove genes from ec-structure -0029 enzToRemove = ismember(smallEcModel.ec.genes,genesToRemove); -0030 smallEcModel.ec.genes(enzToRemove) = []; -0031 smallEcModel.ec.enzymes(enzToRemove) = []; -0032 smallEcModel.ec.concs(enzToRemove) = []; -0033 smallEcModel.ec.mw(enzToRemove) = []; -0034 smallEcModel.ec.sequence(enzToRemove) = []; -0035 smallEcModel.ec.rxnEnzMat(:,enzToRemove) = []; -0036 -0037 % Remove any reaction (except prot_ associated) that is absent from smallGEM -0038 trimRxns = regexprep(smallEcModel.rxns,'_REV|_EXP_\d+',''); -0039 protRxns = startsWith(smallEcModel.rxns,{'usage_prot_','prot_pool_exchange'}); -0040 keepRxns = ismember(trimRxns,smallGEM.rxns) | protRxns; -0041 smallEcModel = removeReactions(smallEcModel,~keepRxns, true, true, true); -0042 -0043 % Remove reactions from ec-structure -0044 ecRxnsToRemove = ~ismember(smallEcModel.ec.rxns,smallEcModel.rxns); -0045 -0046 smallEcModel.ec.rxns(ecRxnsToRemove) = []; -0047 smallEcModel.ec.kcat(ecRxnsToRemove) = []; -0048 smallEcModel.ec.source(ecRxnsToRemove) = []; -0049 smallEcModel.ec.notes(ecRxnsToRemove) = []; -0050 smallEcModel.ec.eccodes(ecRxnsToRemove) = []; -0051 smallEcModel.ec.rxnEnzMat(ecRxnsToRemove,:) = []; -0052 end -0053

+0024 % Check if models were derived from the same starting GEM +0025 rxnsDiff = find(~ismember(smallGEM.rxns,bigEcModel.rxns)); +0026 if numel(rxnsDiff) > 0 +0027 dispEM(['While both models should have derived from the same starting ',... +0028 'GEM, the following reactions from smallGEM could not be found ',... +0029 'in bigEcModel:'],true,smallGEM.rxns(rxnsDiff),false); +0030 end +0031 +0032 % Check if original bigEcModel contains context-dependent protein constraints +0033 if any(bigEcModel.lb(startsWith(bigEcModel.rxns,'usage_prot_')) ~= -1000) +0034 printOrange(['WARNING: The bigEcModel is constraint by protein concentrations that are\n' ... +0035 'likely not relevant in the constructed smallEcModel.\n']) +0036 end +0037 +0038 % Remove genes (and associated reactions) that are absent in smallGEM +0039 genesToRemove = setdiff(bigEcModel.genes,smallGEM.genes); +0040 smallEcModel = removeGenes(bigEcModel,genesToRemove,true,true,false); +0041 +0042 % Remove genes from ec-structure +0043 enzToRemove = ismember(smallEcModel.ec.genes,genesToRemove); +0044 smallEcModel.ec.genes(enzToRemove) = []; +0045 smallEcModel.ec.enzymes(enzToRemove) = []; +0046 smallEcModel.ec.concs(enzToRemove) = []; +0047 smallEcModel.ec.mw(enzToRemove) = []; +0048 smallEcModel.ec.sequence(enzToRemove) = []; +0049 smallEcModel.ec.rxnEnzMat(:,enzToRemove) = []; +0050 +0051 % Remove any reaction (except prot_ associated) that is absent from smallGEM +0052 trimRxns = regexprep(smallEcModel.rxns,'_REV|_EXP_\d+',''); +0053 protRxns = startsWith(smallEcModel.rxns,{'usage_prot_','prot_pool_exchange'}); +0054 keepRxns = ismember(trimRxns,smallGEM.rxns) | protRxns; +0055 smallEcModel = removeReactions(smallEcModel,~keepRxns, true, true, true); +0056 +0057 % Remove reactions from ec-structure +0058 ecRxnsToRemove = ~ismember(smallEcModel.ec.rxns,smallEcModel.rxns); +0059 +0060 smallEcModel.ec.rxns(ecRxnsToRemove) = []; +0061 smallEcModel.ec.kcat(ecRxnsToRemove) = []; +0062 smallEcModel.ec.source(ecRxnsToRemove) = []; +0063 smallEcModel.ec.notes(ecRxnsToRemove) = []; +0064 smallEcModel.ec.eccodes(ecRxnsToRemove) = []; +0065 smallEcModel.ec.rxnEnzMat(ecRxnsToRemove,:) = []; +0066 end +0067
Generated by m2html © 2005
\ No newline at end of file diff --git a/doc/src/geckomat/utilities/index.html b/doc/src/geckomat/utilities/index.html index 49d98ee11..fa531e367 100644 --- a/doc/src/geckomat/utilities/index.html +++ b/doc/src/geckomat/utilities/index.html @@ -19,7 +19,7 @@

Index for src\geckomat\utilities

Matlab files in this directory:

-
 addNewRxnsToECaddNewRxnsToEC
 ecFSEOFecFSEOF
 ecFVAecFVA
 enzymeUsageenzymeUsage
 findGECKOrootfindGECKOroot
 getFluxTargetgetFluxTarget
 getSubsetEcModelgetSubset_ecModel
 loadConventionalGEMloadConventionalGEM
 loadEcModelloadEcModel
 loadFluxDataloadFluxData
 loadProtDataloadProtData
 mapRxnsToConvmapRxnsToConv
 plotEcFVAplotEcFVA
 reportEnzymeUsagereportEnzymeUsage
 saveEcModelsaveECmodel
 startGECKOprojectstartGECKOproject
 updateGECKOdocupdateGeckoDoc
+ addNewRxnsToECaddNewRxnsToEC  ecFSEOFecFSEOF  ecFVAecFVA  enzymeUsageenzymeUsage  findGECKOrootfindGECKOroot  getSubsetEcModelgetSubset_ecModel  loadConventionalGEMloadConventionalGEM  loadEcModelloadEcModel  loadFluxDataloadFluxData  loadProtDataloadProtData  mapRxnsToConvmapRxnsToConv  plotEcFVAplotEcFVA  reportEnzymeUsagereportEnzymeUsage  saveEcModelsaveECmodel  startGECKOprojectstartGECKOproject  updateGECKOdocupdateGeckoDoc diff --git a/doc/src/geckomat/utilities/startGECKOproject.html b/doc/src/geckomat/utilities/startGECKOproject.html index 2926ba10b..f51e357ec 100644 --- a/doc/src/geckomat/utilities/startGECKOproject.html +++ b/doc/src/geckomat/utilities/startGECKOproject.html @@ -113,7 +113,8 @@

SOURCE CODE ^else 0058 printOrange('WARNING: A project with the same name exits at the same location. The project was not created.\n') 0059 end -0060 end +0060 cd(fullPath) +0061 end
Generated by m2html © 2005
\ No newline at end of file diff --git a/src/geckomat/change_model/applyKcatConstraints.m b/src/geckomat/change_model/applyKcatConstraints.m index 12b9954f3..7c3594710 100644 --- a/src/geckomat/change_model/applyKcatConstraints.m +++ b/src/geckomat/change_model/applyKcatConstraints.m @@ -95,17 +95,19 @@ modRxns = extractAfter(model.ec.rxns,4); % Map ec-reactions to model.rxns [hasEc,~] = ismember(model.rxns,modRxns); - [~,rxnIdx] = ismember(modRxns,model.rxns); - hasEc = find(hasEc & updateRxns); + hasEc = find(hasEc & updateRxns); + [~,rxnIdx] = ismember(modRxns,model.rxns); for i = 1:numel(hasEc) % Get all isozymes per reaction ecIdx = find(rxnIdx == hasEc(i)); + % ecIdx = strcmp(model.rxns(hasEc(i)),modRxns); % Multiply enzymes with their MW (they are then automatically % summed per reaction), and divide by their kcat, to get a vector % of MW/kcat values. MWkcat = (model.ec.rxnEnzMat(ecIdx,:) * model.ec.mw) ./ model.ec.kcat(ecIdx); - % If kcat was zero, MWkcat is Inf. Correct back to zero - MWkcat(abs(MWkcat)==Inf)=0; + % If kcat was zero, MWkcat is Inf. If no enzyme info was found, + % MWkcat is NaN. Correct both back to zero + MWkcat(isinf(MWkcat) | isnan(MWkcat)) = 0; % Select the lowest MW/kcat (= most efficient), and convert to /hour model.S(prot_pool_idx, hasEc(i)) = -min(MWkcat/3600); end diff --git a/src/geckomat/change_model/makeEcModel.m b/src/geckomat/change_model/makeEcModel.m index 1bd53fdab..5fabb4bc1 100644 --- a/src/geckomat/change_model/makeEcModel.m +++ b/src/geckomat/change_model/makeEcModel.m @@ -326,7 +326,7 @@ %9: Add proteins as pseudometabolites if ~geckoLight - [proteinMets.mets, uniprotSortId] = sort(ec.enzymes); + [proteinMets.mets, uniprotSortId] = unique(ec.enzymes); proteinMets.mets = strcat('prot_',proteinMets.mets); proteinMets.metNames = proteinMets.mets; proteinMets.compartments = compartmentID; diff --git a/src/geckomat/gather_kcats/runDLKcat.m b/src/geckomat/gather_kcats/runDLKcat.m index eabf55553..85292c9d5 100644 --- a/src/geckomat/gather_kcats/runDLKcat.m +++ b/src/geckomat/gather_kcats/runDLKcat.m @@ -35,7 +35,7 @@ function runDLKcat(modelAdapter) end disp('Running DLKcat prediction, this may take many minutes, especially the first time.') -status = system(['docker run --rm -v ' fullfile(params.path,'/data') ':/data ghcr.io/sysbiochalmers/dlkcat-gecko:0.1 /bin/bash -c "python DLKcat.py /data/DLKcat.tsv /data/DLKcatOutput.tsv"']); +status = system(['docker run --rm -v "' fullfile(params.path,'/data') '":/data ghcr.io/sysbiochalmers/dlkcat-gecko:0.1 /bin/bash -c "python DLKcat.py /data/DLKcat.tsv /data/DLKcatOutput.tsv"']); if status == 0 && exist(fullfile(params.path,'data/DLKcatOutput.tsv')) delete(fullfile(params.path,'/data/DLKcat.tsv')); diff --git a/src/geckomat/get_enzyme_data/loadDatabases.m b/src/geckomat/get_enzyme_data/loadDatabases.m index c38e852bb..d882a2447 100644 --- a/src/geckomat/get_enzyme_data/loadDatabases.m +++ b/src/geckomat/get_enzyme_data/loadDatabases.m @@ -79,6 +79,15 @@ else databases.uniprot = []; end + if ~isempty(databases.uniprot) + [uniqueIDs,uniqueIdx] = unique(databases.uniprot.ID,'stable'); + if numel(uniqueIDs) < numel(databases.uniprot.ID) + duplID = setdiff(1:numel(databases.uniprot.ID),uniqueIdx); + dispEM(['Duplicate entries are found for the following proteins. '... + 'Manually curate the ''uniprot.tsv'' file, or adjust the uniprot parameters '... + 'in the model adapter:'],true,databases.uniprot.ID(duplID)); + end + end end %% KEGG diff --git a/src/geckomat/limit_proteins/calculateFfactor.m b/src/geckomat/limit_proteins/calculateFfactor.m index 755430f4e..fdc953cdd 100644 --- a/src/geckomat/limit_proteins/calculateFfactor.m +++ b/src/geckomat/limit_proteins/calculateFfactor.m @@ -54,7 +54,7 @@ fileContent = textscan(fID,'%s %s %f','delimiter','\t','HeaderLines',headerLines); genes = fileContent{2}; %Remove internal geneIDs modifiers - genes = regexprep(genes,'(\d{4}).',''); + genes = regexprep(genes,'^\d+\.',''); level = fileContent{3}; fclose(fID); [a,b] = ismember(genes,uniprotDB.genes); diff --git a/src/geckomat/utilities/ecFSEOF.m b/src/geckomat/utilities/ecFSEOF.m index 00a70c2f8..0940c827c 100644 --- a/src/geckomat/utilities/ecFSEOF.m +++ b/src/geckomat/utilities/ecFSEOF.m @@ -1,51 +1,51 @@ -function FC = ecFSEOF(ecModel,targetRxn,csRxn,alphaLims,nSteps,filePath,filterG,modelAdapter) +function fseof = ecFSEOF(model,prodTargetRxn,csRxn,nSteps,outputFile,filePath,modelAdapter) % ecFSEOF % Function that runs Flux-Scanning with Enforced Objective Function (FSEOF) % for a specified production target. % % Input: -% ecModel an ecModel in GECKO 3 format (with ecModel.ec structure). -% targetRxn rxn ID for the production target reaction, a exchange -% reaction is recommended. -% csRxn rxn ID for the main carbon source uptake reaction. -% alphaLims vector of Minimum and maximum biomass scalling factors for -% enforced objective limits (e.g. [0.5 0.75]). Max value: 1. -% Max value recomended 0.9 (Optional, default [0.5 0.75]) -% nSteps number of steps for suboptimal objective in FSEOF. -% (Optional, default 16) -% filePath file path for results output. It will store two files: -% - at the genes level, ecFSEOF_genes.tsv -% - at the reactions level, ecFSEOF_rxns.tsv -% (Optional, default in the 'output' sub-folder taken from -% modelAdapter, e.g. output/ecFSEOF_rxns.tsv) -% filterG logical value. TRUE if genes K_scores results should be -% filtered according to the alpha vector distribution -% (Optional, default true) -% modelAdapter a loaded model adapter. (Optional, will otherwise use -% the default model adapter) +% model an ecModel in GECKO 3 format (with ecModel.ec structure). +% prodTargetRxn rxn ID for the production target reaction, a exchange +% reaction is recommended. +% csRxn rxn ID for the main carbon source uptake reaction. +% nSteps number of steps for suboptimal objective in FSEOF. +% (Optional, default 16) +% outputFile bolean option to save results in a file. (Optional, +% default false) +% filePath file path for results output. It will store two files: +% - at the genes level, ecFSEOF_genes.tsv +% - at the reactions level, ecFSEOF_rxns.tsv +% (Optional, default in the 'output' sub-folder taken from +% modelAdapter, e.g. output/ecFSEOF_rxns.tsv) +% modelAdapter a loaded model adapter. (Optional, will otherwise use +% the default model adapter) % % Output: -% FC structure with all results. Contains the following fields: -% - flux_WT: flux distribution for the WT strain (100% of -% carbon towards growth). -% - alpha: scalling factors used for enforced objetive -% limits -% - v_matrix: fluxes for each reaction in rxnsTable.rxns -% and each alpha. -% - k_matrix: fold changes for each reaction in -% rxnsTable.rxns and each alpha. -% - rxnsTable: a list with all reactions with fluxes that -% change consistently as target production -% increases. -% Contains: ID, name, k_score, gene rule, equation -% - geneTable: a list with all selected targets that -% increase production. -% Contains: gene, shortName, k_score +% fseof an structure with all results. Contains the following fields: +% - alpha: target production used for enforced +% objetive limits (from minimum to maximum +% production) +% - v_matrix: fluxes for each target reaction predicted +% and each alpha. +% - rxnTargets: a list with all reactions with fluxes that +% change consistently as target production +% increases. +% Contains: ID, name, slope, gene rule, and +% equation +% - transportTargets: a list with all transport reactions with +% fluxes that change consistently as target +% production increases. +% Contains: ID, name, slope, gene rule, and +% equation +% - geneTargets: a list with all selected targets that +% increase production. +% Contains: gene, shortName, slope, action, +% and essentiality % % Usage: -% FC = ecFSEOF(ecModel,targetRxn,csRxn,alphaLims,nSteps,filePath,filterG,modelAdapter) +% fseof = ecFSEOF(model,prodTargetRxn,csRxn,nSteps,outputFile,filePath,filterG,modelAdapter) -if nargin < 8 || isempty(modelAdapter) +if nargin < 7 || isempty(modelAdapter) modelAdapter = ModelAdapterManager.getDefault(); if isempty(modelAdapter) error('Either send in a modelAdapter or set the default model adapter in the ModelAdapterManager.') @@ -53,196 +53,267 @@ end params = modelAdapter.getParameters(); -if nargin < 7 || isempty(filterG) - filterG = true; +if nargin < 5 || isempty(outputFile) + outputFile = false; end if nargin < 6 || isempty(filePath) filePath = fullfile(params.path,'output'); end -if nargin < 5 || isempty(nSteps) +if nargin < 4 || isempty(nSteps) nSteps = 16; end -if nargin < 4 || isempty(alphaLims) - alphaLims = [0.5 0.75]; -end - -if numel(alphaLims) ~= 2 - error('alphaLims parameter should be a vector of two values') -end +% Get relevant rxn indexes +prodTargetRxnIdx = getIndexes(model, prodTargetRxn,'rxns'); +csRxnIdx = getIndexes(model, csRxn,'rxns'); +bioRxnIdx = getIndexes(model, params.bioRxn,'rxns'); % Standardize grRules and rxnGeneMat in model -[grRules,rxnGeneMat] = standardizeGrRules(ecModel,true); -ecModel.grRules = grRules; -ecModel.rxnGeneMat = rxnGeneMat; - -% Check carbon source uptake rate and LB -ecModel = setParam(ecModel, 'obj', params.bioRxn, 1); -sol = solveLP(ecModel, 1); -csRxnIdx = strcmpi(ecModel.rxns,csRxn); - -if sol.x(csRxnIdx) < ecModel.lb(csRxnIdx) - printOrange('WARNING: Carbon source LB and uptake rate are not equal.') +[grRules,rxnGeneMat] = standardizeGrRules(model,true); +model.grRules = grRules; +model.rxnGeneMat = rxnGeneMat; + +% Check carbon source uptake rate and LB defined +model = setParam(model, 'obj', params.bioRxn, 1); +sol = solveLP(model); +if model.lb(csRxnIdx) < sol.x(csRxnIdx) + printOrange(['WARNING: Carbon source lower bound was set to ' num2str(model.lb(csRxnIdx)) ... + ', but the uptake rate after model optimization is ' num2str(sol.x(csRxnIdx)) '.\n']) end % run FSEOF analysis -% Define alpha vector for suboptimal enforced objective values and -% initialize fluxes and K_scores matrices -alpha = alphaLims(1):((alphaLims(2)-alphaLims(1))/(nSteps-1)):alphaLims(2); -v_matrix = zeros(length(ecModel.rxns),length(alpha)); -k_matrix = zeros(length(ecModel.rxns),length(alpha)); -enzRxns = contains(ecModel.rxns,'usage_prot_'); -tolerance = 1e-4; +% Find out the initial target production. +iniTarget = sol.x(prodTargetRxnIdx); -% Simulate WT (100% growth) and forced (X% growth and the rest towards product): -flux_WT = getFluxTarget(ecModel,params.bioRxn,targetRxn,1); +% Find out the maximum theoretical yield of target reaction. +model = setParam(model,'obj',prodTargetRxn,1); +sol = solveLP(model,1); +% Set to 90%% based on https://doi.org/10.1128/AEM.00115-10 +maxTarget = sol.x(prodTargetRxnIdx) * 0.9; -% Set values which are under solver tolerance -flux_WT(flux_WT < 0 & ~enzRxns) = 0; -flux_WT(flux_WT > 0 & enzRxns) = 0; +% Define alpha vector for suboptimal enforced objective values between +% minimal production and 90%% of the maximum theoretical yield, initialize +% fluxes matriz +alpha = iniTarget:((maxTarget-iniTarget)/(nSteps-1)):maxTarget; +v_matrix = zeros(length(model.rxns),length(alpha)); +% Enforce objective flux iteratively progressbar('Flux Scanning with Enforced Objective Function') -for i = 1:length(alpha) - flux_MAX = getFluxTarget(ecModel,params.bioRxn,targetRxn,alpha(i)); - % Set values which are under solver tolerance - flux_MAX(flux_MAX < 0 & ~enzRxns) = 0; - flux_MAX(flux_MAX > 0 & enzRxns) = 0; - v_matrix(:,i) = flux_MAX; - k_matrix(:,i) = flux_MAX./flux_WT; - progressbar(i/length(alpha)) +for i = 1:nSteps + % Enforce the objetive flux of product formation + model = setParam(model,'eq',prodTargetRxnIdx,alpha(i)); + + % Restore minimum biomass lb to zero and set it as objective + model.lb(bioRxnIdx) = 0; + model = setParam(model,'obj',params.bioRxn,1); + sol = solveLP(model,1); + + % Store flux distribution + v_matrix(:,i) = sol.x; + + progressbar(i/nSteps) end -progressbar(1) % Make sure it closes - -% take out rxns with no grRule and standard gene -withGR = ~cellfun(@isempty,ecModel.grRules); -stdPos = contains(ecModel.grRules,'standard'); -withGR(stdPos) = 0; -rxns = ecModel.rxns(withGR); -v_matrix = v_matrix(withGR,:); -k_matrix = k_matrix(withGR,:); -rxnGeneM = ecModel.rxnGeneMat(withGR,:); - -% Filter out rxns that are always zero -> k=0/0=NaN: -non_nan = sum(~isnan(k_matrix),2) > 0; -rxns = rxns(non_nan,:); -v_matrix = v_matrix(non_nan,:); -k_matrix = k_matrix(non_nan,:); -rxnGeneM = rxnGeneM(non_nan,:); - -% Replace remaining NaNs with 1s: -k_matrix(isnan(k_matrix)) = 1; - -% Replace any Inf value with 1000 (maximum value is ~700): -k_matrix(k_matrix > 1000) = 1000; -% k_matrix(k_matrix < -1000) = 2; - -% Filter out values that are inconsistent at different alphas: -always_down = sum(k_matrix <= 1,2) == length(alpha); -always_up = sum(k_matrix >= 1,2) == length(alpha); - -% Identify those reactions with mixed patterns -incons_rxns = always_down + always_up == 0; - -% Identify genes that are linked to "inconsistent rxns" -incons_genes = sum(rxnGeneM(incons_rxns,:),1) > 0; - -% Finally, inconsistent reactions are those that are not conected -% to "inconsistent genes" from the original "inconsistent rxns" set -incons_rxns = sum(rxnGeneM(:,incons_genes),2) > 0; - -% Keep results for the consistent rxns exclusively -rxns = rxns(~incons_rxns,:); -v_matrix = v_matrix(~incons_rxns,:); -k_matrix = k_matrix(~incons_rxns,:); -rxnGeneM = rxnGeneM(~incons_rxns,:); - -% Get median k-score across steps and order from highest to lowest -k_rxns = mean(k_matrix,2); -[~,order] = sort(k_rxns,'descend'); -rxns = rxns(order,:); -v_matrix = v_matrix(order,:); -k_matrix = k_matrix(order,:); -rxnGeneM = rxnGeneM(order,:); -k_rxns = k_rxns(order,:); - -% Create list of remaining genes and filter out any inconsistent score: -% Just those genes that are connected to the remaining rxns are -genes = ecModel.genes(sum(rxnGeneM,1) > 0); -k_genes = zeros(size(genes)); -cons_genes = false(size(genes)); -rxnGeneM = rxnGeneM(:,sum(rxnGeneM,1) > 0); - -for i = 1:length(genes) - % Extract all the K_scores (from rxns across alphas) conected to +progressbar(1) % Make sure it closes + +% Take out rxns with no grRule and standard gene +withGR = ~cellfun(@isempty,model.grRules); +stdIdx = contains(model.grRules,'standard'); +withGR(stdIdx) = 0; +target_rxns = model.rxns(withGR); +v_matrix = v_matrix(withGR,:); +rxnGeneM = model.rxnGeneMat(withGR,:); + +% Filter out rxns that are always zero +zero_flux = ~all(abs(v_matrix(:,1:nSteps))==0,2); +target_rxns = target_rxns(zero_flux,:); +v_matrix = v_matrix(zero_flux,:); +rxnGeneM = rxnGeneM(zero_flux,:); + +% Identify those rxns that always increase or decrease, and calculate the +% slope as the difference in the flux when enforce objetive target +% production is set to 90%% of the maximum teorethical yield +% << v_matrix(i,nSteps-1) >> and the flux when the enforce objetive target +% production is set to the minimum << v_matrix(i,1) >> for the reaction i, +% divided by maxTarget-maxTarget/nSteps-1. +slope_rxns = zeros(size(target_rxns)); +targets = logical(size(target_rxns)); +target_type = cell(size(target_rxns)); +for i = 1:length(target_rxns) + if issorted(abs(v_matrix(i,1:nSteps)),'strictascend') + % Those reactions that always increase while enforcing target + % production are suggested for Over Expression + targets(i) = true; + slope_rxns(i) = abs(v_matrix(i,nSteps-1)-v_matrix(i,1))/abs(maxTarget-maxTarget/nSteps-1); + target_type(i) = {'OE'}; + elseif issorted(abs(v_matrix(i,1:nSteps)),'strictdescend') + % Those reactions that always decrease while enforcing target + % production are suggested for KnockDown or KnockOut. KO are those + % reactions which have zero flux when enforcing target production + % to 90%% of the maximum theoretical yield. + targets(i) = true; + slope_rxns(i) = abs(v_matrix(i,nSteps-1)-v_matrix(i,1))/abs(maxTarget-maxTarget/nSteps-1); + if v_matrix(i,nSteps) == 0 + target_type(i) = {'KO'}; + else + target_type(i) = {'KD'}; + end + end +end + +% Only keep those reaction that shows an increase or decrease pattern. +target_rxns = target_rxns(targets); +v_matrix = v_matrix(targets,:); +rxnGeneM = rxnGeneM(targets,:); +slope_rxns = slope_rxns(targets); +target_type = target_type(targets); + +% Order from highest to lowest slope +[~,order] = sort(slope_rxns,'descend'); +target_rxns = target_rxns(order); +v_matrix = v_matrix(order,:); +rxnGeneM = rxnGeneM(order,:); +slope_rxns = slope_rxns(order); +target_type = target_type(order); + +% Filter out reactions with slope < quantile +non_zero_slope = slope_rxns > quantile(slope_rxns,0.75); +target_rxns = target_rxns(non_zero_slope); +v_matrix = v_matrix(non_zero_slope,:); +rxnGeneM = rxnGeneM(non_zero_slope,:); +slope_rxns = slope_rxns(non_zero_slope); +target_type = target_type(non_zero_slope); + +% Create gene list of those connected to the remaining rxns +genes = model.genes(sum(rxnGeneM,1) > 0); +slope_genes = zeros(size(genes)); +rxnGeneM = rxnGeneM(:,sum(rxnGeneM,1) > 0); +target_type_genes = cell(size(genes)); +essentiality = cell(size(genes)); + +% Validate for gene essentiality +progressbar('Checking for gene essentiality') +for i = 1:numel(genes) + + % Block protein usage to KO al the reactions associated to it. + usage_rxn_idx = strcmpi(model.ec.genes, genes{i}); + usage_rxn = strcat('usage_prot_', model.ec.enzymes(usage_rxn_idx)); + tempModel = setParam(model, 'eq', usage_rxn, 0); + solKO = solveLP(tempModel); + % Check if no feasible solution was found + if solKO.stat == -1 || solKO.x(bioRxnIdx) < 1e-8 + essentiality(i) = {'essential'}; + end + + % Since a gene can be involved in multiple reactions, multiple + % engineering manipulations can be suggested for the same gene + % e.g. (OE and KD). So, reconcilie them, and report only one. + + % Get reaction index, in targets set, for each gene + rxns_for_gene = find(rxnGeneM(:,i) > 0); + actions = unique(target_type(rxns_for_gene)); + if numel(actions) > 1 + % If any OE is suggested give the highest priority over KD or KO + if any(startsWith(actions, 'OE')) + target_type_genes(i) = {'OE'}; + % Otherwise change to KO if not essential + elseif ~any(startsWith(actions, 'OE')) && ~isequal(essentiality(i),{'essential'}) + target_type_genes(i) = {'KO'}; + else + target_type_genes(i) = {'KD'}; + end + else + target_type_genes(i) = actions; + end + % Extract all the slope (from rxns across alphas) conected to % each remaining gene - k_set = k_rxns(rxnGeneM(:,i) > 0); - % Check the kind of control that gene i-th exerts over its reactions - always_down = sum(k_set <= (1-tolerance)) == length(k_set); - always_up = sum(k_set >= (1+tolerance)) == length(k_set); - % Evaluate if gene is always exerting either a positive or negative - % control - cons_genes(i) = always_down + always_up == 1; - k_genes(i) = mean(k_set); + slope_set = slope_rxns(rxns_for_gene); + slope_genes(i) = mean(slope_set); + + progressbar(i/numel(genes)) +end +progressbar(1) % Make sure it closes + +% Order genes from highest to lowest slope: +[~,order] = sort(slope_genes,'descend'); +genes = genes(order,:); +slope_genes = slope_genes(order,:); +target_type_genes = target_type_genes(order,:); +essentiality = essentiality(order,:); + +% Create output: + +% Report values used to enforce objective target production +fseof.alpha = alpha; + +% Exclude enzyme usage reactions +toKeep = ~startsWith(target_rxns,'usage_prot_'); +rxnIdx = getIndexes(model,target_rxns(toKeep),'rxns'); + +% Report the fluxes at each alpha +fseof.v_matrix = array2table(v_matrix(toKeep,:), ... + 'VariableNames', string(alpha), 'RowNames', model.rxns(rxnIdx)); + +% Report reaction targets +fseof.rxnTargets(:,1) = model.rxns(rxnIdx); +fseof.rxnTargets(:,2) = model.rxnNames(rxnIdx); +fseof.rxnTargets(:,3) = num2cell(slope_rxns(toKeep)); +fseof.rxnTargets(:,4) = model.grRules(rxnIdx); +fseof.rxnTargets(:,5) = constructEquations(model,rxnIdx); + +% Identify transport reactions; defined as reactions involving (at least) +% one metabolite name in more than one compartment. +transportRxns = false(numel(rxnIdx),1); +for i = 1:numel(rxnIdx) + % Get the involved metabolites in each reaction + mets = model.metNames(model.S(:,rxnIdx(i))~=0); + % Remove enzyme metabolite + mets(startsWith(mets,'prot_')) = []; + % Validate if it is a transport reaciton + transportRxns(i) = numel(mets) ~= numel(unique(mets)); end -% Keep "consistent genes" -genes = genes(cons_genes); -k_genes = k_genes(cons_genes); -rxnGeneM = rxnGeneM(:,cons_genes); - -if filterG - % Filter any value between mean(alpha) and 1: - unchanged = (k_genes >= mean(alpha) - tolerance) + (k_genes <= 1 + tolerance) == 2; - genes = genes(~unchanged); - k_genes = k_genes(~unchanged); - rxnGeneM = rxnGeneM(:,~unchanged); - % Update results for rxns-related fields (remove remaining reactions - % without any associated gene in rxnGeneM) - toKeep = (sum(rxnGeneM,2) > 0); - rxns = rxns(toKeep,:); - v_matrix = v_matrix(toKeep,:); - k_matrix = k_matrix(toKeep,:); - k_rxns = k_rxns(toKeep,:); +% Create separate table for transport reactions +fseof.transportTargets = fseof.rxnTargets(transportRxns,:); +fseof.rxnTargets(transportRxns,:) = []; + +% Report gene targets +geneIdx = cellfun(@(x) find(strcmpi(model.genes,x)),genes); +fseof.geneTargets(:,1) = genes; +fseof.geneTargets(:,2) = model.geneShortNames(geneIdx); +fseof.geneTargets(:,3) = num2cell(slope_genes); +fseof.geneTargets(:,4) = target_type_genes; +fseof.geneTargets(:,5) = essentiality; + +% Save results in a file if defined +if outputFile + % Write file with gene targets + writetable(cell2table(fseof.geneTargets, ... + 'VariableNames', {'gene_IDs' 'gene_names' 'slope' 'action' 'essentiality'}), ... + fullfile(filePath, 'ecFSEOF_genes.tsv'), ... + 'FileType', 'text', ... + 'Delimiter', '\t', ... + 'QuoteStrings', false); + + % Write file with rxn targets + writetable(cell2table(fseof.rxnTargets, ... + 'VariableNames', {'rxn_IDs' 'rxnNames' 'slope' 'grRules' 'rxn_formula'}), ... + fullfile(filePath, 'ecFSEOF_rxns.tsv'), ... + 'FileType', 'text', ... + 'Delimiter', '\t', ... + 'QuoteStrings', false); + + % Write file with transport targets + writetable(cell2table(fseof.transportTargets, ... + 'VariableNames', {'rxn_IDs' 'rxnNames' 'slope' 'grRules' 'rxn_formula'}), ... + fullfile(filePath, 'ecFSEOF_transporter.tsv'), ... + 'FileType', 'text', ... + 'Delimiter', '\t', ... + 'QuoteStrings', false); + + disp(['ecFSEOF results stored at: ' newline filePath]); end -% Order genes from highest to lowest k: -[~,order] = sort(k_genes,'descend'); -genes = genes(order,:); -k_genes = k_genes(order,:); - -% Create output (exclude enzyme usage reactions): -toKeep = ~startsWith(rxns,'usage_prot_'); -rxnIdx = getIndexes(ecModel,rxns(toKeep),'rxns'); -geneIdx = cellfun(@(x) find(strcmpi(ecModel.genes,x)),genes); -FC.flux_WT = flux_WT; -FC.alpha = alpha; -FC.v_matrix = v_matrix(toKeep,:); -FC.k_matrix = k_matrix(toKeep,:); -FC.rxnsTable(:,1) = ecModel.rxns(rxnIdx); -FC.rxnsTable(:,2) = ecModel.rxnNames(rxnIdx); -FC.rxnsTable(:,3) = num2cell(k_rxns(toKeep)); -FC.rxnsTable(:,4) = ecModel.grRules(rxnIdx); -FC.rxnsTable(:,5) = constructEquations(ecModel,rxnIdx); -FC.geneTable(:,1) = genes; -FC.geneTable(:,2) = ecModel.geneShortNames(geneIdx); -FC.geneTable(:,3) = num2cell(k_genes); - -writetable(cell2table(FC.geneTable, ... - 'VariableNames', {'gene_IDs' 'gene_names' 'K_score'}), ... - fullfile(filePath, 'ecFSEOF_genes.tsv'), ... - 'FileType', 'text', ... - 'Delimiter', '\t', ... - 'QuoteStrings', false); - -writetable(cell2table(FC.rxnsTable, ... - 'VariableNames', {'rxn_IDs' 'rxnNames' 'K_score' 'grRules' 'rxn_formula'}), ... - fullfile(filePath, 'ecFSEOF_rxns.tsv'), ... - 'FileType', 'text', ... - 'Delimiter', '\t', ... - 'QuoteStrings', false); - -disp(['ecFSEOF results stored at: ' newline filePath]); end diff --git a/src/geckomat/utilities/getFluxTarget.m b/src/geckomat/utilities/getFluxTarget.m deleted file mode 100644 index be2059ed0..000000000 --- a/src/geckomat/utilities/getFluxTarget.m +++ /dev/null @@ -1,63 +0,0 @@ -function [maxFlux, minFlux] = getFluxTarget(model,bioRxn,targetRxn,alpha) -% getFluxTarget -% -% Function that performs a series of LP optimizations on an ecModel, -% by first maximizing biomass, then fixing a suboptimal value and -% proceeding to protein pool minimization, last a minimization and -% maximization of a given production target reaction is performed -% subject to the optimal production level. -% -% Input: -% model an ecModel in GECKO 3 format (with ecModel.ec structure). -% bioRxn rxn ID for the biomass reaction. -% targetRxn rxn ID for the production target reaction, a exchange -% reaction is recommended. -% alpha scalling factor for desired suboptimal growth. -% (Optional, default 0.95) -% -% Output: -% minFlux vector of minimum flux rates at minimum target production, -% corresponding to model.rxns -% maxFlux vector of maximum flux rates at maximum target production, -% corresponding to model.rxns -% -% Usage: -% [maxFlux, minFlux] = getFluxTarget(ecModel,bioRxn,targetRxn,alpha) - -if nargin < 4 || isempty(alpha) - alpha = 0.95; -end - -% Fix a suboptimal alpha if equal to 1 -if alpha == 1 - alpha = 0.99; -end - -% Get relevant rxn indexes -bioRxnIdx = getIndexes(model, bioRxn,'rxns'); -poolIdx = strcmpi(model.rxns, 'prot_pool_exchange'); - -% Maximize growth and fix carbon source and suboptimal growth -model = setParam(model, 'obj', bioRxn, 1); -sol = solveLP(model); -model = setParam(model, 'lb', bioRxn, sol.x(bioRxnIdx) * alpha); - -% Minimize prot_pool_exchange and fix -model = setParam(model, 'obj', 'prot_pool_exchange', 1); -sol = solveLP(model); -model = setParam(model, 'lb', 'prot_pool_exchange', sol.x(poolIdx) * 1.01); - -% If minimum fluxes are required get them -minFlux = []; -if nargout == 2 - % Minimize target - model = setParam(model, 'obj', targetRxn, -1); - minSol = solveLP(model); - minFlux = minSol.x; -end - -% Maximize target -model = setParam(model, 'obj', targetRxn, 1); -maxSol = solveLP(model); -maxFlux = maxSol.x; -end diff --git a/src/geckomat/utilities/getSubsetEcModel.m b/src/geckomat/utilities/getSubsetEcModel.m index f94883a0b..f7c8f5cbd 100644 --- a/src/geckomat/utilities/getSubsetEcModel.m +++ b/src/geckomat/utilities/getSubsetEcModel.m @@ -3,6 +3,12 @@ % Generate a context-specific ecModel (strain/cell-line/tissue) by % mapping reactions and genes from a conventional context-specific GEM to % a general ecModel (bigEcModel), to yield a context-specific ecModel. +% +% Both models (bigEcModel and the smallGEM) should have derived from the +% same starting model. If an ecModel with human-GEM 1.15.0 is made +% (yielding the bigEcModel), then human-GEM 1.15.0 should also be the +% basis from which the context-specific GEM (smallGEM) was generated by +% for instance tINIT. % % Input: % bigEcModel enzyme-constrained version of conventional model of @@ -15,6 +21,14 @@ % % Usage: smallEcModel = getSubsetEcModel(smallGEM,bigEcModel) +% Check if models were derived from the same starting GEM +rxnsDiff = find(~ismember(smallGEM.rxns,bigEcModel.rxns)); +if numel(rxnsDiff) > 0 + dispEM(['While both models should have derived from the same starting ',... + 'GEM, the following reactions from smallGEM could not be found ',... + 'in bigEcModel:'],true,smallGEM.rxns(rxnsDiff),false); +end + % Check if original bigEcModel contains context-dependent protein constraints if any(bigEcModel.lb(startsWith(bigEcModel.rxns,'usage_prot_')) ~= -1000) printOrange(['WARNING: The bigEcModel is constraint by protein concentrations that are\n' ... diff --git a/src/geckomat/utilities/startGECKOproject.m b/src/geckomat/utilities/startGECKOproject.m index 4b85b5fc0..f6f2c2c88 100644 --- a/src/geckomat/utilities/startGECKOproject.m +++ b/src/geckomat/utilities/startGECKOproject.m @@ -57,4 +57,5 @@ function startGECKOproject(name, path) else printOrange('WARNING: A project with the same name exits at the same location. The project was not created.\n') end +cd(fullPath) end diff --git a/tutorials/full_ecModel/output/crabtree_preStep17.pdf b/tutorials/full_ecModel/output/crabtree_preStep33.pdf similarity index 100% rename from tutorials/full_ecModel/output/crabtree_preStep17.pdf rename to tutorials/full_ecModel/output/crabtree_preStep33.pdf diff --git a/tutorials/full_ecModel/protocol.m b/tutorials/full_ecModel/protocol.m index 81f338660..c1b17eca8 100644 --- a/tutorials/full_ecModel/protocol.m +++ b/tutorials/full_ecModel/protocol.m @@ -7,16 +7,17 @@ % dependent on how you intend to use the ecModel it may require additional % curation and evaluation. % -% DO NOT USE THE ECMODEL GENERATED HERE OUTSIDE OF THIS TUTORIAL. +% DO NOT DIRECTLY USE THE ECMODEL GENERATED HERE OUTSIDE OF THIS TUTORIAL. % % In comparison to the published GECKO3 Nature Protocols paper, this script % might have more up-to-date descriptions about the capabilities and % functions, which were introduced after the Nature Protocols paper was % published. This script will have additional commands and analyses that % are not included in the paper as such. This script should always work with -% the most recent GECKO3 release. +% the most recent GECKO3 release. The STAGE and STEP numbers match those in the +% Nature Protocols paper. -%% Preparation stage for ecModel reconstruction +%% Installation % - Install RAVEN & GECKO % - Install RAVEN by following the installation instructions: % https://github.com/SysBioChalmers/RAVEN/wiki/Installation @@ -25,29 +26,33 @@ % - The installation of Gurobi as LP solver is highly recommended checkInstallation; % Confirm that RAVEN is functional, should be 2.8.3 or later. -% - GECKO can be installed via cloning or direct download of ZIP file. -% See installation instructions in the README.md: -% https://github.com/SysBioChalmers/GECKO/tree/main#installation -% - Alternatively, GECKO can be installed as MATLAB Add-On: -% https://se.mathworks.com/help/matlab/matlab_env/get-add-ons.html +% - Install GECKO by following the installation instructions: +% https://github.com/SysBioChalmers/GECKO/wiki/Installation-and-upgrade +% - As RAVEN, GECKO can also be installed as MATLAB Add-On (link above) % - Add the appropriate GECKO (sub)folders to MATLAB path: GECKOInstaller.install +%% STAGE 0: Preparation stage for ecModel reconstruction +% STEP 1 Preparation of project file structure and data % - Initiate a basic structure of files and folders for your intended % project. This includes a copy of the template adapter. % The next line is commented out as the project structure is already % available in GECKO/tutorials/full_ecModel. %startGECKOproject() +% STEP 2 Store the starting GEM % - Find a high-quality GEM of your species of interest. ecYeastGEM is % based on yeast-GEM https://github.com/SysBioChalmers/yeast-GEM/releases % - Release v8.6.2 of yeast-GEM is also distributed with GECKO at % tutorials/full_ecModel/models/yeast-GEM.yml -% - Modify the model adapter (e.g. tutorials/full_ecModel/ecYeastGEMadapter.m) -% to contain organism- and model-specific parameters. + +% STEP 3-7 Modify the model adapter +% In the Nature Protocols paper is explained how to decide on the organism- and +% model-specific parameters in the model adapter, which for this tutorial is +% located at tutorials/full_ecModel/ecYeastGEMadapter.m. %% STAGE 1: expansion from a starting metabolic model to an ecModel structure -% STEP 1 Set modelAdapter +% STEP 8 Set modelAdapter adapterLocation = fullfile(findGECKOroot,'tutorials','full_ecModel','YeastGEMAdapter.m'); ModelAdapter = ModelAdapterManager.setDefault(adapterLocation); @@ -66,7 +71,7 @@ % change in the adapter file and run again the command % ModelAdapterManager.setDefault() -% STEP 2 Load conventional yeast-GEM +% STEP 9 Load conventional yeast-GEM % If the location to the conventional GEM was already set in the modelAdapter, % as obj.param.convGEM, then loadConventionalGEM can directly be used. If % the model is stored somewhere else (and not specified in obj.param.convGEM), @@ -75,7 +80,7 @@ model = loadConventionalGEM(); % model = importModel(fullfile(geckoRoot,'tutorials','full_ecModel','models','yeast-GEM.yml')); -% STEP 3 Prepare ecModel +% STEP 10-11 Prepare ecModel % We will make a full GECKO ecModel. For an example of reconstructing a % light GECKO ecModel, see tutorials/light_ecModel. [ecModel, noUniprot] = makeEcModel(model,false); @@ -86,7 +91,7 @@ % function and what is the order of inputs and outputs. doc makeEcModel -% STEP 4 Annotate with complex data +% STEP 12-13 Annotate with complex data % As Saccharomyces cerevisiae is available on Complex Portal % (https://www.ebi.ac.uk/complexportal/), its data is included in % ecYeastGEM. @@ -98,7 +103,7 @@ % complexInfo can be given as second input, but not needed here, as it will % read the file that was written by getComplexData. -% STEP 5 Store model in YAML format +% STEP 14 Store model in YAML format % In this script there is code at the end of each stage to store a copy of % the ecModel. However, only the stage 3 ecModel is kept and distributed % together with GECKO (as is shown at the end of stage 3). @@ -108,11 +113,12 @@ % Uncomment the line below if you want to reload the model. %ecModel=loadEcModel('ecYeastGEM_stage1.yml'); +% STEP 15 Different kcat sources % Decide which kcat source to use. In the steps below, all options are % shown. Not all are required, it is up to the user to decide which ones % they want to use. -% STEP 6 Fuzzy matching with BRENDA +% STEP 16-17 Gather EC numbers, required for BRENDA as kcat source % Requires EC numbers, which are here first taken from the starting model, % with the missing ones taken from Uniprot & KEGG databases. ecModel = getECfromGEM(ecModel); @@ -123,34 +129,38 @@ % reactions. ecModel = getECfromDatabase(ecModel); -% Do the actual fuzzy matching with BRENDA. +% STEP 18-19 Gather kcat values from BRENDA kcatList_fuzzy = fuzzyKcatMatching(ecModel); % Now we have a kcatList, which will be used to update ecModel in a later % step. -% STEP 7 DLKcat prediction through machine learning -% Requires metabolite SMILES, which are gathered from PubChem. +% STEP 20-22 Gather metabolite SMILES, required for DLKcat as kcat source +% Metabolite SMILES are gathered from PubChem. [ecModel, noSmiles] = findMetSmiles(ecModel); +% STEP 23 Prepare DLKcat input file % An input file is written, which is then used by DLKcat, while the output -% file is read back into MATLAB. The full_ecModel tutorial already comes +% file is read back into MATLAB.The full_ecModel tutorial already comes % with a DLKcat.tsv file populated with kcat values. If this file should be % regenerated, the line below should be uncommented. Note that this % overwrites the existing files, thereby discarding existing kcat predictions. %writeDLKcatInput(ecModel,[],[],[],[],true); +% STEP 24 Run DLKcat % runDLKcat will run the DLKcat algorithm via a Docker image. If the % DLKcat.tsv file already has kcat values, these will all be overwritten. %runDLKcat(); + +% STEP 25 Load DLKcat output kcatList_DLKcat = readDLKcatOutput(ecModel); -% STEP 8 Combine kcat from BRENDA and DLKcat +% STEP 26 Combine kcat from BRENDA and DLKcat kcatList_merged = mergeDLKcatAndFuzzyKcats(kcatList_DLKcat, kcatList_fuzzy); -% STEP 9 Take kcatList and populate edModel.ec.kcat +% STEP 27 Take kcatList and populate edModel.ec.kcat ecModel = selectKcatValue(ecModel, kcatList_merged); -% STEP 10 Apply custom kcat values +% STEP 28 Apply custom kcat values % During the development of yeast-GEM ecModels (through GECKO 1 & 2), % various kcat values have been manually curated, to result in a model that % is able to validate a wide range of phenotypes. These curations are @@ -158,12 +168,12 @@ [ecModel, rxnUpdated, notMatch] = applyCustomKcats(ecModel); % To modify the S-matrix, to actually implement the kcat/MW constraints, -% applyKcatConstraints should be run. This is done in STEP 13. +% applyKcatConstraints should be run. This is done in STEP 31. -% STEP 11 Get kcat values across isozymes +% STEP 29 Get kcat values across isozymes ecModel = getKcatAcrossIsozymes(ecModel); -% STEP 12 Get standard kcat +% STEP 30 Get standard kcat % Assign a protein cost to reactions without gene assocation. These % reactions are identified as those without a corresponding entry in % ecModel.grRules. The following reactions are exempted: @@ -184,8 +194,8 @@ % folder, containing the relevant reaction identifiers. [ecModel, rxnsMissingGPR, standardMW, standardKcat] = getStandardKcat(ecModel); -% STEP 13 Apply kcat constraints from ecModel.ec.kcat to ecModel.S -% While STEP 9-12 add or modify values in ecModel.ec.kcat, these contraints +% STEP 31 Apply kcat constraints from ecModel.ec.kcat to ecModel.S +% While STEP 27-30 add or modify values in ecModel.ec.kcat, these contraints % are not yet applied to the S-matrix: the enzyme is not yet set as pseudo- % substrate with the -MW/kcat stoichiometry. Now, applyKcatConstraints will % take the values from ecModel.ec.kcat, considers any complex/subunit data @@ -196,7 +206,7 @@ % reapply the new constraints onto the metabolic model. ecModel = applyKcatConstraints(ecModel); -% STEP 14 Set upper bound of protein pool +% STEP 32 Set upper bound of protein pool % The protein pool exchange is constrained by the total protein content % (Ptot), multiplied by the f-factor (ratio of enzymes/proteins) and the % sigma-factor (how saturated enzymes are on average: how close to their @@ -224,7 +234,7 @@ % Uncomment the line below if you want to reload the model. %ecModel=loadEcModel('ecYeastGEM_stage2.yml'); -% STEP 15 Test maximum growth rate +% STEP 33-38 Test maximum growth rate % Test whether the model is able to reach maximum growth if glucose uptake % is unlimited. First set glucose uptake unconstraint. ecModel = setParam(ecModel,'lb','r_1714',-1000); @@ -237,7 +247,7 @@ % The growth rate is far below 0.41 (the maximum growth rate of % S. cerevisiae, that is entered in the model adapter as obj.params.gR_exp. -% STEP 16 Relax protein pool constraint +% STEP 39-42 Relax protein pool constraint % As a simplistic way to ensure the model to reach the growth rate, the % upper bound of the protein pool exchange reaction can be increased to % whatever is required. This works, but STEP 17 is preferred. @@ -254,7 +264,7 @@ ecModel = setParam(ecModel,'lb','r_4041',0); ecModel = setParam(ecModel,'obj','r_4041',1); -% STEP 17 Sensitivity tuning +% STEP 43-44 Sensitivity tuning % First reset the protein pool constraint to a more realistic value, % reverting STEP 16. ecModel = setProtPoolSize(ecModel); @@ -263,7 +273,7 @@ % Inspect the tunedKcats structure in table format. struct2table(tunedKcats) -% STEP 18 Curate kcat values based on kcat tuning +% STEP 45-51 Curate kcat values based on kcat tuning % As example, the kcat of 5'-phosphoribosylformyl glycinamidine synthetase % (reaction r_0079) was increased from 0.05 to 5. Inspecting the kcat % source might help to determine if this is reasonable. @@ -301,6 +311,7 @@ saveEcModel(ecModel,'ecYeastGEM_stage3.yml'); +% STEP 52 Save the curated ecModel without proteomics integration % This functional ecModel will also be kept in the GECKO GitHub. saveEcModel(ecModel,'ecYeastGEM.yml'); @@ -308,12 +319,12 @@ % Uncomment the line below if you want to reload the model. %ecModel=loadEcModel('ecYeastGEM_stage3.yml'); -% STEP 19 Load proteomics data and constrain ecModel +% STEP 53-57 Load proteomics data and constrain ecModel protData = loadProtData(3); %Number of replicates, only one experiment. ecModel = fillEnzConcs(ecModel,protData); ecModel = constrainEnzConcs(ecModel); -% STEP 20 Update protein pool +% STEP 58 Update protein pool % The protein pool reaction will be constraint by the remaining, unmeasured % enzyme content. This is calculated by subtracting the sum of % ecModel.ec.concs from the condition-specific total protein content. The @@ -322,7 +333,7 @@ fluxData = loadFluxData(); ecModel = updateProtPool(ecModel,fluxData.Ptot(1)); -% STEP 21 Load flux data +% STEP 59-63 Load flux data % Matching the proteomics sample(s), condition-specific flux data needs to % be loaded to constrain the model. This was already loaded in Step 20 for % gathering Ptot, but is repeated here nonetheless. Flux data is read from @@ -336,7 +347,7 @@ % The growth rate of 0.1 is far from being reached. Therefore, the next % step is to flexibilize enzyme concentrations. -% STEP 22 Enzyme concentrations are flexibilized (increased), until the +% STEP 64-65 Enzyme concentrations are flexibilized (increased), until the % intended growth rate is reached. This is condition-specific, so the % intended growth rate is gathered from the fluxData structure. [ecModel, flexEnz] = flexibilizeEnzConcs(ecModel,fluxData.grRate(1),10); @@ -364,32 +375,11 @@ saveEcModel(ecModel,'ecYeastGEM_stage4'); %% STAGE 5: simulation and analysis -% STEP 23 Example of various useful RAVEN functions -% % Set the upper bound of reaction r_0003 to 10. -% ecModel = setParam(ecModel,'ub','r_0003',10); -% % Set the lower bound of reaction r_0003 to 0. -% ecModel = setParam(ecModel,'lb','r_0003',0); -% % Set the objective function to maximize reaction 'r_4041'. -% ecModel = setParam(ecModel,'obj','r_4041',1); -% % Set the objective function to minimize protein usage. -% ecModel = setParam(ecModel,'obj','prot_pool_exchange',1); -% % Perform flux balance analysis (FBA). -% sol = solveLP(ecModel); -% % Perform parsimonious FBA (minimum total flux). -% sol = solveLP(ecModel,1); -% % Inspect exchange fluxes from FBA solution. -% printFluxes(ecModel,sol.x) -% % Inspect all (non-zero) fluxes from FBA solution. -% printFluxes(ecModel,sol.x,false) -% % Export to Excel file (will not contain potential model.ec content). -% exportToExcelFormat(ecModel,'filename.xlsx'); - -% STEP 24 Simulate Crabtree effect with protein pool - -% (Re)load the ecModel without proteomics integration. +% STEP 66 (Re)load the ecModel without proteomics integration. ecModel = loadEcModel('ecYeastGEM.yml'); -% We will soon run a custom plotCrabtree function that is kept in the code +% STEP 67-68 Simulate Crabtree effect with protein pool +% We will below run a custom plotCrabtree function that is kept in the code % subfolder. To run this function we will need to navigate into the folder % where it is stored, but we will navigate back to the current folder % afterwards. @@ -421,16 +411,16 @@ % It is obvious that no total protein constraint is reached, and Crabtree % effect is not observed. -% Perform the Crabtree simulation on the pre-Step 17 ecModel (where kcat +% Perform the Crabtree simulation on the pre-Step 33 ecModel (where kcat % sensitivity tuning has not yet been applied). ecModel_preTuning = loadEcModel('ecYeastGEM_stage2.yml'); ecModel_preTuning = setParam(ecModel_preTuning,'lb','r_1714',-1000); plotCrabtree(ecModel_preTuning); -saveas(gcf,fullfile(params.path,'output','crabtree_preStep17.pdf')) +saveas(gcf,fullfile(params.path,'output','crabtree_preStep33.pdf')) % Without kcat tuning, the model gets constrained too early (at too low % growth rates), which means that no solutions exist at high growth rates. -% STEP 25 Selecting objective functions +% STEP 69-70 Selecting objective functions ecModel = setParam(ecModel,'obj',params.bioRxn,1); sol = solveLP(ecModel) fprintf('Growth rate that is reached: %f /hour.\n', abs(sol.f)) @@ -444,7 +434,7 @@ sol = solveLP(ecModel) fprintf('Minimum protein pool usage: %.2f mg/gDCW.\n', abs(sol.f)) -% STEP 26 Inspect enzyme usage +% STEP 71 Inspect enzyme usage % Show the result from the earlier simulation, without mapping to % non-ecModel. ecModel = setParam(ecModel, 'obj', 'r_1714', 1); @@ -454,7 +444,7 @@ usageReport = reportEnzymeUsage(ecModel,usageData); usageReport.topAbsUsage -% STEP 27 Compare fluxes from ecModel and conventional GEM +% STEP 72 Compare fluxes from ecModel and conventional GEM sol = solveLP(ecModel); % Map the ecModel fluxes back to the conventional GEM. [mappedFlux, enzUsageFlux, usageEnz] = mapRxnsToConv(ecModel, model, sol.x); @@ -462,7 +452,7 @@ numel(mappedFlux) numel(model.rxns) -% STEP 28 Perform (ec)FVA +% STEP 73-75 Perform (ec)FVA % Perform FVA on a conventional GEM, ecModel, and ecModel plus proteomics % integration, all under similar exchange flux constraints. First make sure % that the correct models are loaded. @@ -505,7 +495,7 @@ plotEcFVA(minFlux, maxFlux); saveas(gca, fullfile(params.path,'output','ecFVA.pdf')) -% STEP 29 Compare light and full ecModels +% STEP 76-77 Compare light and full ecModels % For a fair comparison of the two types of ecModels, the custom % plotlightVSfull function makes a light and full ecModel for yeast-GEM and % compares their flux distributions at maximum growth rate. @@ -518,5 +508,5 @@ changedFlux = abs(fluxRatio-1) > 0.001; model.rxnNames(changedFlux) -% STEP 30 Contextualize ecModels +% STEP 78-79 Contextualize ecModels % This step is exemplified in tutorials/light_ecModel/protocol.m. diff --git a/tutorials/light_ecModel/HumanGEMAdapter.m b/tutorials/light_ecModel/HumanGEMAdapter.m index a7e393237..4a9ae97ba 100644 --- a/tutorials/light_ecModel/HumanGEMAdapter.m +++ b/tutorials/light_ecModel/HumanGEMAdapter.m @@ -6,7 +6,7 @@ % The model distributed with the light_ecModel tutorial is Human-GEM % is version 1.15.0, available from - % https://github.com/SysBioChalmers/Human-GEM/releases/tag/v1.3.0 + % https://github.com/SysBioChalmers/Human-GEM/releases/tag/v1.15.0 % In addition, the following lines were run to reduce its size % before storing it in this GECKO tutorial: % ihuman = simplifyModel(ihuman,false,false,true,true); diff --git a/tutorials/light_ecModel/protocol.m b/tutorials/light_ecModel/protocol.m index df8cabdd3..008125e08 100644 --- a/tutorials/light_ecModel/protocol.m +++ b/tutorials/light_ecModel/protocol.m @@ -1,16 +1,17 @@ % This file accompanies the GECKO3 Nature Protocols paper (DOI TO BE ADDED). % % The function of this script is to demonstrate the reconstruction and -% analysis of a *light* ecModel. As example, it here uses the human-GEM -% model of Homo sapiens as starting point. However, this script does not -% claim to construct a "production-ready" ecHumanGEM model: dependent on -% how you intend to use the ecModel it may require additional curation and -% evaluation. +% analysis of a *full* ecModel. As example, it here uses the yeast-GEM +% model of Saccharomyces cerevisiae as starting point. However, this script +% does not claim to construct a "production-ready" ecYeastGEM model: +% dependent on how you intend to use the ecModel it may require additional +% curation and evaluation. % -% DO NOT USE THE ECMODEL GENERATED HERE OUTSIDE OF THIS TUTORIAL. +% DO NOT DIRECTLY USE THE ECMODEL GENERATED HERE OUTSIDE OF THIS TUTORIAL. % % This script has limited comments and explanations of how the various -% functions work, more detailed descriptions can be found in +% functions work and only the STEPs essential for this script are shown. For +% more detailed descriptions and examples of the other STEPs, see % tutorials/full_ecModel/protocol.m. % Prepare software and model adapter @@ -18,15 +19,14 @@ GECKOInstaller.install setRavenSolver('gurobi') -% STEP 1 +% STEP 8 Load the appropriate model adapter adapterLocation = fullfile(findGECKOroot,'tutorials','light_ecModel','HumanGEMAdapter.m'); adapter = ModelAdapterManager.setDefault(adapterLocation); -%% Reconstruct light ecModel -% STEP 2 +% STEP 9 Load the starting human-GEM model model = loadConventionalGEM(); -% STEP 3 +% STEP 10-11 Reconstruct light ecModel % Because Human-GEM contains gene identifiers in ENSEMBL format without % trailing version identifier, and there is no field in Uniprot that % exactly matches that format, a custom uniprotConversion.tsv is generated @@ -45,14 +45,14 @@ % from Human-GEM was not sufficient and additional curation should be done. % This will be skipped for now, as 6 genes is not too many. -% STEP 4 +% STEP 12-13 Apply complex data [ecModel, foundComplex, proposedComplex] = applyComplexData(ecModel); -% STEP 6 +% STEP 16-19 Search for kcat values in BRENDA ecModel = getECfromGEM(ecModel); kcatList_fuzzy = fuzzyKcatMatching(ecModel); -% STEP 7 +% STEP 20-25 Predict kcat values with DLKcat ecModel = findMetSmiles(ecModel); % DLKcat.tsv files are ecModel-type specific and cannot be interchanged % between full and light ecModels. This is due to a difference in how @@ -71,21 +71,21 @@ kcatList_DLKcat = readDLKcatOutput(ecModel); -% STEP 8 +% STEP 26 Merge BRENDA and DLKcat derived values kcatList_merged = mergeDLKcatAndFuzzyKcats(kcatList_DLKcat, kcatList_fuzzy); -% STEP 9 +% STEP 27 Implement kcat avlues into model.ec.kcat ecModel = selectKcatValue(ecModel,kcatList_merged); -% STEP 10 +% STEP 28 % applyCustomKcats can be run on light ecModels in a similar way as for % full ecModels. However, no custom kcat values are defined for the ecModel % in this tutorial. -% STEP 11 +% STEP 29 % getKcatAcrossIsozymes cannot be run on light ecModels. -% STEP 12 +% STEP 30 % getStandardKcat can be run on light ecModels in a similar way as for full % ecModels. But due to the different model structure, no "standard" % pseudo-enzyme is included. @@ -96,7 +96,7 @@ % -standardMW/standardKcat as the stoiciometric coefficient. [ecModel, rxnsMissingGPR, standardMW, standardKcat] = getStandardKcat(ecModel); -% STEP 13 +% STEP 31 Apply kcat avlues to S-matrix % In light ecModels the kcat values are also kept in ecModels.ec.kcat, and % only after running applyKcatConstraints are these introduced in the % ecModel.S matrix. In constrast to full ecModels, the applyKcatConstraints @@ -104,13 +104,13 @@ % (lowest MW/kcat value), and uses this value. ecModel = applyKcatConstraints(ecModel); -% STEP 14 +% STEP 32 Constrain with total protein pool ecModel = setProtPoolSize(ecModel); sol=solveLP(ecModel,1) printFluxes(ecModel,sol.x) saveEcModel(ecModel); -% STEP 15-18 +% STEP 43-44 % sensitivityTuning also works on light ecModels. However, in this tutorial % an ecModel for the generic Human-GEM model is generated. This model is % not directly suitable for simulations, as it does not represent a @@ -119,10 +119,10 @@ % cells, or otherwise the natural environment]), and what maximum growth % rate should be reached. -% STEP 19-22 +% STEP 53-57, 64-65 % Proteomics integration is NOT possible with light ecModels. -% STEP 25 +% STEP 69-70 % Simulating fluxes in light ecModels would follow a similar strategy as % would be suitable for simulating full ecModels without proteomics % integration. E.g. first optimization of a "classical" objective function @@ -141,21 +141,21 @@ sol = solveLP(ecModel) fprintf('Minimum protein pool usage: %g mg/gDCW.\n', abs(sol.f)) -% STEP 26 +% STEP 71 % Individual enzyme usages cannot be investigated in light ecModels, as % these are not explicitly included in the S-matrix. -% STEP 27-28 +% STEP 72 % Mapping fluxes to the conventional GEM works identical as for full % ecModels. Due to enzymes not being explicitly included in the S-matrix, % the enzUsageFlux only includes the protein pool exchange. [mappedFlux, enzUsageFlux, usageEnz] = mapRxnsToConv(ecModel, model, sol.x); -% STEP 29 +% STEP 76-77 % The comparison between full and light ecModel is demonstrated with a % custom function in tutorials/full_ecModel/code/plotlightVSfull.m. -% STEP 30 +% STEP 78 Make context-specific ecModel by subsetting from a general model % To exemplify the construction of a context-specific ecModel, a % conventional GEM of HT-29 cell line is loaded. HT29 = readYAMLmodel(fullfile(adapter.params.path,'models','HT29-GEM.yml')); @@ -164,6 +164,7 @@ ecModel = loadEcModel(); ecHT29 = getSubsetEcModel(ecModel,HT29); +% STEP 79 Compare results from generic and context-specific models % Without detailed inspection of all differences between the three models, % it is clear that both the enzyme constraints and contextualization have % had its impact as the maximum growth rate is affected: