feat: add metadata and miscellaneous adjustments #95

haowang-bioinfo · 2019-03-23T22:08:54Z

Main improvements in this PR:

This PR incorporates metadata into the model files to support development of the Metabolic Atlas website, as detailed in "Metadata in plaintext exports" #71:
- Add the function writeHumanYaml, adapted from RAVEN's writeYaml function and customized for human-GEM model curation
- Add the script miscModelCurationScript_20190323 for the following tasks:
  - Incorporate metadata into the annotation field, as discussed in "Metadata in plaintext exports" #71
  - Reformat EC numbers in the eccodes field, as discussed in "fix: standardize EC numbers in human-GEM" #93
  - Remove the rxnComps field, according to #184
  - Set version to a blank field to keep the workflow simple and clear
  - Initialize the rxnConfidenceScores field to zero, as discussed in "Incorrect values in the rxnConfidenceScores field" #48
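
The field-level tasks listed above could look roughly like the following. This is a minimal sketch on a toy struct, not the actual curation script: the toy field contents are made up, and storing the metadata under model.annotation is an assumption based on the variable name in the diff. The EC-number reformatting is left out here.

```matlab
% Toy model standing in for human-GEM; field contents are made up.
model.rxns     = {'r1'; 'r2'};
model.rxnComps = [1; 1];
model.version  = '1.0.0';

% Store metadata in the annotation field (assumed layout; the taxonomy
% value is the one shown in the PR diff).
model.annotation.taxonomy = '9606';

% Remove the rxnComps field if it is present.
if isfield(model, 'rxnComps')
    model = rmfield(model, 'rxnComps');
end

% Turn version into a blank field.
model.version = '';

% Initialize one confidence score per reaction to zero.
model.rxnConfidenceScores = zeros(numel(model.rxns), 1);
```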

I hereby confirm that I have:

- Tested my code on my own computer for running the model
- Selected devel as a target branch

mihai-sysbio · 2019-03-24T10:17:57Z

ComplementaryScripts/Functions/writeHumanYaml.m

+    name = strcat(name,'.yaml');
+end
+
+%{


Since the code is under version control already, keeping code saved as comments might feel redundant.

It might be good to keep the YAML extension as '.yml' to retain better compatibility with RAVEN and COBRApy.

mihai-sysbio · 2019-03-24T10:22:03Z

ComplementaryScripts/Functions/writeHumanYaml.m

+fprintf(fid,'- metabolites:\n');
+[~,pos] = sort(model.mets);
+for i = 1:length(model.mets)
+    fprintf(fid,'  - !!omap\n');


To avoid the repetition of fprintf omap, how about creating a function printHeader(fid, numberOfSpaces) that would just call fprintf with the number of spaces required before omap?
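
A minimal sketch of the suggested helper, assuming it only needs to emit the "- !!omap" line at a given indentation. printHeader and numberOfSpaces are the names proposed in the comment; omapLine is a hypothetical name introduced here so the text can be checked without writing to a file.

```matlab
% Hypothetical helper: build "- !!omap" preceded by numberOfSpaces spaces.
omapLine = @(numberOfSpaces) ...
    sprintf('%s- !!omap\n', repmat(' ', 1, numberOfSpaces));

% printHeader writes that line to the given file identifier.
printHeader = @(fid, numberOfSpaces) fprintf(fid, '%s', omapLine(numberOfSpaces));

printHeader(1, 2);   % fid 1 is standard output; same text as '  - !!omap\n'
```

Each `fprintf(fid,'  - !!omap\n')` call would then become `printHeader(fid, 2)`.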

mihai-sysbio · 2019-03-24T10:34:10Z

ComplementaryScripts/Functions/writeHumanYaml.m

+% metabolites
+fprintf(fid,'- metabolites:\n');
+[~,pos] = sort(model.mets);
+for i = 1:length(model.mets)


Instead of calling writeFieldModel so many times, how about creating a data structure for (field, fieldType, text) e.g. ('mets', 'txt', '- id') and iterating through that, so there would be only one call to writeField?
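
As a rough illustration of this idea, the (field, fieldType, text) triples can live in a cell array that a single loop walks. The sketch below uses a toy model and an inline stand-in for writeField that handles only the 'txt' case; fieldSpec and the metNames row are assumptions made for the example.

```matlab
% Toy model; only the fields used below are defined.
model.mets     = {'m2'; 'm1'};
model.metNames = {'water'; 'glucose'};

% Hypothetical spec: one row per (field, fieldType, text) triple.
fieldSpec = { ...
    'mets',     'txt', '- id'; ...
    'metNames', 'txt', '- name'};

[~, pos] = sort(model.mets);
lines = {};
for i = 1:numel(pos)
    lines{end+1} = '  - !!omap'; %#ok<SAGROW>
    for k = 1:size(fieldSpec, 1)
        [fieldName, fieldType, label] = fieldSpec{k, :};
        % Stand-in for writeField: only text fields are handled here.
        if strcmp(fieldType, 'txt')
            lines{end+1} = sprintf('    %s: %s', label, ...
                model.(fieldName){pos(i)}); %#ok<SAGROW>
        end
    end
end

fprintf(1, '%s\n', lines{:});   % fid 1 is standard output
```

Adding a field then means adding one row to fieldSpec rather than one more call.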

mihai-sysbio · 2019-03-24T10:36:07Z

ComplementaryScripts/Functions/writeHumanYaml.m

+
+end
+
+function writeField(model,fid,fieldName,type,pos,name)


There is no overlap between these different cases of writeField, so why not have different functions instead? That would be more efficient than checking the field type on every call.
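
One way to read this suggestion, sketched under assumptions: each kind of value gets its own small formatter, and the caller picks the right one directly. formatTxtField and formatNumField are hypothetical names, and the labels are examples rather than the ones used in writeHumanYaml.

```matlab
% Hypothetical per-type formatters; each handles exactly one kind of value.
formatTxtField = @(label, value) sprintf('%s: %s', label, value);
formatNumField = @(label, value) sprintf('%s: %.15g', label, value);

% The caller chooses the formatter, so no type check runs inside it.
fprintf('%s\n', formatTxtField('- id', 'm1'));
fprintf('%s\n', formatNumField('- charge', -2));
```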

mihai-sysbio · 2019-03-24T10:37:58Z

ComplementaryScripts/modelCuration/miscModelCurationScript_20190323.m

+annotation.taxonomy='9606';
+annotation.note='Human genome-scale metabolic models are important tools for the study of human health and diseases, by providing a scaffold upon which different types of data can be analyzed. This is the latest version of human-GEM, which is a genome-scale model of the generic human cell. The objective of human-GEM is to serve as a community model for enabling integrative and mechanistic studies of human metabolism.';
+annotation.sourceUrl='https://github.com/SysBioChalmers/human-GEM';
+annotation.authorList='Jonathan Robinson, Hao Wang, Pierre-Etienne Cholley';


It would have been more interesting to have authors as an array, so that more info could be added, e.g. ORCID.

Indeed, a proper array for authors would have been better, and I would say it is mandatory if we are to develop a proper YAML parser function on Metabolic Atlas.
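
A possible shape for such an array, as a sketch only: one struct per author, with an orcid field left empty because no ORCID values appear in this PR. The field names name and orcid and the YAML layout printed at the end are assumptions.

```matlab
% Hypothetical layout: a 1x3 struct array, one element per author.
authors = struct( ...
    'name',  {'Jonathan Robinson', 'Hao Wang', 'Pierre-Etienne Cholley'}, ...
    'orcid', {'', '', ''});

% The flat string used in the script can still be rebuilt when needed.
authorList = strjoin({authors.name}, ', ');

% One YAML list item per author (assumed output layout).
for i = 1:numel(authors)
    fprintf('- name: "%s"\n', authors(i).name);
    fprintf('  orcid: "%s"\n', authors(i).orcid);
end
```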

mihai-sysbio · 2019-03-24T10:39:26Z

ComplementaryScripts/modelCuration/miscModelCurationScript_20190323.m

+%           5. Initialize rxnConfidenceScores field with zero
+%
+
+%% Load the model


What's up with this ^M ?

Fixed by replacing the Windows line-break characters.

JonathanRob · 2019-03-25T08:31:54Z

@mihai-sysbio Regarding many of your suggested changes to writeHumanYaml, I agree. This function was just copied from RAVEN's writeYaml function, and I didn't bother to re-write/optimize it, since I was aiming for something that worked without needing to spend much time on it. As long as they're not causing any problems at this point, we can plan to update the function over time, unless someone wants to make the changes now.

As for the ^M, this is an end-of-line character typically originating from a Windows-based text editor, so either @Hao-Chalmers or @pinarkocabas should probably double-check their editors to see if they're saving in the correct format.
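
For reference, the ^M shown in the diff is a carriage return (character code 13). A minimal sketch of stripping it from a file in MATLAB follows; it works on a temporary file so it can be run as is, and it is not necessarily how the fix in this PR was made.

```matlab
% Create a small temporary file with Windows (CRLF) line endings.
fileName = [tempname, '.m'];
fid = fopen(fileName, 'w');
fwrite(fid, sprintf('%%%% Load the model\r\nx = 1;\r\n'));
fclose(fid);

% Read the whole file and drop every carriage return.
txt = fileread(fileName);
txt = strrep(txt, char(13), '');

% Write it back; 'w' opens the file in binary mode, so no CR is re-added.
fid = fopen(fileName, 'w');
fwrite(fid, txt);
fclose(fid);

cleaned = fileread(fileName);
delete(fileName);
```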

JonathanRob · 2019-03-25T08:39:40Z

ComplementaryScripts/modelCuration/miscModelCurationScript_20190323.m

+
+% Consistengly add a blank space after each semicolon
+eccodes = regexprep(eccodes,';','; ');
+


To be certain that you're not creating any double spaces after semicolons (if they already have a space), then line 48 should be changed to:
eccodes = regexprep(eccodes,';\s*','; ');
which will remove any trailing spaces that may already exist after a semicolon.
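
A small check of the difference between the two patterns, using made-up sample values (one entry without a space after the semicolon, one with):

```matlab
% Sample values only; not taken from the model.
eccodes = {'1.1.1.1;2.7.1.1'; '1.1.1.1; 2.7.1.1'};

% Original pattern: adds a space even when one is already there.
naive  = regexprep(eccodes, ';', '; ');

% Suggested pattern: consumes any existing whitespace first.
robust = regexprep(eccodes, ';\s*', '; ');

disp(naive{2});    % two spaces after the semicolon
disp(robust{2});   % exactly one space after the semicolon

% Running the suggested pattern a second time changes nothing.
rerun = regexprep(robust, ';\s*', '; ');
```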

@JonathanRob thanks for the detailed review. Before adding line 48, I ran a sanity check and confirmed that no trailing space exists after any semicolon in this field.

But there will be trailing spaces the next time you run this script, so Jon's point is good and you made the change.

@pecholleyc this is a throwaway script, rather than a function for repetitive tasks.

in that case...

haowang-bioinfo · 2019-03-25T09:51:35Z

ComplementaryScripts/Functions/writeHumanYaml.m

+                value = ['"',field{pos},'"'];
+            else
+                value = field{pos};
+            end


It would be preferable to leave out the double quotes, provided that strings with escape characters can be parsed using a more robust YAML import library.
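
If the quotes are kept in the meantime, values that themselves contain a backslash or a double quote need escaping, since YAML double-quoted strings use backslash escapes. A minimal sketch, where quoteYaml is a hypothetical helper that is not part of writeHumanYaml:

```matlab
% Hypothetical helper: escape backslashes first, then double quotes,
% and wrap the result in double quotes.
quoteYaml = @(s) ['"', strrep(strrep(s, '\', '\\'), '"', '\"'), '"'];

disp(quoteYaml('glucose'));
disp(quoteYaml('a"b'));
```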

JonathanRob and others added 11 commits March 18, 2019 15:54

feat: writeHumanYaml new function to write model yaml file

3e6239d

Starting with an exact copy of RAVEN's writeYaml function.

feat: writeHumanYaml update for integration with Metabolic Atals

97b3c89

- All "names:" entries are now enclosed in double quotes.
- Subsystems have been converted to lists, even when they contain one value.
- All "annotation:" fields are now excluded from the yaml.

style: writeHumanYaml minor stylistic changes to comments

735384f

style: writeHumanYaml for reactions, remove the "s" from "metabolites:"

54e13e0

feat: writeHumanYaml add metadata section

36c5f1b

Much of the metadata is currently hard-coded, but this can be updated with future model iterations.

refactor: extract metadata from defined fields in RAVEN

f1dbdc5

- Instead of manually coding the necessary metadata, now arrange these info into the existing fields in RAVEN and extract out accordingly by this refactoring.

feat: add metadata to human GEM

92e6230

style: rename script

6d4ceb2

- Change filename from issue71 to addMetaData

feat: incorporate additional changes to addMetaData script

ecdb210

Additional changes with consensus are added here:
- Reformat EC-number in eccodes field
- Remove rxnComps field
- Turn `version` into a blank field
- Initialize rxnConfidenceScores field with zero

doc/style: refine documentation and rename script

4a46a59

fix: modify .description field according to #94

ebafd4f

haowang-bioinfo requested review from JonathanRob, pecholleyc and pinarkocabas March 23, 2019 22:09

mihai-sysbio reviewed Mar 24, 2019

JonathanRob reviewed Mar 25, 2019

haowang-bioinfo commented Mar 25, 2019

pecholleyc added 2 commits March 25, 2019 13:52

fix: add Pınar Kocabaş as author

f5c3f03

fix: line ending

d0d253d

JonathanRob approved these changes Mar 28, 2019

pecholleyc approved these changes Mar 28, 2019

haowang-bioinfo merged commit c2c2186 into devel Mar 28, 2019

haowang-bioinfo deleted the addMetaData branch March 28, 2019 18:45

haowang-bioinfo mentioned this pull request Apr 2, 2019

human v1.0.1 #97

Merged

2 tasks

haowang-bioinfo mentioned this pull request Apr 6, 2020

Metadata in plaintext exports #71

Closed
