-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add metadata and miscellaneous adjustments #95
Conversation
Starting with an exact copy of RAVEN's writeYaml function.
- All "names:" entries are now enclosed in double quotes. - Subsystems have been converted to lists, even when they contain one value. - All "annotation:" fields are now excluded from the yaml.
Much of the metadata is currently hard-coded, but this can be updated with future model iterations.
- Instead of manually coding the necessary metadata, now arrange these info into the existing fields in RAVEN and extract out accordingly by this refactoring.
- Change filename from issue71 to addMetaData
Additional changes with consensus are added here: - Reformat EC-number in eccodes field - Remove rxnComps field - Turn `version` into a blank field - Initialize rxnConfidenceScores field with zero
name = strcat(name,'.yaml'); | ||
end | ||
|
||
%{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the code is under version control already, keeping code saved as comments might feel redundant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be good to keep the YAML extension as '.yml', and retain better compatibility with RAVEN and Cobrapy.
fprintf(fid,'- metabolites:\n'); | ||
[~,pos] = sort(model.mets); | ||
for i = 1:length(model.mets) | ||
fprintf(fid,' - !!omap\n'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid the repetition of fprintf omap
, how about creating a function printHeader(fid, numberOfSpaces)
that would just call fprintf
with the number of spaces required before omap
?
% metabolites | ||
fprintf(fid,'- metabolites:\n'); | ||
[~,pos] = sort(model.mets); | ||
for i = 1:length(model.mets) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of calling writeFieldModel
so many times, how about creating a data structure for (field, fieldType, text)
e.g. ('mets', 'txt', '- id')
and iterating through that, so there would be only one call to writeField
?
|
||
end | ||
|
||
function writeField(model,fid,fieldName,type,pos,name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is no overlap between these different cases of writeField
, why not have different functions instead? It would be more efficient than having to check every time what type of field it is?
annotation.taxonomy='9606'; | ||
annotation.note='Human genome-scale metabolic models are important tools for the study of human health and diseases, by providing a scaffold upon which different types of data can be analyzed. This is the latest version of human-GEM, which is a genome-scale model of the generic human cell. The objective of human-GEM is to serve as a community model for enabling integrative and mechanistic studies of human metabolism.'; | ||
annotation.sourceUrl='https://github.com/SysBioChalmers/human-GEM'; | ||
annotation.authorList='Jonathan Robinson, Hao Wang, Pierre-Etienne Cholley'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would have been more interesting to have authors as an array, so that more info could be added, e.g. ORCID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, a proper array for authors would have been better, and I would say mandatory for us to develop a proper YAML parser function on metabolic Atlas.
% 5. Initialize rxnConfidenceScores field with zero | ||
% | ||
|
||
%% Load the model |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's up with this ^M
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed by replacing the Windows line break characters
@mihai-sysbio Regarding many of your suggested changes to As for the |
|
||
% Consistengly add a blank space after each semicolon | ||
eccodes = regexprep(eccodes,';','; '); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be certain that you're not creating any double spaces after semicolons (if they already have a space), then line 48 should be changed to:
eccodes = regexprep(eccodes,';\s*','; ');
which will remove any trailing spaces that may already exist after a semicolon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JonathanRob thx for detail reviewing. Before adding line 48, a sanity check has been made and confirmed that no trailing space exists after semicolon in this field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JonathanRob thx for detail reviewing. Before adding line 48, a sanity check has been made and confirmed that no trailing space exists after semicolon in this field.
But there will be next time you run this script, so Jon's point is good and you made the change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pecholleyc this is a throwaway script, rather than a function for repetitive tasks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in that case...
value = ['"',field{pos},'"']; | ||
else | ||
value = field{pos}; | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is preferred to leave out double quotes, if the parsing of strings with escape characters can be managed using a more robust Yaml import library.
Main improvements in this PR:
writeHumanYaml
that is adapted from thewriteYaml
function in RAVEN and customized forhumanGEM
model curationmiscModelCurationScript_20190323
for following tasks:annotation
field, as discussed in Metadata in plaintext exports #71eccodes
field, as discussed in fix: standardize EC numbers in human-GEM #93rxnComps
field, according to #184version
into a blank field for having a simple and clear work flowrxnConfidenceScores
field with zero, as discussed in Incorrect values in the rxnConfidenceScores field #48I hereby confirm that I have:
devel
as a target branch