feat: add metadata and miscellaneous adjustments #95

Merged
merged 13 commits into from
Mar 28, 2019

Conversation

haowang-bioinfo
Member

@haowang-bioinfo haowang-bioinfo commented Mar 23, 2019

Main improvements in this PR:

I hereby confirm that I have:

  • Tested my code on my own computer for running the model
  • Selected devel as a target branch

JonathanRob and others added 11 commits March 18, 2019 15:54
Starting with an exact copy of RAVEN's writeYaml function.
- All "names:" entries are now enclosed in double quotes.
- Subsystems have been converted to lists, even when they contain one value.
- All "annotation:" fields are now excluded from the yaml.
Much of the metadata is currently hard-coded, but this can be updated with future model iterations.
- Instead of hard-coding the necessary metadata, this refactoring stores the information in the existing RAVEN fields and extracts it from there.
- Change filename from issue71 to addMetaData
Additional changes with consensus are added here:
- Reformat EC-number in eccodes field
- Remove rxnComps field
- Turn `version` into a blank field
- Initialize rxnConfidenceScores field with zero
name = strcat(name,'.yaml');
end

%{

Member

Since the code is under version control already, keeping code saved as comments might feel redundant.

Member Author

@haowang-bioinfo haowang-bioinfo Mar 25, 2019

It might be good to keep the YAML extension as '.yml' to retain better compatibility with RAVEN and COBRApy.
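
A minimal sketch of how the export function could default to '.yml' while leaving alone any name that already carries a YAML extension. The helper name `ensureYamlExtension` is hypothetical; it is not part of RAVEN or of this PR.

```matlab
function name = ensureYamlExtension(name)
% Hypothetical helper: append '.yml' unless the file name already ends
% with '.yml' or '.yaml' (case-insensitive match).
if isempty(regexpi(name, '\.ya?ml$', 'once'))
    name = [name, '.yml'];
end
end
```

Calling `ensureYamlExtension('humanGEM')` would give `'humanGEM.yml'`, while a name that already ends in `.yaml` is returned unchanged.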

fprintf(fid,'- metabolites:\n');
[~,pos] = sort(model.mets);
for i = 1:length(model.mets)
fprintf(fid,' - !!omap\n');

Member

To avoid the repetition of fprintf omap, how about creating a function printHeader(fid, numberOfSpaces) that would just call fprintf with the number of spaces required before omap?
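
One way the suggested helper could look, as a sketch only. The name `printHeader` and its arguments come from the comment above; the body is an assumption about what was intended.

```matlab
function printHeader(fid, numberOfSpaces)
% Write the '- !!omap' marker, indented by numberOfSpaces blanks.
indent = repmat(' ', 1, numberOfSpaces);
fprintf(fid, '%s- !!omap\n', indent);
end
```

Each repeated `fprintf` of the omap marker would then become a call such as `printHeader(fid, 2)`, with the indentation stated as a number rather than embedded in a string literal.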

% metabolites
fprintf(fid,'- metabolites:\n');
[~,pos] = sort(model.mets);
for i = 1:length(model.mets)

Member

Instead of calling writeField so many times, how about creating a data structure of (field, fieldType, text) entries, e.g. ('mets', 'txt', '- id'), and iterating through it, so that there would be only one call to writeField?
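
A rough sketch of the table-driven approach described above. The toy model, the `metNames` row, the `'- name'` label, and the stand-in `writeField` handle are all assumptions made so that the snippet runs on its own; the real `writeField` in the PR has the same argument order but more logic.

```matlab
% Toy model, for illustration only.
model.mets     = {'m02', 'm01'};
model.metNames = {'glucose', 'water'};
fid = 1;  % 1 is standard output; a real export would use fopen

% Stand-in for the PR's writeField (same argument order, 'type' unused here).
writeField = @(model, fid, fieldName, type, pos, name) ...
    fprintf(fid, '    %s: %s\n', name, model.(fieldName){pos});

% One row per field: model field, field type, YAML label.
metFields = { ...
    'mets',     'txt', '- id'; ...
    'metNames', 'txt', '- name'};

fprintf(fid, '- metabolites:\n');
[~, order] = sort(model.mets);
for i = 1:numel(order)
    fprintf(fid, '  - !!omap\n');
    for k = 1:size(metFields, 1)
        % The single writeField call site, driven by the table above.
        writeField(model, fid, metFields{k, 1}, metFields{k, 2}, ...
            order(i), metFields{k, 3});
    end
end
```

Adding or reordering a field then means editing one row of `metFields` rather than adding another call.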


end

function writeField(model,fid,fieldName,type,pos,name)

Member

There is no overlap between the different cases of writeField, so why not have separate functions instead? That would be more efficient than checking the field type on every call.

annotation.taxonomy='9606';
annotation.note='Human genome-scale metabolic models are important tools for the study of human health and diseases, by providing a scaffold upon which different types of data can be analyzed. This is the latest version of human-GEM, which is a genome-scale model of the generic human cell. The objective of human-GEM is to serve as a community model for enabling integrative and mechanistic studies of human metabolism.';
annotation.sourceUrl='https://github.com/SysBioChalmers/human-GEM';
annotation.authorList='Jonathan Robinson, Hao Wang, Pierre-Etienne Cholley';

Member

It would have been more interesting to have authors as an array, so that more info could be added, e.g. ORCID.

Contributor

Indeed, a proper array for authors would have been better, and I would say it is mandatory if we are to develop a proper YAML parser function for Metabolic Atlas.
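
A sketch of what an author array could look like, assuming a struct array with `name` and `orcid` fields and an `authors:` YAML key. Those names are assumptions, not an agreed schema, and the `orcid` values are left empty because no identifiers are given in this thread.

```matlab
% Author names are taken from the authorList string in the diff above.
authorNames = {'Jonathan Robinson', 'Hao Wang', 'Pierre-Etienne Cholley'};

% 1x3 struct array; the scalar cell {''} gives every element an empty orcid.
authors = struct('name', authorNames, 'orcid', {''});

fid = 1;  % standard output, for demonstration
fprintf(fid, '    authors:\n');
for i = 1:numel(authors)
    fprintf(fid, '      - name: "%s"\n', authors(i).name);
    fprintf(fid, '        orcid: "%s"\n', authors(i).orcid);
end
```

Written this way, each author becomes a YAML mapping, so a parser can read per-author fields without splitting a comma-separated string.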

% 5. Initialize rxnConfidenceScores field with zero
%

%% Load the model

Member

What's up with this ^M ?

Contributor

Fixed by replacing the Windows line-break characters.

@JonathanRob
Collaborator

@mihai-sysbio Regarding many of your suggested changes to writeHumanYaml, I agree. This function was just copied from RAVEN's writeYaml function, and I didn't bother to re-write/optimize it, since I was aiming for something that worked without needing to spend much time on it. As long as they're not causing any problems at this point, we can plan to update the function over time, unless someone wants to make the changes now.

As for the ^M, this is an end-of-line character typically originating from a Windows-based text editor, so either @Hao-Chalmers or @pinarkocabas should probably double check their editors to see if they're saving in the correct format.
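
For files that already contain the stray characters, a small sketch of converting Windows (CRLF) line endings to Unix (LF) from within MATLAB. The helper name `stripCarriageReturns` is hypothetical, and this is only one of several ways to do the conversion.

```matlab
function stripCarriageReturns(filename)
% Hypothetical helper: rewrite a text file so that every CRLF pair
% (shown as ^M followed by a line break) becomes a single LF.
txt = fileread(filename);
txt = strrep(txt, sprintf('\r\n'), sprintf('\n'));
fid = fopen(filename, 'w');   % 'w' writes the characters unchanged
fwrite(fid, txt, 'char');
fclose(fid);
end
```

Only CRLF pairs are replaced, so a lone carriage return elsewhere in the file is left untouched.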


% Consistently add a blank space after each semicolon
eccodes = regexprep(eccodes,';','; ');

Collaborator

To be certain that you're not creating any double spaces after semicolons (if they already have a space), line 48 should be changed to:
eccodes = regexprep(eccodes,';\s*','; ');
which will remove any trailing spaces that may already exist after a semicolon.
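
To make the difference concrete, here is a small comparison of the two patterns on made-up input. The EC-number strings are illustrative values, not entries taken from the model.

```matlab
% Illustrative input: one entry without a space, one that already has it.
eccodes = {'1.1.1.1;1.1.1.2', '1.1.1.1; 1.1.1.2'};

% Original pattern: inserts a space even where one already exists,
% so the second entry ends up with two spaces after the semicolon.
naive = regexprep(eccodes, ';', '; ');

% Suggested pattern: consumes any existing whitespace first, so both
% entries end up with exactly one space after the semicolon.
robust = regexprep(eccodes, ';\s*', '; ');

% Applying the suggested pattern a second time changes nothing.
again = regexprep(robust, ';\s*', '; ');
```

Because the second pattern is idempotent, it stays safe even if the script is run again on already-formatted data.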

Member Author

@haowang-bioinfo haowang-bioinfo Mar 25, 2019

@JonathanRob thx for detail reviewing. Before adding line 48, a sanity check has been made and confirmed that no trailing space exists after semicolon in this field.

Contributor

> @JonathanRob thx for detail reviewing. Before adding line 48, a sanity check has been made and confirmed that no trailing space exists after semicolon in this field.

But there will be the next time you run this script, so Jon's point is a good one, and you have made the change.

Member Author

@pecholleyc this is a throwaway script, rather than a function for repetitive tasks.

Contributor

in that case...

value = ['"',field{pos},'"'];
else
value = field{pos};
end

Member Author

@haowang-bioinfo haowang-bioinfo Mar 25, 2019

It would be preferable to leave out the double quotes, provided that strings with escape characters can be parsed by a more robust YAML import library.
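
As long as the quotes are kept, values that themselves contain a backslash or a double quote need escaping to remain valid double-quoted YAML scalars. A minimal sketch, assuming a hypothetical helper `quoteYamlString` that is not part of the PR:

```matlab
function value = quoteYamlString(str)
% Hypothetical helper: wrap str in double quotes for YAML output.
% Backslashes are escaped first so that the backslashes added for
% embedded quotes are not escaped a second time.
str = strrep(str, '\', '\\');
str = strrep(str, '"', '\"');
value = ['"', str, '"'];
end
```

The result should be written with `fprintf(fid, '%s', value)` rather than passed as the format string, so that `fprintf` does not interpret the backslashes itself.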

@haowang-bioinfo haowang-bioinfo merged commit c2c2186 into devel Mar 28, 2019
@haowang-bioinfo haowang-bioinfo deleted the addMetaData branch March 28, 2019 18:45
@haowang-bioinfo haowang-bioinfo mentioned this pull request Apr 2, 2019
2 tasks
5 participants