Update README.md #3
added some details about CHECKSUM files, basic instructions and rationale
@StevenCannon-USDA, let me know if these changes conform to your understanding of metadata management. @el239, let me know if anything seems unclear.
Looks good to me, but could be expanded. I suggest the following (more or less; please season to taste):
Several types of files should be checked for valid format and contents prior to pushing to the datastore-metadata repository:
validate.sh description_genus_species YAML_FILE
validate.sh description_genus YAML_FILE
validate.sh readme YAML_FILE
The metadata files in each collection should also be compressed and indexed, using the following script, which is found in the scripts/ directory of this repository:
compress_and_index.sh COLLECTION
Also, calculate md5 sums for the files in the collection with the following script, again found in the scripts/ directory:
mdsum-folder.bash COLLECTION
For additional description of protocols related to data curation for the data store, see https://github.com/legumeinfo/datastore-specifications/tree/main/PROTOCOLS
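For anyone unfamiliar with the checksum step, here is a minimal sketch of what computing md5 sums for a collection can look like. It is not the repository's mdsum-folder.bash; the function name `write_checksums` and the output file name `CHECKSUM.md5` are assumptions made for illustration, and it relies on GNU coreutils `md5sum` being available.

```shell
#!/bin/sh
# Illustrative sketch only; NOT the repository's mdsum-folder.bash.
# Assumes GNU coreutils md5sum is on PATH.
set -eu

# write_checksums COLLECTION
# Hash every regular, non-hidden file at the top level of COLLECTION and
# write the result to COLLECTION/CHECKSUM.md5 (an assumed file name).
write_checksums() (
  cd "$1"
  : > CHECKSUM.md5.tmp
  for f in *; do
    [ -f "$f" ] || continue                    # skip directories and unmatched globs
    case "$f" in CHECKSUM*) continue ;; esac   # never hash the checksum file itself
    md5sum -- "$f" >> CHECKSUM.md5.tmp
  done
  mv CHECKSUM.md5.tmp CHECKSUM.md5
)

# Run only when a collection directory is given on the command line.
if [ "$#" -gt 0 ]; then
  write_checksums "$1"
fi
```

A file written this way can be checked later with `cd COLLECTION && md5sum -c CHECKSUM.md5`.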
Thanks, @StevenCannon-USDA. I'm happy to make changes, but just so I'm clear: you are referring to scripts in the datastore-specifications repo, correct? And is the validate.sh script you reference this one (which for some reason isn't under scripts and lacks an extension)?: Thanks for the reminder about the detailed PROTOCOLS writeups.
Right. My mistake.
No; I see that the script that we call is at
additional details requested by @StevenCannon-USDA
add link to github UI for actions
OK, thanks. I added that script to the repo and added some of the further details you suggested. I will go ahead with the merge, but feel free to revise if anything seems inaccurate or insufficient. (I skipped the bit about compressing/indexing, since I think that is more specific to certain types of collections and is probably already covered somewhere under the linked PROTOCOLS.)
```
validate.sh description_genus_species YAML_FILE
validate.sh description_genus YAML_FILE
validate.sh readme YAML_FILE
```
This script just kept printing its usage at me, so I must be doing something wrong. However, I was able to validate the YAML with the executable in the parent directory, as per its usage example:
./validate readme.schema.json YAML_FILE
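If several files need checking, the direct invocation above can be wrapped in a small loop. The following is only a sketch: `validate_all` and the `VALIDATE` override variable are hypothetical names, and it assumes the validator exits non-zero when a file fails.

```shell
#!/bin/sh
# Illustrative sketch: batch wrapper around the usage quoted above,
#   ./validate readme.schema.json YAML_FILE
# VALIDATE is a hypothetical override for the validator's location.
set -eu

# validate_all SCHEMA YAML_FILE...
# Runs the validator on each file, reports every result, and returns
# non-zero if any file failed (assumes the validator signals failure
# through its exit status).
validate_all() {
  schema=$1
  shift
  failed=0
  for yaml in "$@"; do
    if "${VALIDATE:-./validate}" "$schema" "$yaml"; then
      echo "ok: $yaml"
    else
      echo "FAILED: $yaml" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Run only when a schema and at least one YAML file are given.
if [ "$#" -gt 1 ]; then
  validate_all "$@"
fi
```

Checking every file before reporting, rather than stopping at the first failure, makes it easier to fix all problems in one pass.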
Thanks. Were you running this on the ISU system or on one of the NCGR LIS servers? I think there's a hard-coded path in that script that may be throwing things off if it is not run on the former, but it should be pretty easily fixed. I put the script in the repository without thinking about it too much, but TBH I didn't actually know it existed until Steven mentioned it.
I was on one of our own; I guess that must be why.
OK, I think I fixed the hardcoding issue. I think that was indeed causing the behavior you described, with the script just printing its usage. In any case, the alternate (rawer) validation method you figured out should be equivalent; the validate.sh script is just a thin wrapper around it.
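For reference, one common way to avoid this kind of hard-coded path is to resolve locations relative to the script itself. The sketch below illustrates that general technique and is not the change actually made to validate.sh; `script_dir_of` and `validator_path` are hypothetical helper names, and the layout (validator in the parent directory of the wrapper) is assumed from the discussion above.

```shell
#!/bin/sh
# Illustrative sketch; not the change actually made to validate.sh.
set -eu

# script_dir_of PATH
# Print the absolute directory containing PATH.
script_dir_of() (
  CDPATH= cd -- "$(dirname -- "$1")" && pwd
)

# validator_path SCRIPT
# Assumes (from the thread) that the validate executable sits in the parent
# directory of the wrapper script.
validator_path() {
  printf '%s/validate\n' "$(dirname -- "$(script_dir_of "$1")")"
}

# In a wrapper this would typically be called as: validator_path "$0"
```

Because the path is derived at run time, the wrapper behaves the same regardless of which server or checkout location it runs from.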