Update README.md #3

Merged
merged 3 commits into from
Oct 17, 2024

adf-ncgr
Contributor

added some details about CHECKSUM files, basic instructions and rationale

@adf-ncgr
Contributor Author

@StevenCannon-USDA, let me know if these changes conform to your understanding of metadata management. @el239, let me know if anything seems unclear.

@StevenCannon-USDA
Contributor

StevenCannon-USDA left a comment


Looks good to me, but could be expanded. I suggest the following (more or less; please season to taste):


Several types of files should be checked for valid format and contents prior to pushing to the datastore-metadata repository:

  validate.sh description_genus_species YAML_FILE
  validate.sh description_genus         YAML_FILE
  validate.sh readme                    YAML_FILE

The metadata files in each collection should also be compressed and indexed, using the following script, which is found in the scripts/ directory of this repository:

  compress_and_index.sh COLLECTION

Also, calculate md5 sums for the files in the collection with the following script (again, found in the scripts/ directory):

  mdsum-folder.bash COLLECTION

For additional description of protocols related to data curation for the data store, see https://github.com/legumeinfo/datastore-specifications/tree/main/PROTOCOLS
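The md5 step above can be sketched as follows for readers without access to the scripts. This is a minimal POSIX `sh` illustration, not the contents of `mdsum-folder.bash`: the function name `checksum_collection`, the output file name `CHECKSUM.md5`, and its `md5sum`-style line format are assumptions, and it relies on GNU coreutils `md5sum`.

```shell
# Hypothetical sketch only; NOT the contents of mdsum-folder.bash.
# Assumptions: GNU coreutils md5sum is available, and the checksum file is
# named CHECKSUM.md5 and uses md5sum's "HASH  ./relative/path" line format.
checksum_collection() {
  collection="$1"
  (
    # Work inside the collection so the recorded paths are relative to it.
    cd "$collection" || exit 1
    # Hash every regular file except the checksum file itself, sorted so
    # reruns on unchanged data produce an identical CHECKSUM.md5.
    find . -type f ! -name 'CHECKSUM.md5' | LC_ALL=C sort |
      while IFS= read -r f; do
        md5sum "$f"
      done > CHECKSUM.md5
  )
}
```

For example, `checksum_collection COLLECTION` writes COLLECTION/CHECKSUM.md5, and `(cd COLLECTION && md5sum -c CHECKSUM.md5)` re-checks the files against it. Recording relative paths keeps the checksum file valid if the collection directory is moved.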

@adf-ncgr
Contributor Author

Thanks @StevenCannon-USDA. I'm happy to make changes, but just so I'm clear: you are referring to scripts in the datastore-specifications repo, correct? And is the validate.sh script you reference this one (which for some reason isn't under scripts/ and lacks an extension)?
https://github.com/legumeinfo/datastore-specifications/blob/main/validate

Thanks for the reminder about the detailed PROTOCOLS write-ups.

@StevenCannon-USDA
Contributor

> ... you are referring to scripts in the datastore-specifications repo, correct?

Right. My mistake.

> And is the validate.sh script you reference this one (which for some reason isn't under scripts and lacks an extension)?: https://github.com/legumeinfo/datastore-specifications/blob/main/validate

No; I see that the script we call is at /usr/local/bin/validate.sh (not under version control, AFAIK). It is a wrapper around the one you mention, at the root of the datastore-specifications directory. Probably not the optimal organization...

additional details requested by @StevenCannon-USDA
add link to github UI for actions
@adf-ncgr
Contributor Author

OK, thanks. I added that script to the repo, along with some of the further details you suggested. I'll go ahead with the merge, but feel free to revise if anything seems inaccurate or insufficient. (I skipped the part about compressing/indexing, since I think that's more specific to certain types of collections, and I'm guessing it's already covered somewhere in the linked PROTOCOLS.)

@adf-ncgr adf-ncgr merged commit de272f9 into main Oct 17, 2024
2 checks passed
@adf-ncgr adf-ncgr deleted the adf-ncgr-patch-1 branch October 17, 2024 05:41
```
validate.sh description_genus_species YAML_FILE
validate.sh description_genus YAML_FILE
validate.sh readme YAML_FILE
```
Contributor


This script just kept printing its usage message at me; I must be doing something wrong. But I was able to validate the YAML with the executable in the parent directory, following its usage example:
./validate readme.schema.json YAML_FILE

Contributor Author


Thanks. Were you running this on the ISU system or on one of the NCGR LIS servers? I think there's a hard-coded path in that script that may be throwing things off when it isn't run on the former, but it should be easy to fix. I put the script in the repository without thinking about it too much; TBH, I didn't actually know it existed until Steven mentioned it.

Contributor


I was on one of our own servers; I guess that must be why.

Contributor Author


OK, I think I fixed the hard-coding issue; I believe it was indeed causing the behavior you described, where the script just printed its usage. In any case, the alternate (rawer) validation method you figured out should be equivalent: the validate.sh script is just a thin wrapper around it.
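The kind of wrapping described in this thread can be sketched as follows. This is a minimal POSIX `sh` illustration under stated assumptions, not the repository's `validate.sh`: the function names, the `SPECS_DIR` variable, and the `TYPE.schema.json` naming (inferred only from the `readme` / `readme.schema.json` pair above) are hypothetical. It shows one way to avoid a hard-coded path: the checkout location comes from a variable with a relative default.

```shell
# Hypothetical sketch of a validate.sh-style wrapper; NOT the repository's script.
# Assumption: each schema sits next to the raw `validate` executable and is
# named TYPE.schema.json (e.g. readme -> readme.schema.json).

# Print the schema file name for a metadata type.
schema_for_type() {
  printf '%s.schema.json\n' "$1"
}

# Validate one YAML file against the schema for TYPE.
# SPECS_DIR (hypothetical) points at a datastore-specifications checkout and
# defaults to the current directory rather than a hard-coded absolute path.
validate_metadata() {
  if [ "$#" -ne 2 ]; then
    echo "usage: validate_metadata TYPE YAML_FILE" >&2
    return 2
  fi
  specs_dir="${SPECS_DIR:-.}"
  schema="$specs_dir/$(schema_for_type "$1")"
  # Report a missing schema explicitly instead of only printing usage.
  if [ ! -f "$schema" ]; then
    echo "validate_metadata: schema not found: $schema" >&2
    return 1
  fi
  "$specs_dir/validate" "$schema" "$2"
}
```

For example, from the root of a datastore-specifications checkout, `validate_metadata readme YAML_FILE` would run the same command as `./validate ./readme.schema.json YAML_FILE`; setting `SPECS_DIR` lets it be called from anywhere without editing the script.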

3 participants