Start naming KG2 TSV tarball with version number (in S3)? #140

Closed
amykglen opened this issue Aug 27, 2021 · 12 comments

Comments

@amykglen
Member

amykglen commented Aug 27, 2021

Instead of extracting KG2pre data via Neo4j, going forward the KG2c build process is going to ingest kg2-tsv-for-neo4j.tar.gz (downloaded from the rtx-kg2 S3 bucket).

Wondering if it would be reasonable to start naming that tarball in S3 with the KG2 version number? So, something like:
kg2-7-2-tsv-for-neo4j.tar.gz

I realize that means we'd have to periodically delete old versions of the file so the S3 bucket doesn't get overly full, but it'd be really nice for the KG2c build process to be able to make sure it gets the right tarball (since currently the tarball is overwritten every time a new KG2pre build is done).
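A minimal sketch of the naming scheme proposed above, assuming the KG2 version is available as a dotted string such as `7.2`. The function name `versioned_tarball_name` and its parameters are hypothetical and not part of the KG2 build code.

```python
def versioned_tarball_name(version: str,
                           base: str = "kg2",
                           suffix: str = "tsv-for-neo4j.tar.gz") -> str:
    """Build a tarball name such as kg2-7-2-tsv-for-neo4j.tar.gz.

    `version` is assumed to be a dotted numeric string like "7.2";
    the dots are replaced with hyphens, as in the example in this issue.
    """
    parts = version.strip().split(".")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"unexpected version string: {version!r}")
    return f"{base}-{'-'.join(parts)}-{suffix}"


if __name__ == "__main__":
    print(versioned_tarball_name("7.2"))
```

With a name like this, each KG2pre build would upload a distinct object instead of overwriting the previous one.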

@amykglen amykglen added the enhancement New feature or request label Aug 27, 2021
@saramsey
Member

saramsey commented Aug 31, 2021

OK, I am thinking about how to do this while still preserving automation in the tsv-to-neo4j.sh script.

@saramsey
Member

I have created branch issue-140 for working this issue

@saramsey
Member

I have a mini build-system for the issue-140 branch working on my MBP, for development/test purposes for this issue.

@saramsey saramsey self-assigned this Sep 9, 2021
@saramsey
Member

Lili and I discussed it and we feel this issue may slip until after 2.7.4

@saramsey
Member

Wondering if we can prioritize this for the next few weeks? @acevedol and @ecwood do you think it is doable?

@saramsey
Member

saramsey commented Aug 21, 2023

I'm specifically thinking that the output filenames that go to the S3 bucket should have the version number in the filename. I don't think the filenames on buildkg2.rtx.ai or whatever need to have the version number in the filename. Does that simplify things somewhat?

In hindsight, I don't think my decision to copy files like kg2-simplified.json to the S3 bucket without a version number in the filename was a very good choice. Too much chance for confusion. It puts us in the position of having to check MD5 hashes or inspect the RTX:KG2 node in order to be sure which version the file is. We end up doing a surprising amount of that, and it seems like it could mostly be avoided if the S3 file artifacts had the version number embedded, or were stored in a version-named folder on S3 (to avoid clutter in the bucket).
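As a rough illustration of the version-named folder option, the sketch below builds an S3 object key that places an unversioned filename under a per-version prefix. The helper name `versioned_s3_key` and the `KG2.10.1` style of version label are assumptions made for the example; the actual bucket layout may differ.

```python
def versioned_s3_key(filename: str, version_label: str) -> str:
    """Return an S3 key of the form '<version_label>/<filename>'.

    `version_label` is a hypothetical folder name such as "KG2.10.1".
    The filename itself is left unchanged, so files produced by the
    build would not need to be renamed locally.
    """
    label = version_label.strip().strip("/")
    name = filename.strip()
    if not label or "/" in label:
        raise ValueError(f"unexpected version label: {version_label!r}")
    if not name or "/" in name:
        raise ValueError(f"unexpected filename: {filename!r}")
    return f"{label}/{name}"


if __name__ == "__main__":
    print(versioned_s3_key("kg2-simplified.json", "KG2.10.1"))
```

Because the version lives in the key prefix, a consumer could tell which build a file came from without computing an MD5 hash or opening the file.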

@ecwood ecwood assigned ecwood and unassigned saramsey Aug 21, 2023
@ecwood
Collaborator

ecwood commented Aug 21, 2023

I can try to work on this in the next few weeks. I like the idea of a version-named folder on S3 to avoid clutter.

@amykglen
Member Author

is there any way this could be implemented soon? really all we would like is that the kg2-tsv-for-neo4j.tar.gz in S3 is somehow named with its version number - either in the filename itself or by putting it in a subdirectory for that version. no need to change the file name within the KG2pre build itself (just upon upload to S3). it would be a big help for improving the robustness of KG2c builds.

ecwood added a commit that referenced this issue Jul 16, 2024
ecwood added a commit that referenced this issue Jul 16, 2024
ecwood added a commit that referenced this issue Jul 17, 2024
@ecwood
Collaborator

ecwood commented Jul 17, 2024

This should be done now. It will look something like kg2-tsv-for-neo4j-KG2.X.Y.tar.gz in the next build.
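For a consumer such as the KG2c build, one way to confirm it has the intended tarball is to parse the version back out of the filename. This is a sketch only: it assumes the name follows the `kg2-tsv-for-neo4j-KG2.X.Y.tar.gz` pattern exactly and that X and Y are integers, and the function names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed pattern, based on the example name given in this comment.
_TARBALL_RE = re.compile(r"^kg2-tsv-for-neo4j-KG2\.(\d+)\.(\d+)\.tar\.gz$")


def parse_tarball_version(filename: str) -> Optional[Tuple[int, int]]:
    """Return (X, Y) from kg2-tsv-for-neo4j-KG2.X.Y.tar.gz, or None if no match."""
    match = _TARBALL_RE.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_expected_tarball(filename: str, expected: Tuple[int, int]) -> bool:
    """True only if the filename carries exactly the expected (X, Y) version."""
    return parse_tarball_version(filename) == expected


if __name__ == "__main__":
    print(parse_tarball_version("kg2-tsv-for-neo4j-KG2.10.1.tar.gz"))
```

An unversioned name such as `kg2-tsv-for-neo4j.tar.gz` does not match the pattern, so the check fails rather than silently accepting a file of unknown version.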

@amykglen
Member Author

awesome, thank you!!

ecwood added a commit that referenced this issue Sep 2, 2024
@ecwood
Collaborator

ecwood commented Sep 8, 2024

This was successful in the KG2.10.1pre build, other than the report compare (as described here: #408 (comment)), so I am closing out this issue.

@saramsey
Member

saramsey commented Sep 9, 2024

Thank you @ecwood for doing this!
