Convert TOC to use cache map #3555
Conversation
I have a bunch of comments, most of which are small. However, I'm concerned that you've removed the test case for a symlink which references an existing directory outside the dataset result. Also, it's not obvious whether a `CacheType.OTHER` should have a "size" key in the "details", and the code is not consistent on that point (which means we're probably missing a test case). And, I think some of the `CacheMapEntry`/`CacheMap` type hints are now wrong.
OK; I found a bunch of duplicated lines from a weird rebase that I need to fix...
If you still have the original commits available (e.g., if the hash IDs happen to be in your terminal scrollback), you should consider starting over from your un-rebased branch. You might have better luck if you squash your branch (particularly the commits which come from the other PR) before rebasing.
Yeah, in cases like this I would have been better off squashing both before the rebase, I suspect. But aside from git deciding to duplicate some lines without showing them as conflicts, it wasn't terrible, and I definitely wouldn't want to start over! Just a few more segments to clean up; scanning through the diffs on GitHub pointed out a few comments I'd overlooked after the rebase scramble. I should be able to clean this up "quickly"(ish) tomorrow morning...
And, for future reference: when you rebase a branch of a branch onto main after the base branch has been merged, you also have the option of specifying the branching-off point (so you rebase just the last commit or three, instead of rebasing the whole branch and hoping Git will drop most of the commits because they are already in the target).
I realize that this PR is currently a draft, but here are some things I found.
PBENCH-1249

On large datasets, our direct tarball extraction method can time out the API call. Unlike a long intake, there is no persistent artifact, so a retry will always time out as well. This applies to any `get_inventory` call, and therefore to the `/inventory`, `/visualize`, and `/compare` APIs; given the central importance of those APIs for our Server 1.0 story, that's not an acceptable failure mode.

This PR mitigates that problem with a "compromise" partial cache manager, leveraging the existing `unpack` method but adding a file lock to manage shared access. The idea is that any consumer of tarball contents (including the indexer) will unpack the entire tarball, but leave a "last reference" timestamp. A periodic timer service will check the cache unpack timestamps and delete any unpack directories which aren't currently locked and which haven't been referenced for longer than a set time period.

__NOTE__: I'm posting a draft mostly to get coverage data after a lot of drift in the cache manager unit tests, to determine whether more work is necessary. The "last reference" and reclaim mechanism isn't implemented yet, though that should be the "easy part" now that I've got the server code working.
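The reclaim pass described above could be sketched roughly as follows. This is a minimal, hypothetical illustration (the directory layout, the `lock` and `last_ref` file names, and the function name are all assumptions, not the actual Pbench implementation): try a non-blocking exclusive `flock` on each cache entry, and remove only entries that are both unlocked and stale.

```python
import fcntl
import shutil
import time
from pathlib import Path


def reclaim_cache(cache_root: Path, max_age_seconds: float) -> list[Path]:
    """Remove unpacked cache directories that aren't locked and haven't
    been referenced recently. Returns the directories that were removed.

    Hypothetical sketch: each unpack directory is assumed to contain a
    "lock" file (flock'd by active consumers) and a "last_ref" file whose
    mtime records the last reference time.
    """
    removed = []
    now = time.time()
    for unpack_dir in cache_root.iterdir():
        if not unpack_dir.is_dir():
            continue
        lockfile = unpack_dir / "lock"
        last_ref = unpack_dir / "last_ref"
        try:
            with lockfile.open("w") as fp:
                # A consumer streaming or unpacking holds a shared or
                # exclusive lock, so this non-blocking exclusive attempt
                # fails and we skip the entry.
                fcntl.flock(fp, fcntl.LOCK_EX | fcntl.LOCK_NB)
                try:
                    if now - last_ref.stat().st_mtime > max_age_seconds:
                        shutil.rmtree(unpack_dir)
                        removed.append(unpack_dir)
                finally:
                    fcntl.flock(fp, fcntl.LOCK_UN)
        except (BlockingIOError, FileNotFoundError):
            continue  # locked by a consumer, or already gone
    return removed
```

A timer service would call this periodically with the configured retention period; a consumer touching `last_ref` on each access keeps live entries from being reclaimed.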
I've verified that the timer service removes sufficiently old cache data and that the data is unpacked again on request. The reclaim operation is audited; I should probably audit the unpack as well, but haven't done that here. I'm still hoping for a successful CI run to check Cobertura coverage.
We probably won't want to audit cache load longer term, but right now it probably makes sense to keep track.
Allow holding lock from unpack to stream, and conversion between `EX` and `SH` lock modes.
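The `EX`/`SH` conversion relies on a property of `flock(2)`: calling it again on the same file descriptor converts the existing lock in place rather than stacking a second one. A minimal sketch (hypothetical function and file names, not the actual Pbench API):

```python
import fcntl


def unpack_then_stream(lockfile_path: str):
    """Hold the cache lock continuously from unpack through streaming:
    take it exclusive while unpacking, then downgrade to shared so other
    readers can stream the same cache entry concurrently."""
    fp = open(lockfile_path, "w")
    fcntl.flock(fp, fcntl.LOCK_EX)   # exclusive while unpacking
    # ... unpack the tarball here ...
    fcntl.flock(fp, fcntl.LOCK_SH)   # downgrade: readers may now share
    # ... stream file contents here; the caller unlocks/closes when done ...
    return fp
```

The upgrade direction (`LOCK_SH` to `LOCK_EX`) works the same way, though a blocking upgrade can deadlock if two shared holders both try it, so a non-blocking attempt (`LOCK_EX | LOCK_NB`) with a fallback is safer.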
I can't figure out why the default `ubi9` container configuration plus EPEL is no longer finding `rsyslog-mmjsonparse`. I've found no relevant hits in searches nor any obvious workaround. For now, change `Pipeline.gy` to override the default `BASE_IMAGE` and use `centos:stream9` instead.
1. Fix `Inventory.close()` to always close the stream.
2. Make cache load more transparent by upgrading the lock if we need to unpack.
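The `close()` fix amounts to making the stream close unconditional, even if releasing the lock fails. A much-simplified stand-in (this is not the actual Pbench `Inventory` class; the attribute names are assumptions):

```python
class Inventory:
    """Simplified sketch: pairs a file stream with an optional lock,
    ensuring close() always closes the stream even if releasing the
    lock raises."""

    def __init__(self, stream, lock=None):
        self.stream = stream
        self.lock = lock

    def close(self):
        try:
            if self.lock:
                self.lock.release()
        finally:
            # Runs regardless of whether release() raised, so the
            # stream can never be leaked by a lock error.
            self.stream.close()
```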
Looks generally good. There are a couple of small things to attend to (some lingering repeated lines and the like).
Looks good. Just a few nits to consider.
Good to go (unless you want to do something for the two remaining open conversations).
Dang; I had responses pending, and I thought I'd submitted them. Sigh.
It looks like maybe you merged this before the tests were done, and GitHub permanently captured that information... interesting! 🤔
Yikes -- I hadn't actually meant to do that, and I think I just assumed it was done. 😦 But I don't think it's "captured": even though you have to hit another button to open the test details pane, it's still there, and the summary will presumably update when the tests are done. ...No, OK: even though "View Details" shows the tests are running, Jenkins shows they're complete, so I guess it is captured, and that's even weirder...
PBENCH-1192
This has been on my wishlist for a while, but was blocked by not actually having a usable cache. PR #3550 introduces a functioning (if minimal) cache manager, and this PR layers on top of that. The immediate motivation stems from an email exchange regarding Crucible, and the fact that Andrew would (not surprisingly) like to be able to access the contents of an archived tarball. Having TOC code rely on the Pbench-specific `run-toc` Elasticsearch index is not sustainable.

Note that, despite #3550 introducing a live cache, this PR represents the first actual use of the in-memory cache map, and some adjustments were necessary to make it work outside of the unit test environment.
Only the final commits here represent the TOC changes: most are from the prior PR.
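To illustrate the shape of the change: a TOC query can be answered by walking an in-memory map of the tarball's contents rather than querying the `run-toc` index. The sketch below is hypothetical (the real `CacheMap`/`CacheMapEntry` types are richer; these names and fields are assumptions for illustration only):

```python
from dataclasses import dataclass, field


@dataclass
class CacheMapEntry:
    """Much-simplified stand-in for a cache map node: a name, a type
    flag, a size, and (for directories) named children."""
    name: str
    is_dir: bool
    size: int = 0
    children: dict = field(default_factory=dict)


def toc(root: CacheMapEntry, path: str) -> dict:
    """Resolve a slash-separated path within the cache map and return a
    TOC-style listing -- no Elasticsearch query involved."""
    entry = root
    for part in filter(None, path.split("/")):
        entry = entry.children[part]  # KeyError -> path not in tarball
    if not entry.is_dir:
        return {"name": entry.name, "type": "FILE", "size": entry.size}
    return {
        "name": entry.name,
        "type": "DIRECTORY",
        "directories": sorted(
            c.name for c in entry.children.values() if c.is_dir
        ),
        "files": sorted(
            c.name for c in entry.children.values() if not c.is_dir
        ),
    }
```

Since the map is built once when the tarball is first cached, repeated TOC queries become dictionary lookups instead of index round trips.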