Convert TOC to use cache map #3555

dbutenhof · 2023-10-05T20:48:54Z

PBENCH-1192

This has been on my wishlist for a while, but was blocked by not actually having a usable cache. PR #3550 introduces a functioning (if minimal) cache manager, and this PR layers on top of that. The immediate motivation stems from an email exchange regarding Crucible, and the fact that Andrew would like (not surprisingly) to be able to access the contents of an archived tarball. Having TOC code relying on the Pbench-specific run-toc Elasticsearch index is not sustainable.

Note that, despite #3550 introducing a live cache, this PR represents the first actual use of the in-memory cache map, and some adjustments were necessary to make it work outside of the unit test environment.

Only the final commits here represent the TOC changes: most are from the prior PR.

webbnh

I have a bunch of comments, most of which are small. However, I'm concerned that you've removed the test case for a symlink which references an existing directory outside the dataset result. Also, it's not obvious whether a CacheType.OTHER should have a "size" key in the "details", and the code is not consistent on that point (which means we're probably missing a test case). And, I think some of the CacheMapEntry/CacheMap type hints are now wrong.
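For reference, the symlink edge case mentioned here (a link whose target exists but lies outside the dataset result) can be illustrated with a small sketch. This is not the cache manager's code: `classify_link` and its string labels are hypothetical names chosen for the example, and the real code's handling of each case may differ.

```python
from pathlib import Path


def classify_link(root: Path, link: Path) -> str:
    """Classify a symlink by where its target lands relative to `root`.

    Hypothetical helper: the labels are illustrative, not CacheType values.
    """
    target = link.resolve()  # follows the link; does not require existence
    if not target.is_relative_to(root.resolve()):
        return "outside"  # target is not within the dataset result tree
    if not target.exists():
        return "broken"  # target would be inside the tree but is missing
    return "directory" if target.is_dir() else "file"
```

A unit test for the case in question would create a real directory next to the result tree, link to it from inside the tree, and check that the link is reported as pointing outside.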

lib/pbench/server/api/resources/datasets_contents.py

lib/pbench/server/cache_manager.py

lib/pbench/test/unit/server/test_cache_manager.py

lib/pbench/test/unit/server/test_datasets_contents.py

dbutenhof

OK; I found a bunch of duplicated lines from a weird rebase that I need to fix...

lib/pbench/cli/server/tree_manage.py

webbnh · 2023-10-11T21:42:24Z

If you still have the original commits available (like if the hash IDs happen to be in your terminal scrollback), you should consider starting over from your un-re-based branch.

You might have better luck if you squash your branch (particularly the commits which come from the other PR) before rebasing.

dbutenhof · 2023-10-11T21:49:15Z

> If you still have the original commits available (like if the hash IDs happen to be in your terminal scrollback), you should consider starting over from your un-re-based branch.
>
> You might have better luck if you squash your branch (particularly the commits which come from the other PR) before rebasing.

Yeah, in cases like this I would have been better off squashing both before the rebase, I suspect. But aside from git deciding to duplicate some lines without showing them as conflicts, it wasn't terrible and I definitely wouldn't want to start over! Just a few more segments to clean up: and scanning through the diffs on GitHub pointed out a few comments I'd overlooked after the rebase scramble. I should be able to clean this up "quickly"(ish) tomorrow morning...

webbnh · 2023-10-11T21:52:38Z

And, for future reference, when you rebase a branch of a branch onto main when the base-branch has been merged, you also have the option of specifying the branching-off place (so, you rebase just the last commit or three, instead of trying to rebase the whole branch hoping that Git will drop most of the commits because they are already in the target).

webbnh

I realize that this PR is currently a draft, but here are some things I found.

lib/pbench/server/cache_manager.py

lib/pbench/test/unit/server/test_cache_manager.py

lib/pbench/server/indexing_tarballs.py

lib/pbench/server/api/resources/datasets_contents.py

PBENCH-1249

On large datasets, our direct tarball extraction method can time out the API call. Unlike on a long intake, there is no persistent artifact so a retry will always time out as well. This applies to any `get_inventory` call, and therefore to the `/inventory`, `/visualize`, and `/compare` APIs; and given the central importance of those APIs for our Server 1.0 story, that's not an acceptable failure mode.

This PR mitigates that problem with a "compromise" partial cache manager, leveraging the existing `unpack` method but adding a file lock to manage shared access. The idea is that any consumer of tarball contents (including the indexer) will unpack the entire tarball, but leave a "last reference" timestamp. A periodic timer service will check the cache unpack timestamps, and delete the unpack directories which aren't currently locked and which haven't been referenced for longer than a set time period.

__NOTE__: I'm posting a draft mostly for coverage data after a lot of drift in the cache manager unit tests, to determine whether more work is necessary. The "last reference" and reclaim mechanism isn't yet implemented, though that should be the "easy part" now that I've got the server code working.
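The "last reference" and reclaim scheme described above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `last_ref` and `lock` file names, the `unpacked` subdirectory, and the `touch_last_ref` and `reclaim` helpers are all assumptions made for the example. Only the general idea (a file lock, a last-reference timestamp, and age-based reclaim of unlocked trees) comes from the description.

```python
import fcntl
import shutil
import time
from pathlib import Path


def touch_last_ref(cache_dir: Path) -> None:
    """Record "now" as the last time this cache directory was referenced."""
    (cache_dir / "last_ref").touch()


def reclaim(cache_root: Path, max_age_seconds: float) -> list[str]:
    """Remove unpacked trees that are unlocked and not recently referenced.

    Returns the names of the cache directories that were reclaimed.
    """
    removed = []
    now = time.time()
    for cache_dir in sorted(p for p in cache_root.iterdir() if p.is_dir()):
        last_ref = cache_dir / "last_ref"
        if not last_ref.exists():
            continue
        if now - last_ref.stat().st_mtime < max_age_seconds:
            continue  # referenced recently enough to keep
        with (cache_dir / "lock").open("w") as lock:
            try:
                # Non-blocking exclusive lock: fails if any consumer
                # currently holds a shared (or exclusive) lock.
                fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                continue  # in use; try again on the next timer tick
            shutil.rmtree(cache_dir / "unpacked", ignore_errors=True)
            removed.append(cache_dir.name)
    return removed
```

In this sketch a periodic timer would call `reclaim(cache_root, max_age_seconds)`, while each consumer holds a shared lock on the same `lock` file for as long as it reads the unpacked tree and calls `touch_last_ref` when it is done.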

I've verified that the timer service removes sufficiently old cache data and that the data is unpacked again on request. The reclaim operation is audited. I should probably audit the unpack as well, but haven't done that here. I'm still hoping for a successful CI run to check Cobertura coverage.

We probably won't want to audit cache load longer term, but right now it probably makes sense to keep track.

Allow holding lock from unpack to stream, and conversion between `EX` and `SH` lock modes.
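As a rough illustration of converting between `EX` and `SH` modes on a held lock, here is a small sketch built on `fcntl.flock`. The `LockSketch` class and its method names are hypothetical and are not the `LockRef` implementation from this PR; they only show one way such a conversion can be expressed.

```python
import fcntl
from pathlib import Path


class LockSketch:
    """Hold an flock on a lock file and switch between SH and EX modes."""

    def __init__(self, lock_path: Path):
        self.file = lock_path.open("w")
        self.exclusive = None  # None until a lock has been acquired

    def acquire(self, exclusive: bool = False, wait: bool = True) -> "LockSketch":
        mode = fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH
        if not wait:
            mode |= fcntl.LOCK_NB  # raise BlockingIOError instead of waiting
        # Calling flock on a descriptor that already holds a lock
        # converts that lock to the requested mode.
        fcntl.flock(self.file, mode)
        self.exclusive = exclusive
        return self

    def upgrade(self) -> None:
        """Convert a shared lock to exclusive (e.g. before unpacking)."""
        self.acquire(exclusive=True)

    def downgrade(self) -> None:
        """Convert an exclusive lock to shared (e.g. before streaming)."""
        self.acquire(exclusive=False)

    def release(self) -> None:
        fcntl.flock(self.file, fcntl.LOCK_UN)
        self.file.close()
```

One caveat worth keeping in mind with this approach: `flock` conversions are not guaranteed to be atomic, so another process may acquire the lock between the release of the old mode and the grant of the new one. Code that upgrades in order to unpack should therefore re-check whether the unpack is still needed once it holds the exclusive lock.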

I can't figure out why the default `ubi9` container configuration + EPEL is no longer finding `rsyslog-mmjsonparse`. I've found no relevant hits on searches nor any obvious workaround. For now, try changing `Pipeline.gy` to override the default `BASE_IMAGE` and use `centos:stream9` instead.

1. Fix `Inventory.close()` to always close the stream.
2. Make cache load more transparent by upgrading lock if we need to unpack.

PBENCH-1192

This has been on my wishlist for a while, but was blocked by not actually having a usable cache. PR distributed-system-analysis#3550 introduces a functioning (if minimal) cache manager, and this PR layers on top of that.

Note that, despite distributed-system-analysis#3550 introducing a live cache, this PR represents the first actual use of the in-memory cache map, and some adjustments were necessary to make it work outside of the unit test environment.

(Plus messy merge with cache manager.)

webbnh

Looks generally good. There are a couple of small things to be attended to (there are some lingering repeated lines and similar, et al.).

lib/pbench/test/unit/server/test_cache_manager.py

lib/pbench/server/api/resources/datasets_contents.py

lib/pbench/server/cache_manager.py

lib/pbench/test/unit/server/test_cache_manager.py

lib/pbench/test/unit/server/test_datasets_contents.py

webbnh

Looks good. Just a few nits to consider.

lib/pbench/server/cache_manager.py

lib/pbench/test/unit/server/test_cache_manager.py

lib/pbench/test/unit/server/test_datasets_contents.py

lib/pbench/test/unit/server/test_cache_manager.py

lib/pbench/test/unit/server/test_datasets_contents.py

webbnh

Good to go (unless you want to do something for the two remaining open conversations).

dbutenhof

Dang; I had responses pending, and I thought I'd submitted them. Sigh.

lib/pbench/test/unit/server/conftest.py

lib/pbench/test/unit/server/test_cache_manager.py

webbnh · 2023-10-16T19:34:26Z

> 3 of 4 checks passed

It looks like maybe you merged this before the tests were done, and GitHub permanently captured that information...interesting! 🤔

dbutenhof · 2023-10-16T19:43:20Z

> 3 of 4 checks passed
>
> It looks like maybe you merged this before the tests were done, and GitHub permanently captured that information...interesting! 🤔

Yikes -- I hadn't actually meant to do that, and I think I just assumed it was done. 😦

But I don't think it's "captured" -- even though you have to hit another button to open the test details pane, it's still there and the summary will presumably update when the tests are done.

No, OK; even though the "View Details" shows the tests are running, Jenkins shows they're complete ... so I guess it is captured, and that's even weirder...

dbutenhof added the Server, Code Infrastructure, and API labels Oct 5, 2023

dbutenhof self-assigned this Oct 5, 2023

dbutenhof marked this pull request as ready for review October 6, 2023 20:49

dbutenhof requested a review from webbnh October 6, 2023 20:49

webbnh reviewed Oct 10, 2023

dbutenhof force-pushed the toc branch from 4a5f798 to 2318350 Compare October 11, 2023 21:20

dbutenhof commented Oct 11, 2023

lib/pbench/cli/server/tree_manage.py

dbutenhof marked this pull request as draft October 11, 2023 21:39

webbnh reviewed Oct 11, 2023

dbutenhof force-pushed the toc branch from 2318350 to 0decdaa Compare October 12, 2023 11:57

dbutenhof requested a review from webbnh October 12, 2023 20:57

dbutenhof marked this pull request as ready for review October 12, 2023 20:57

dbutenhof added 13 commits October 13, 2023 15:25

Implement cache reclaim and timer service.

e5969e6

Audit unpack

1e86cc3

We probably won't want to audit cache load longer term, but right now it probably makes sense to keep track.

Encapsulate and package locking

fb3c5c0

Allow holding lock from unpack to stream, and conversion between `EX` and `SH` lock modes.

Add some test coverage

0ef6874

Separate basic LockRef from context manager

ee85aac

Revert BASE_IMAGE hack...

e06a7c6

Some refactoring

981b6b2

1. Fix `Inventory.close()` to always close the stream.
2. Make cache load more transparent by upgrading lock if we need to unpack.

Some refactoring, test coverage & cleanup

c413019

More cleanup and testing

7eb02d2

dbutenhof added 6 commits October 13, 2023 15:25

Add SYMLINK "contents" support & testing

c0dd74a

Dang, I scratched the black paint ...

93d399d

small mock cleanup

0a63808

Clean up symlink handling

a3d6713

(Plus messy merge with cache manager.)

More cleanup again

540bd2d

Tie down loose backups, and code cleanup

cb312b5

webbnh reviewed Oct 13, 2023

Review comments

447c74b

dbutenhof force-pushed the toc branch from a496d03 to 447c74b Compare October 16, 2023 14:35

webbnh approved these changes Oct 16, 2023

Fix broken rebase, and a few minor stuffs

5f485ff

webbnh approved these changes Oct 16, 2023

dbutenhof commented Oct 16, 2023

dbutenhof merged commit f9c43b5 into distributed-system-analysis:main Oct 16, 2023
3 checks passed

dbutenhof deleted the toc branch October 16, 2023 19:02
