Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert TOC to use cache map #3555

Merged
merged 21 commits into from
Oct 16, 2023
Merged

Conversation

dbutenhof
Copy link
Member

@dbutenhof dbutenhof commented Oct 5, 2023

PBENCH-1192

This has been on my wishlist for a while, but was blocked by not actually having a usable cache. PR #3550 introduces a functioning (if minimal) cache manager, and this PR layers on top of that. The immediate motivation stems from an email exchange regarding Crucible, and the fact that Andrew would like (not surprisingly) to be able to access the contents of an archived tarball. Having TOC code relying on the Pbench-specific run-toc Elasticsearch index is not sustainable.

Note that, despite #3550 introducing a live cache, this PR represents the first actual use of the in-memory cache map, and some adjustments were necessary to make it work outside of the unit test environment.

Only the final commits here represent the TOC changes: most are from the prior PR.

@dbutenhof dbutenhof added Server Code Infrastructure API Of and relating to application programming interfaces to services and functions labels Oct 5, 2023
@dbutenhof dbutenhof self-assigned this Oct 5, 2023
@dbutenhof dbutenhof marked this pull request as ready for review October 6, 2023 20:49
@dbutenhof dbutenhof requested a review from webbnh October 6, 2023 20:49
Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a bunch of comments, most of which are small. However, I'm concerned that you've removed the test case for a symlink which references an existing directory outside the dataset result. Also, it's not obvious whether a CacheType.OTHER should have a "size" key in the "details", and the code is not consistent on that point (which means we're probably missing a test case). And, I think some of the CacheMapEntry/CacheMap type hints are now wrong.

lib/pbench/server/api/resources/datasets_contents.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/datasets_contents.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/datasets_contents.py Outdated Show resolved Hide resolved
lib/pbench/server/cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_datasets_contents.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_datasets_contents.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_datasets_contents.py Outdated Show resolved Hide resolved
Copy link
Member Author

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK; I found a bunch of duplicated lines from a weird rebase that I need to fix...

lib/pbench/cli/server/tree_manage.py Outdated Show resolved Hide resolved
@dbutenhof dbutenhof marked this pull request as draft October 11, 2023 21:39
@webbnh
Copy link
Member

webbnh commented Oct 11, 2023

If you still have the original commits available (like if the hash IDs happen to be in your terminal scrollback), you should consider starting over from your un-re-based branch.

You might have better luck if you squash your branch (particularly the commits which come from the other PR) before rebasing.

@dbutenhof
Copy link
Member Author

If you still have the original commits available (like if the hash IDs happen to be in your terminal scrollback), you should consider starting over from your un-re-based branch.

You might have better luck if you squash your branch (particularly the commits which come from the other PR) before rebasing.

Yeah, in cases like this I would have been better off squashing both before the rebase, I suspect. But aside from git deciding to duplicate some lines without showing them as conflicts, it wasn't terrible and I definitely wouldn't want to start over! Just a few more segments to clean up: and scanning through the diffs on GitHub pointed out a few comments I'd overlooked after the rebase scramble. I should be able to clean this up "quickly"(ish) tomorrow morning...

@webbnh
Copy link
Member

webbnh commented Oct 11, 2023

And, for future reference, when you rebase a branch of a branch onto main when the base-branch has been merged, you also have the option of specifying the branching-off place (so, you rebase just the last commit or three, instead of trying to rebase the whole branch hoping that Git will drop most of the commits because they are already in the target).

Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize that this PR is currently a draft, but here are some things I found.

lib/pbench/server/cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/server/cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/server/indexing_tarballs.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/datasets_contents.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/datasets_contents.py Outdated Show resolved Hide resolved
@dbutenhof dbutenhof requested a review from webbnh October 12, 2023 20:57
@dbutenhof dbutenhof marked this pull request as ready for review October 12, 2023 20:57
PBENCH-1249

On large datasets, our direct tarball extraction method can time out the API
call. Unlike on a long intake, there is no persistent artifact so a retry will
always time out as well. This applies to any `get_inventory` call, and
therefore to the `/inventory`, `/visualize`, and `/compare` APIs; and given
the central importance of those APIs for our Server 1.0 story, that's not an
acceptable failure mode.

This PR mitigates that problem with a "compromise" partial cache manager,
leveraging the existing `unpack` method but adding a file lock to manage
shared access. The idea is that any consumer of tarball contents (including
the indexer) will unpack the entire tarball, but leave a "last reference"
timestamp. A periodic timer service will check the cache unpack timestamps,
and delete the unpack directories which aren't currently locked and which
haven't been referenced for longer than a set time period.

__NOTE__: I'm posting a draft mostly for coverage data after a lot of drift
in the cache manager unit tests, to determine whether more work is necessary.
The "last reference" and reclaim mechanism isn't yet implemented, though that
should be the "easy part" now that I've got the server code working.
I've verified that the timer service removes sufficiently old cache data and
that the data is unpacked again on request. The reclaim operation is audited.
I should probably audit the unpack a well, but haven't done that here. I'm
still hoping for a successful CI run to check cobertura coverage.
We probably won't want to audit cache load longer term, but right now it
probably makes sense to keep track.
Allow holding lock from unpack to stream, and conversion between `EX` and `SH`
lock modes.
I can't figure out why the default `ubi9` container configuration + EPEL is no
longer finding `rsyslog-mmjsonparse`. I've found no relevant hits on searches
nor any obvious workaround.

For now, try changing `Pipeline.gy` to override the default `BASE_IMAGE` and
use `centos:stream9` instead.
1. Fix Inventory.close() to always close the stream.
2. Make cache load more transparent by upgrading lock if we need to unpack.
PBENCH-1192

This has been on my wishlist for a while, but was blocked by not actually
having a usable cache. PR distributed-system-analysis#3550 introduces a functioning (if minimal) cache
manager, and this PR layers on top of that.

Note that, despite distributed-system-analysis#3550 introducing a live cache, this PR represents the
first actual use of the in-memory cache map, and some adjustments were
necessary to make it work outside of the unit test environment.
Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks generally good. There are a couple of small things to be attended to (there are some lingering repeated lines and similar, et al.).

lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/server/api/resources/datasets_contents.py Outdated Show resolved Hide resolved
lib/pbench/server/cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_datasets_contents.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_datasets_contents.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_datasets_contents.py Outdated Show resolved Hide resolved
Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just a few nits to consider.

lib/pbench/server/cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_datasets_contents.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_datasets_contents.py Outdated Show resolved Hide resolved
webbnh

This comment was marked as resolved.

Copy link
Member

@webbnh webbnh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to go (unless you want to do something for the two remaining open conversations).

Copy link
Member Author

@dbutenhof dbutenhof left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dang; I had responses pending, and I thought I'd submitted them. Sigh.

lib/pbench/test/unit/server/conftest.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
lib/pbench/test/unit/server/test_cache_manager.py Outdated Show resolved Hide resolved
@dbutenhof dbutenhof merged commit f9c43b5 into distributed-system-analysis:main Oct 16, 2023
3 checks passed
@dbutenhof dbutenhof deleted the toc branch October 16, 2023 19:02
@webbnh
Copy link
Member

webbnh commented Oct 16, 2023

3 of 4 checks passed

It looks like maybe you merged this before the tests were done, and GitHub permanently captured that information...interesting! 🤔

@dbutenhof
Copy link
Member Author

dbutenhof commented Oct 16, 2023

3 of 4 checks passed

It looks like maybe you merged this before the tests were done, and GitHub permanently captured that information...interesting! 🤔

Yikes -- I hadn't actually meant to do that, and I think I just assumed it was done. 😦

But I don't think it's "captured" -- even though you have to hit another button to open the test details pane, it's still there and the summary will presumably update when the tests are done.

No, OK; even though the "View Details" shows the tests are running, Jenkins shows they're complete ... so I guess it is captured, and that's even weirder...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Of and relating to application programming interfaces to services and functions Code Infrastructure Server
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants