-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Credentials JSONFileCache explodes if cache file has malformed JSON #3106
Comments
Thinking about it last night, I also believe the current code will never heal a corrupted cache file on its own. With the current strategy, a cache entry is only updated if it doesn't exist yet (cache miss) or the data can be parsed and evaluated for expiration. A cache file that can't be read or parsed will break the library forever unless there's manual intervention to clear the cache on disk, and that intervention is less likely because the exceptions raised by the cache class don't contain enough information to suggest a solution. |
Thanks for reaching out and for your patience here. I could reproduce the errors you referenced using your code snippet. I will plan to bring this up with the team for discussion. |
After discussing with the team, we think that there may be some opportunities to improve the behavior here. I created a PR with the error message you suggested for the team to consider: #3183. Further investigation may still be needed in terms of when the error occurs in Botocore/the AWS CLI. |
It looks like #3213 was closed in favor of this one, but this one looks to previously be about a confusing error message (and proposed fix to the message). Is it safe to assume this one is now also about fixing the race condition (potentially with a flock) to avoid corrupting the cache file at all? Let me know if it’d be more helpful to submit an untested PR with a proposed fix. Thank you! |
Thanks for following up, this issue is more focused on the error message so I can reopen #3213 for further review/discussion. You can submit a PR for this but I can't provide any guarantees on if or when it might be considered. I think more research is needed here into what can cause |
I think the error message is a good triage, and it's definitely worth exploring corruption root causes. But long-term, I also think it's worth making cache failures non-fatal. Maybe a warning, if there's value in communicating that this optimization has failed. |
think its been done in it for the mean time values for it. |
Describe the bug
The
JSONFileCache
class__getitem__
accessor tries to load JSON from a file, but does not handle syntax errors in that file, raising an exception that bubbles up to the client application.Given this is a cache, a failure to read from the cache should be treated as a cache miss, not an error.
Expected Behavior
In
botocore/credentials.py
, in theCachedCredentialFetcher
class, I expected_load_from_cache()
to returnNone
if the cache was invalid, causing credentials to be re-fetched and the cache to be rebuilt.Current Behavior
I found this while using
awscli
, which consumesbotocore
. When a credentials cache file was malformed, every command using that particular AWS account failed super cryptically, printing the bad file's cache key to STDERR:Using a debugger, I located the failure at
utils.py:3518
, whenbotocore
failed to parse a cache file.Cache file contents (prettified here):
(The bit at the bottom,
: true, "RetryAttempts": 2}}
invalidates the JSON.)Reproduction Steps
Here's a minimal reproduction:
Example run:
Possible Solution
Workaround
Delete the affected cache file, or delete all files in the cache dir (e.g.
~/.aws/cli/cache/
).Fix
This is tricky.
JSONFileCache
doesn't know what it's storing, andCachedCredentialFetcher
doesn't know what kind of cache it's using.CachedCredentialFetcher
assumes that if a key exists in the cache, it's valid to read, which is incorrect forJSONFileCache
.Suggestion:
JSONFileCache#__getitem__
, catch(OSError, ValueError, json.decoder.JSONDecodeError)
and re-raise any of them as a newCacheMissError
.CachedCredentialFetcher#_load_from_cache
, catchCacheMissError
and returnNone
. This will hopefully trigger a credential refresh that will overwrite the invalid cache file.Triage
If the maintainers prefer to raise exceptions here (maybe there's a lot of extra work to change behavior), in the meantime, the exception message in
JSONFileCache#__getitem__
could be improved to suggest manual cleanup:Additional Information/Context
It took me most of a day to track this down when it started causing
awscli
errors. I initially thought it was an AWS permissions issue, or a problem with my local credentials, or corruptedawscli
configuration, or my environment, or something a reboot might fix. None of my teammates could reproduce it.I was disappointed to find out it was a cache error!
Thanks for your consideration.
SDK version used
v1.34.28 (also observed in v2.0.0dev155, installed as awscli v2.15.13 dependency via homebrew)
Environment details (OS name and version, etc.)
MacOS 14.2.1 (arm64)
The text was updated successfully, but these errors were encountered: