Race condition in botocore's use of JSONFileCache for credential loading #3213
Comments
The following does reproduce the issue instantly:

from threading import Thread
from botocore.utils import JSONFileCache

cache = JSONFileCache()

def f():
    for i in range(100000):
        cache["key"] = 10
        cache["key"]

threads = []
for i in range(2):
    thread = Thread(target=f, name=f"Thread-{i}")
    threads.append(thread)
    thread.start()
for thread in threads:
    thread.join()

Adding a lock in each of |
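For illustration, the in-process locking idea from the comment above could look roughly like the sketch below. `LockedCache` is a hypothetical wrapper, not part of botocore; it only assumes the wrapped object supports item get, set, delete, and `in`, as `JSONFileCache` is used in the repro.

```python
import threading


class LockedCache:
    """Hypothetical wrapper that serializes access to an underlying cache."""

    def __init__(self, cache):
        self._cache = cache
        self._lock = threading.Lock()

    def __getitem__(self, key):
        with self._lock:
            return self._cache[key]

    def __setitem__(self, key, value):
        with self._lock:
            self._cache[key] = value

    def __delitem__(self, key):
        with self._lock:
            del self._cache[key]

    def __contains__(self, key):
        with self._lock:
            return key in self._cache


# A plain dict stands in for the real cache here. With botocore installed,
# the intended use would be LockedCache(JSONFileCache()).
cache = LockedCache({})
cache["key"] = 10
print(cache["key"])
```

A lock like this only coordinates threads inside one process. It would not help the multi-process setups described elsewhere in this thread.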
Thanks for reaching out. The error you referenced was also reported here: #3106. I was advised that an error message like this could help clarify the behavior here: https://github.com/boto/botocore/pull/3183/files. Does deleting the cache file fix this for you? |
Yeah wrapping this in a retry in our app code does work (and is what I did in parallel to filing this bug). Apologies for not realizing it was previously reported. And thank you Laurent for the multi-threaded repro! (Our problematic setup is multi-process, which is why I didn't propose in-process locks) |
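A minimal sketch of the application-level retry mentioned above, assuming the failure surfaces as `json.JSONDecodeError`. The helper name `load_with_retry` and its `attempts` and `delay` parameters are hypothetical; `load` is whatever callable triggers the cached credential read in your application.

```python
import json
import time


def load_with_retry(load, attempts=3, delay=0.05):
    """Call load() and retry a few times if the cache file fails to parse."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return load()
        except json.JSONDecodeError:
            # Re-raise on the final attempt so the caller still sees the error.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```

A retry only helps when the reader caught the file mid-write. It does not repair a cache file that has been left permanently malformed, which a later comment in this thread reports.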
Thanks for following up and confirming. I'll go ahead and close this as a duplicate, and we'll continue tracking the issue in #3106. |
This issue is now closed. Comments on closed issues are hard for our team to see. |
I'm hitting this same issue - In my case I'm spawning 10 From what I can tell this only happens when the cache file doesn't already exist, so I'm working around it by doing a dummy aws cli call like |
This happens to me frequently (more than once a week) while using skypilot, which makes heavy use of the botocore library for the AWS API, often in parallel in multiple processes. It is not a temporary problem: the cache JSON becomes malformed, and all future attempts crash until I manually fix or delete the malformed file. The nature of the broken file is like this:
There is always one or two extra characters at the end of the file. I believe this is likely due to the race condition mentioned in OP:
|
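One way a file could end up with a couple of stray trailing characters is two writers that both truncate before either writes, with the second payload shorter than the first. The sketch below reproduces that interleaving deterministically with two file handles in one process. It is an assumption about the mechanism, not a confirmed trace of what botocore does, and `interleaved_writes` is a hypothetical helper.

```python
import json
import os
import tempfile


def interleaved_writes(path, first_payload, second_payload):
    """Simulate two writers that both truncate before either one writes."""
    with open(path, "w") as writer_a, open(path, "w") as writer_b:
        # Opening with "w" truncates, so both writers start at offset 0
        # of an empty file.
        writer_a.write(first_payload)
        writer_a.flush()
        # writer_b still writes at offset 0 and overwrites only as many
        # bytes as its own payload is long.
        writer_b.write(second_payload)
        writer_b.flush()
    with open(path) as f:
        return f.read()


path = os.path.join(tempfile.mkdtemp(), "cache.json")
content = interleaved_writes(
    path, json.dumps({"key": 1000}), json.dumps({"key": 10})
)
try:
    json.loads(content)
except json.JSONDecodeError as exc:
    print(repr(content), "->", exc.msg)
```

The tail of the longer payload survives past the end of the shorter one, so the file stays unparseable until something rewrites or deletes it.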
Describe the bug
I am getting a rare crash when using botocore to load files.
It's a JSONDecodeError: Expecting value: line 1 column 1 (char 0) stemming from JSONFileCache.__getitem__, which looks to imply the JSON file is empty (didn't find any value at char 0). Someone helped point out it might be a race condition in JSONFileCache.__setitem__, which appears to truncate the file and then write the new contents.
We have multiple concurrent processes starting on a box that each are using botocore, so maybe this is just a race condition if one of them happens to look at the file post-truncate-pre-write? Not sure if a flock, or write-then-rename, or something else ends up a proper solution here?
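To make the suspected window concrete, here is a small sketch that reads a file between a truncate and the following write. It assumes a truncate-then-write sequence as described above and is not copied from botocore; `read_during_truncate_window` is a hypothetical helper.

```python
import json
import os
import tempfile


def read_during_truncate_window(path, payload):
    """Return what a reader sees after the truncate but before the write."""
    with open(path, "w") as f:
        f.write(payload)  # start from a valid cache file

    writer = open(path, "r+")
    try:
        writer.truncate()  # file is now empty on disk
        with open(path) as reader:
            seen = reader.read()  # a concurrent reader lands here
        writer.write(payload)  # the new contents arrive too late
    finally:
        writer.close()
    return seen


path = os.path.join(tempfile.mkdtemp(), "cache.json")
seen = read_during_truncate_window(path, json.dumps({"key": 10}))
try:
    json.loads(seen)
except json.JSONDecodeError as exc:
    print(exc)
```

Parsing the empty string raises the same "Expecting value: line 1 column 1 (char 0)" message quoted in this report.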
Expected Behavior
It shouldn't crash
Current Behavior
Reproduction Steps
Hard to repro, and haven't tried myself. I assume thousands of processes recreating the botocore cached credentials file would do it.
Possible Solution
Perhaps a flock or a write-to-temp-file-then-rename-to-destination-file-address, instead of truncate-then-write?
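A rough sketch of the write-to-temp-file-then-rename option, using only the standard library. `atomic_write_json` is a hypothetical helper, not botocore's implementation. It relies on `os.replace`, which atomically swaps the destination when the temporary file lives on the same filesystem, so readers see either the old file or the complete new one.

```python
import json
import os
import tempfile


def atomic_write_json(path, value):
    """Write value as JSON to path without exposing a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the destination so the rename stays
    # on one filesystem. mkstemp creates it readable only by the owner.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(value, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this pattern concurrent writers can still overwrite each other's values, but the last rename wins with a complete file, so a reader never observes an empty or half-written one.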
Additional Information/Context
No response
SDK version used
1.34.42
Environment details (OS name and version, etc.)
AWS instance running x86_64 GNU/Linux