-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raise exception when open with write mode in call stack #140
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #140 +/- ##
========================================
- Coverage 93.5% 93.5% -0.1%
========================================
Files 21 21
Lines 1119 1154 +35
========================================
+ Hits 1047 1079 +32
- Misses 72 75 +3
|
tests/test_cloudpath_file_io.py
Outdated
# p, | ||
# "w", | ||
# ) as f: | ||
# pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case fails because the test frame's code_context
looks like this:
[' with open(\n']
Not sure what to do about it.
Ok @jayqi, this really sent me down a 🐰 🕳️ ... Here's a notebook showing a few more options: Thoughts on those? |
Wow, well done. 👏 It looks to me like you're capturing the main cases. I think grabbing the entire source of the frame and parsing the AST is probably the only reliable way to do this. One concern I have is how much overhead this adds to each call, and how it scales with the frame's size. |
Yeah, that's exactly my concern too. I'll see if I can add some benchmarking. No clue how long those things take.... |
OK, I updated that gist above with some benchmarking. Here are my takeaways:
So, I think if we go with Option 1 as outlined in #128 we should: For me, options 2-4 as outlined in #128 are also still on the table given this research. |
Do you think 6 ms wall time per run isn't untenably slow but I guess not trivial. Adds a minute to 10,000 files. |
I think since the frame we want to look at is from the caller's code, it will depend on how their code is installed so not sure that we can speed it up. I did just poke at bytecode disassembly with Agreed that 6ms isn't large enough to rule this approach out. I think to improve the user friendliness of working with paths, I'm inclined towards this change. |
@jayqi Check out the latest commits. This seems to work, but it does add a ton of complex code for a narrow-ish use case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Pull request authors can't request changes on their own pull request." 😂 🙄
|
||
|
||
# entire function is passed as source, so check separately | ||
# that all of the built in open write modes fail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is interesting. I thought it was a feature of my previous version of the test that the function would have both safe and unsafe open
s, and only fail for the unsafe one. Is it the case that if we had a safe open
first and unsafeopen
second in the same function, it would fail on the first one because the checking code would see the second open
?
Anyways, I think some additional things we want to test:
open(pathlib_path, "w")
andopen(cloud_path
, "w")` in the same function- monkeypatching that constant to
False
disables the check
Uh oh, did we end up using a Python 3.8-only feature? It looks like the 3.8 build passed but 3.6 and 3.7 failed. |
Yeah, see above:
I'm seeing differences in what line |
OK, I think this is ready for review. It is passing all the tests. I do think that this is a pretty substantial amount of complexity to add for maitenance, so we may want to turn this off by default if we get lots of bug reports that come up in this code block. I wouldn't be surprised if there are cases where the Performance updateHere's info on timing for this implementation: Script: from cloudpathlib import CloudPath
import statistics
import time
if __name__ == "__main__":
p = CloudPath('s3://cloudpathlib-test-bucket/5bmg9iTs74wmewasLTP44W-test_cloudpath_file_io-test_file_discovery/dir_0/file0_0.txt')
it = 100
res = []
for _ in range(it):
start = time.time()
with open(p, "r"):
pass
res.append(time.time() - start)
print(f"Median time {it} iterations: {statistics.median(res)}") Results: Around +0.05s on average for each call to
|
A pass at trying to detect when
__fspath__
is called byopen
with a write mode. IMO it doesn't work all that well. It appearsopen
is a weird function that doesn't have its own frame in the call stack. It jumps from__fspath__
's frame directly to the frame that callsopen
. I don't think there's a straightforward way to get object references from the frames (it's bytecode or source), so also hard to reliably detect ifopen
is being called on the cloud path instance, or to pick up on what modeopen
is being called with.