-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stdlib] Refactor normalize_index
adding parameters to centralize needed functionality
#3783
base: nightly
Are you sure you want to change the base?
Conversation
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Clamping to container length seems like the wrong behavior to me, and a great way to cause silent bugs. I'd much rather see either Optional (meaning you don't get return optimizations in safe mode), raising, or a program abort. |
@owenhilyard This is a tool, not the final implementation. Every collection will have to decide which version of this function to implement. We could change the default to not be to clamp to container length and make it return -1 on overflow (or something along those lines) and the collection type downstream decides whether to return an optional or raise on -1. WDYT? |
@martinvuyk I went and did some benchmarks with a few variations of how to add safety here: No clamp: Clamp: To me, it seems like optional is the least annoying to deal with while not hurting perf that much even if you do want the clamping behavior, and it forces users to handle the case where they went off the end of the array. You can't use immovable values with most collections anyway, unless you have some wrapper type your return (which can be |
Hi @owenhilyard thanks for taking the time to do that. Could you show me the code? There may be some tiny aspects that end up impacting the whole thing, e.g. if you're generating random indexes, how big the amount of indexes you're testing are, how distributed around the true bounds of the collection (realistically the amount of misses will be say < 1 % of cases). I think on the side of branch prediction there might also be some perf improvements for optional on a bigger scale, not sure of how try/except is lowered to asm (can there also be an improvement there ?). I think we might also be able to improve instruction fetching (for several index accesses) using the likely intrinsic on the optional if check 🤔. I could setup some big benchmarks on our benchmark module here when I have more time this next week (or the one after that).
Yeah I also like optionals more for index access APIs but we need to have many APIs which are like Python's, so we need all of the configurable options inside
True, but there are many APIs which Python inherited from C which we must also support. So we need to find a good perf
That would be great
I think that won't hold for 32 bit systems |
I treated it as throwaway code and wrote it in /tmp, so it's gone. This was a happy path test, since you should almost never encounter the unhappy path unless things have gone wrong, so all of the indexes I generated were valid indexes to dereference. Again, these are not overheads most people will notice, and the reason I lean towards Optional is because it forces considering what happens on an invalid index when you write the code. If you REALLY need the placement optimizations, you can use a
Python raises on error, but that is going to be very annoying to deal with, especially before we have checked errors. I lean towards
The python stdlib is called through CPython, I don't think we have to re-invent C's mistakes just because Python did.
An index of SIZE_MAX for a collection for an array of bytes would mean that not only does the code consume no memory, but also the collection itself is zero-sized. SIZE_MAX is the byte count of memory. The only thing that could run into this limit is if you had a smaller than byte type and stored it in a collection larger than 1/8 of memory, which requires a larger than register (on most systems) type to index. If you are doing this, you either are either using wide pointers or doing tricks with virtual memory/address spaces. As long as we assume that the size of the |
Signed-off-by: martinvuyk <[email protected]>
normalize_index
to cap to container length and other detailsnormalize_index
adding parameters to centralize needed functionality
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Signed-off-by: martinvuyk <[email protected]>
Refactor
normalize_index
adding parameters to centralize needed functionalityCloses #3644
Part of #2948
Changed the assert_mode to be "none" by default (I would like it to be "warn" but it doesn't seem to work at comp time) and made it a parameter for customization. Crashing on index out of bounds access is "safe" in the sense that it does not allow the access, but highly unsafe for code where you don't expect your program to crash. I have previously expressed this opinion on the PR where this function was created. Now that I've used it again I ended up just hand rolling my own implementation, which is pointless for the supposed purpose of this function: "standard and safe way to normalize indices"