Allow using SIMD intrinsics for SSE2, SSE4, AVX2 and NEON code #8209
@python-pillow/pillow-team What do you think?
I don't see why testing x86_64 would be an issue - we already have special cases for Pillow/.github/workflows/test.yml Lines 53 to 54 in 2152a17
I think the only x86 (32-bit) job we have is on AppVeyor (using SysWOW64): Lines 21 to 23 in 2152a17
s390x and ppc64le are tested using QEMU: Pillow/.github/workflows/test-docker.yml Lines 39 to 40 in 2152a17
I see you've already done this 👍
Ideally we should aim to implement this at some point in the future, but starting with compile-time selection would be a good first step.
Love it 🚀 Thanks @homm, all
Corrected return types
Current SIMD state
There is a separate Pillow-SIMD package on PyPI with modified Pillow code, which includes some rewritten routines using SSE4 and AVX2 compiler intrinsics. Any version of Pillow-SIMD has strictly the same functionality, behavior, and bugs as the corresponding Pillow version. There are many problems with this approach, including:
Proposal
I'd like to move the SIMD-accelerated code upstream, with some enhancements for cross-platform compilation. How this affects the problems described:
Challenges
Testing
The main challenge is testing. Any code contributed upstream should actually be run during tests. The following versions of the code could exist:
Detecting Current Acceleration
There should be a way to detect which acceleration implementation is currently in use. I think a property on the core object should be enough.
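As a rough illustration of how calling code might read such a property, here is a minimal sketch. The attribute name `acceleration`, the helper `get_acceleration`, and the `SimpleNamespace` stand-ins for the compiled core module are all assumptions made for this example, not names confirmed by the proposal.

```python
from types import SimpleNamespace


def get_acceleration(core):
    """Return the acceleration name reported by `core`, or None if absent.

    `acceleration` is a hypothetical attribute name used for illustration.
    """
    value = getattr(core, "acceleration", None)
    # Treat a missing, empty, or non-string value as "no acceleration".
    return value if isinstance(value, str) and value else None


# Stand-ins for a compiled core module. In a real build the value would be
# fixed at compile time by whichever SIMD flags were enabled.
avx2_core = SimpleNamespace(acceleration="avx2")
plain_core = SimpleNamespace()

print(get_acceleration(avx2_core))
print(get_acceleration(plain_core))
```

Reading the value through `getattr` with a default keeps the check safe on builds that predate the property, which matters for test code that has to run against both accelerated and plain builds.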
Criticism
The proposed solution doesn't provide first-class SIMD-acceleration for Pillow. When you are using any application or installing other libraries, like NumPy, there is only one compiled version available for a platform, and it contains all types of available acceleration. The right accelerated functions are dynamically called at runtime instead of recompiling for particular CPU capabilities during installation. However, such an approach requires much more effort, and this is not the aim of the current proposal.
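For contrast, the runtime-dispatch approach that this proposal deliberately leaves out can be sketched as follows. This is only a conceptual illustration in Python: real libraries do this in compiled code using CPU feature detection, and every name here (`select_implementation`, the `resize_*` functions, the feature strings) is hypothetical.

```python
def resize_scalar(data):
    # Portable fallback with no SIMD requirements.
    return ("scalar", list(data))


def resize_sse4(data):
    # Placeholder for a routine that would require SSE4.
    return ("sse4", list(data))


def resize_avx2(data):
    # Placeholder for a routine that would require AVX2.
    return ("avx2", list(data))


# Ordered from most to least capable; None marks the unconditional fallback.
IMPLEMENTATIONS = [
    ("avx2", resize_avx2),
    ("sse4", resize_sse4),
    (None, resize_scalar),
]


def select_implementation(cpu_features, implementations=IMPLEMENTATIONS):
    """Pick the first implementation whose required feature is available."""
    for required, func in implementations:
        if required is None or required in cpu_features:
            return func
    raise LookupError("no implementation matches the detected CPU features")


# Assumed feature set for a CPU that supports SSE4 but not AVX2.
resize = select_implementation({"sse2", "sse4"})
print(resize([1, 2, 3])[0])
```

With compile-time selection, the equivalent choice is made once when the package is built, so a single binary contains only one of these code paths.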