Allow using SIMD intrinsics for SSE2, SSE4, AVX2 and NEON code #8209

homm · 2024-07-06T11:31:44Z

Current SIMD state

There is a separate Pillow-SIMD package on PyPI with modified Pillow code, which includes some rewritten routines using SSE4 and AVX2 compiler intrinsics. Any version of Pillow-SIMD has strictly the same functionality, behavior, and bugs as the corresponding Pillow version. There are many problems with this approach, including:

1. Pillow-SIMD versions are often outdated. This is because every Pillow-SIMD release requires merging the SIMD-accelerated code into the new Pillow version, which involves resolving potential conflicts and testing all implementations.
2. SIMD-accelerated code is unconditionally platform-specific. Pillow-SIMD can be compiled and run only on x86. This makes it very hard to add Pillow-SIMD as a dependency of cross-platform applications.
3. Pillow-SIMD is a different package, although it shares the same namespace (PIL). This means only one of the two packages (Pillow or Pillow-SIMD) should be installed in an environment at a time. It can be tricky to avoid installing Pillow when it is a dependency of some third-party library.
4. Pillow-SIMD itself can't be a dependency of third-party libraries, since it's not cross-platform.
5. Pillow-SIMD doesn't provide precompiled binaries, because it implements two types of SIMD acceleration: SSE4 and AVX2. Pillow-SIMD compiled with AVX2 is a bit faster, but can't run on non-AVX2 CPUs (including Rosetta 2 on Apple silicon).

Proposal

I'd like to move the SIMD-accelerated code upstream, with some enhancements for cross-platform compilation. This is how it addresses each of the problems described above:

1. Every Pillow release will contain the SIMD-accelerated code, so no additional merging work will be required.
2. Pillow can be compiled on any platform, with or without SIMD acceleration depending on what is available. No separate package will be required.
3. As with the previous point, no separate package will be required, so there is no namespace conflict.
4. As with the previous point, no separate package will be required.
5. The default precompiled Pillow binaries will be built with the minimal SIMD acceleration available on each platform: SSE2 for x86_64 and NEON for ARM aarch64. (Note that there is currently no SSE2- or NEON-accelerated code.) For advanced users who want more acceleration, documentation on compiling with it will be provided.
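As a minimal sketch of compile-time selection (not the code in this PR), a routine can have one body per instruction set, with the preprocessor choosing among them from the compiler's predefined macros. Those macros are `__SSE2__` (GCC/Clang on x86), `_M_X64` or `_M_IX86_FP >= 2` (MSVC) and `__ARM_NEON` (aarch64). The function `add_saturate_u8` is hypothetical and only illustrates the pattern. Builds without any of these macros fall back to the plain C loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Choose an implementation at compile time from the compiler's predefined macros. */
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define USE_SSE2 1
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#define USE_NEON 1
#endif

/* Hypothetical example routine: dst[i] = min(a[i] + b[i], 255). */
void add_saturate_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
    size_t i = 0;
#if defined(USE_SSE2)
    /* 16 bytes per iteration using the SSE2 unsigned saturating add. */
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#elif defined(USE_NEON)
    /* 16 bytes per iteration using the NEON unsigned saturating add. */
    for (; i + 16 <= n; i += 16) {
        vst1q_u8(dst + i, vqaddq_u8(vld1q_u8(a + i), vld1q_u8(b + i)));
    }
#endif
    /* Scalar tail, or the whole loop on builds without SIMD. */
    for (; i < n; i++) {
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}
```

GCC and Clang define `__SSE2__` by default on x86_64 and `__ARM_NEON` by default on aarch64. Default builds for those platforms would therefore take the SIMD path without extra flags. The scalar loop remains the reference implementation everywhere else.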

Challenges

Testing

The main challenge is testing. Any code contributed upstream should be exercised by the tests. The following variants of the code could exist:

- Native code (no SIMD): will be tested on platforms without acceleration support: 32-bit x86, s390x, and ppc64le. Are there any tests for any of these platforms?
- SSE2: the most common, since it is the default for x86_64.
- SSE4: not currently tested; should be added explicitly.
- AVX2: not currently tested; should be added explicitly.
- NEON: a mandatory extension on ARM aarch64, so it will be tested in many ARM configurations.

Detecting Current Acceleration

There should be a way to detect the current acceleration implementation. I think some property for the core object should be enough.
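A minimal sketch of how the reported value could be derived, assuming it comes from the same compile-time macros. The commits below add a `core.acceleration` attribute, but its exact values are not shown here. The function name `simd_acceleration_name` and the strings `"avx2"`, `"sse4"`, `"sse2"` and `"neon"` are therefore assumptions. MSVC does not define `__SSE4_2__`, even with `/arch:AVX2`. The commit "Add `__SSE4_2__` for MSCC" below targets that gap. The sketch handles it by treating `__AVX2__` (checked first) and `__AVX__` as implying at least SSE4.

```c
#include <stddef.h>

/* Hypothetical helper: report which SIMD level this build was compiled for.
   The answer is fixed at compile time; it does not probe the running CPU. */
const char *simd_acceleration_name(void) {
#if defined(__AVX2__)
    return "avx2";
#elif defined(__SSE4_2__) || defined(__AVX__) /* every AVX CPU also has SSE4.2 */
    return "sse4";
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
    return "sse2";
#elif defined(__ARM_NEON)
    return "neon";
#else
    return NULL; /* plain C build, no SIMD */
#endif
}
```

In a CPython extension, the module init function could store this string with `PyModule_AddStringConstant`, or store `None` when it returns `NULL`. That wiring is omitted here.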

Criticism

The proposed solution doesn't provide first-class SIMD acceleration for Pillow. Other libraries, such as NumPy, ship only one compiled version per platform, and it contains all available types of acceleration. The right accelerated functions are dynamically called at runtime instead of recompiling for particular CPU capabilities during installation. However, such an approach requires much more effort, and it is not the aim of the current proposal.
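For comparison, the runtime dispatch described above can be sketched with GCC/Clang extensions on x86. The AVX2 variant is compiled with `__attribute__((target("avx2")))`, so the rest of the file needs no `-mavx2`. A function pointer is then chosen once with `__builtin_cpu_supports`. This only illustrates the technique; it is not part of the proposal, which deliberately starts with compile-time selection. All function names are hypothetical. On other compilers or architectures, the sketch falls back to plain C.

```c
#include <stddef.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_RUNTIME_DISPATCH 1
#endif

typedef void (*add_sat_fn)(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n);

/* Portable reference implementation: dst[i] = min(a[i] + b[i], 255). */
static void add_sat_scalar(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        unsigned s = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(s > 255 ? 255 : s);
    }
}

#ifdef HAVE_RUNTIME_DISPATCH
/* Compiled for AVX2 only; it must never be called on CPUs without AVX2. */
__attribute__((target("avx2")))
static void add_sat_avx2(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n) {
    size_t i = 0;
    for (; i + 32 <= n; i += 32) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_adds_epu8(va, vb));
    }
    add_sat_scalar(dst + i, a + i, b + i, n - i); /* remaining < 32 bytes */
}
#endif

/* Pick the best implementation for the CPU we are actually running on. */
static add_sat_fn select_add_sat(void) {
#ifdef HAVE_RUNTIME_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2")) {
        return add_sat_avx2;
    }
#endif
    return add_sat_scalar;
}
```

The selected pointer would typically be cached at module initialization. One binary could then serve both AVX2 and non-AVX2 machines, which is the extra effort the proposal postpones.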

homm · 2024-07-08T07:22:04Z

@python-pillow/pillow-team What do you think?

nulano · 2024-07-15T21:16:56Z

I don't see why testing x86_64 would be an issue - we already have special cases for PYTHONOPTIMIZE and could do the same for SSE4 / AVX2:

Pillow/.github/workflows/test.yml, lines 53 to 54 in 2152a17:

    - { python-version: "3.11", PYTHONOPTIMIZE: 1, REVERSE: "--reverse" }
    - { python-version: "3.10", PYTHONOPTIMIZE: 2 }

I think the only x86 (32-bit) job we have is on AppVeyor (using SysWOW64):

Pillow/.appveyor.yml, lines 21 to 23 in 2152a17:

    - PYTHON: C:/Python312
      ARCHITECTURE: x86
      APPVEYOR_BUILD_WORKER_IMAGE: Visual Studio 2022

However, it seems to have selected sse2: https://ci.appveyor.com/project/Python-pillow/pillow/builds/50165788/job/sv60thgf3ygibbv4?fullLog=true#L3667

s390x and ppc64le are tested using QEMU:

Pillow/.github/workflows/test-docker.yml, lines 39 to 40 in 2152a17:

    ubuntu-24.04-noble-ppc64le,
    ubuntu-24.04-noble-s390x,

> Detecting Current Acceleration
>
> There should be a way to detect the current acceleration implementation. I think some property for the core object should be enough.

~~I anticipate that we will get questions about this, so I would propose we add this to PIL.features and the report summary PIL.features.pilinfo.~~

I see you've already done this 👍

> The right accelerated functions are dynamically called at runtime instead of recompiling for particular CPU capabilities during installation.

I think we should ideally aim for implementing this at some point in the future, but starting with compile-time selection for now would be a good first step.


aclark4life · 2024-08-11T18:42:54Z

Love it 🚀 Thanks @homm, all


homm marked this pull request as draft July 6, 2024 11:34

homm force-pushed the simd-init branch from 0b4b3b0 to 5376120 on July 6, 2024 11:42

radarhere mentioned this pull request Jul 8, 2024 in "Updated test" uploadcare/pillow-simd#137 (Merged)

nulano reviewed Jul 15, 2024 (comment on src/PIL/features.py, outdated and resolved)

homm force-pushed the simd-init branch 3 times, most recently from bd65e1c to 7fa8fb8 on July 16, 2024 19:11

hugovk reviewed Jul 17, 2024 (two comments on .github/workflows/test.yml, outdated and resolved)

homm force-pushed the simd-init branch from 7fa8fb8 to b979bb8 on July 17, 2024 06:35

homm requested a review from hugovk July 17, 2024 11:36

homm marked this pull request as ready for review July 28, 2024 15:32

homm and others added 8 commits August 11, 2024 20:55:

- Add SIMD headers (e064963)
- Add SIMD example (73f76e5)
- Add `__SSE4_2__` for MSCC (9109110)
- Add `core.acceleration` attribute (2db9cd3)
- clang-format (5df34a2)
- Define better name `__NEON__` (f891043)
- Updated test (4d10b3c)
- Add accelerated test builds (a410dcb)

homm force-pushed the simd-init branch from b979bb8 to a410dcb on August 11, 2024 16:55

homm added the Performance label Aug 11, 2024

homm mentioned this pull request Aug 11, 2024 in "SIMD: AlphaComposite SSE4 & AVX2" #8299 (Open)

update clang formatting (cb55ee6)

This was referenced Aug 11, 2024:

- SIMD: BoxBlur SSE4 #8300 (Open)
- SIMD: Filter SSE4 & AVX2 #8301 (Open)

radarhere mentioned this pull request Aug 12, 2024 in "Corrected return types" uploadcare/pillow-simd#140 (Merged)

Do not return string from check_feature() (24c0569)

radarhere and others added 2 commits August 12, 2024 18:52:

- Do not return bool from version_feature() (55a97f9)
- Merge pull request #140 from radarhere/simd-init (b55da80): Corrected return types

radarhere added the Needs Rebase label Nov 29, 2024

mgorny mentioned this pull request Dec 27, 2024 in "Package pillow-simd?" conda-forge/pillow-feedstock#103 (Open)
