
blur function is too slow #986

Open
Aloxaf opened this issue Jul 8, 2019 · 22 comments

Comments

@Aloxaf commented Jul 8, 2019

blur is too slow; imageproc::filter::gaussian_blur_f32 is faster, but still slow.

Reproduction steps

use image::imageops::blur;
use image::{Rgb, RgbImage};
use imageproc::drawing::draw_filled_circle_mut;
use imageproc::filter::gaussian_blur_f32;
use std::time::Instant;

fn main() {
    let mut image = RgbImage::new(1000, 1000);
    draw_filled_circle_mut(&mut image, (500, 500), 500, Rgb([255, 255, 255]));

    let start = Instant::now();
    blur(&image, 100.0).save("rust_1.png").unwrap();
    dbg!(start.elapsed());

    let start = Instant::now();
    gaussian_blur_f32(&image, 100.0).save("rust_2.png").unwrap();
    dbg!(start.elapsed());
}

result is

[src/main.rs:13] start.elapsed() = 12.362780732s
[src/main.rs:17] start.elapsed() = 3.335411994s
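One caveat about the numbers above: `.save()` sits inside the timed region, so PNG encoding is counted in both measurements. A minimal sketch of a harness that times the blur and the encode as separate regions (the `time_it` helper is mine, not part of `image`):

```rust
use std::time::Instant;

// Hypothetical helper: run a closure and report how long it took, so the
// blur and the PNG encode can be timed separately instead of inside one
// combined `blur(...).save(...)` expression.
fn time_it<T>(label: &str, f: impl FnOnce() -> T) -> T {
    let start = Instant::now();
    let out = f();
    println!("{label}: {:?}", start.elapsed());
    out
}
```

With this, `let blurred = time_it("blur", || blur(&image, 100.0));` followed by `time_it("encode", || blurred.save("rust_1.png"))` would separate the two costs.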

and Pillow

from PIL import Image, ImageDraw, ImageFilter
from time import time

img  = Image.new('RGB', (1000, 1000))
draw = ImageDraw.Draw(img)
draw.ellipse((0, 0, 1000, 1000), fill='#ffffff')

start = time()
img.filter(ImageFilter.GaussianBlur(100)).save('python.png')
print(f'{time() - start}s')

result is 0.1433885097503662s

Here are the final images:

rust_1.png (it looks a little strange...)

rust_2.png

python.png

@HeroicKatora (Member)

Just to clarify: PIL indirectly uses a C implementation: https://github.com/python-pillow/Pillow/blob/ab9a25d623fdd7f8de3e724b538f5660eac589ae/src/libImaging/BoxBlur.c#L294

It's still somewhat embarrassingly slow.

@Aloxaf (Author) commented Jul 8, 2019

@HeroicKatora Yes, maybe it would be a good idea to port it to Rust?

@theotherphil (Contributor) commented Jul 8, 2019

Pillow appears to use (a variant on) iterated box filtering, as defined in http://www.mia.uni-saarland.de/Publications/gwosdek-ssvm11.pdf.

There's an open imageproc ticket to implement this: image-rs/imageproc#93
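For reference, a minimal sketch of the box-width selection behind iterated box filtering (the widely used "boxes for Gauss" derivation from the paper above; the function name is mine, not from imageproc or Pillow). Applying `n` box filters of these widths approximates a Gaussian of standard deviation `sigma`; it assumes `sigma` is large enough that the computed widths stay positive:

```rust
// Choose n odd box-filter widths whose iterated application approximates a
// Gaussian of standard deviation `sigma` (after Gwosdek et al.): the first
// m passes use width wl, the remaining passes use wl + 2.
fn boxes_for_gauss(sigma: f32, n: usize) -> Vec<usize> {
    // Ideal averaging-filter width for n passes.
    let w_ideal = (12.0 * sigma * sigma / n as f32 + 1.0).sqrt();
    let mut wl = w_ideal.floor() as i32;
    if wl % 2 == 0 {
        wl -= 1; // box widths must be odd
    }
    let wu = wl + 2;
    // Number of passes that should use the smaller width wl.
    let m_ideal = (12.0 * sigma * sigma
        - n as f32 * (wl * wl) as f32
        - 4.0 * n as f32 * wl as f32
        - 3.0 * n as f32)
        / (-4.0 * wl as f32 - 4.0);
    let m = m_ideal.round() as usize;
    (0..n)
        .map(|i| if i < m { wl as usize } else { wu as usize })
        .collect()
}
```

Each box pass is O(1) per pixel with a running sum, so the whole approximation is independent of `sigma`.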

@theotherphil (Contributor)

I'll have a look at implementing this over the coming weekend.

@Shnatsel (Contributor) commented Oct 2, 2019

There is actually a linear-complexity blur already implemented in Rust, where execution time is independent of blur radius: https://github.com/fschutt/fastblur

A copy of it was incorporated into the resvg codebase and polished to match Chrome's blur. You can find it in this file; search for box_blur. It should be fairly easy to extract it back into a common crate and/or copy it into image.
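The core trick behind such radius-independent blurs is a running window sum: each output sample is produced by one add and one subtract, so a box pass costs O(1) per pixel regardless of radius. A self-contained 1-D sketch (not the actual fastblur or resvg code):

```rust
// One horizontal box-blur pass using a running sum: the window total is
// updated incrementally, so per-sample cost does not depend on `radius`.
// Borders are handled by clamping (edge replication).
fn box_blur_1d(src: &[f32], radius: usize) -> Vec<f32> {
    let n = src.len();
    let window = 2 * radius + 1;
    let mut out = vec![0.0f32; n];
    let clamp = |i: isize| i.clamp(0, n as isize - 1) as usize;

    // Sum of the initial window centred on index 0.
    let mut sum: f32 = 0.0;
    for i in -(radius as isize)..=(radius as isize) {
        sum += src[clamp(i)];
    }
    for i in 0..n {
        out[i] = sum / window as f32;
        // Slide the window right: add the entering sample, drop the leaving one.
        sum += src[clamp(i as isize + radius as isize + 1)];
        sum -= src[clamp(i as isize - (radius as isize))];
    }
    out
}
```

A full 2-D blur runs this over rows, then over columns, repeating a few times to approximate a Gaussian.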

@vadixidav

You may want to take a look at the code here. It was originally based on a different paper, written by the author of AKAZE, ported to Rust by someone else, and then modified later by me (so it has been touched by many hands). It makes use of the following papers:

  • S. Grewenig, J. Weickert, C. Schroers, A. Bruhn. Cyclic Schemes for PDE-Based Image Analysis. Technical Report No. 327, Department of Mathematics, Saarland University, Saarbrücken, Germany, March 2013
  • S. Grewenig, J. Weickert, A. Bruhn. From box filtering to fast explicit diffusion. DAGM, 2010

It is actually not used for Gaussian blur in akaze itself. I am also not 100% sure it can help speed up Gaussian blur, but I think it can, based on what I have seen of its use in akaze; I am no expert. The way this algorithm works is that it determines a number of diffusion steps that start small and grow bigger as the stability increases (which may not apply to Gaussian blur?). See this file for how the specialized blur is applied (with ndarray). I was able to get pretty good speed by writing my filters with ndarray like that. My situation is a bit more specific to this library, but the code is free for you to take.
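The varying step sizes referred to above come from the Fast Explicit Diffusion (FED) scheme in the cited Grewenig, Weickert and Bruhn papers. A small sketch of the step-size formula, as I understand it from those papers (not taken from the akaze code):

```rust
use std::f64::consts::PI;

// FED step sizes: n explicit-diffusion time steps tau_i that are jointly
// stable even though individual steps exceed the usual stability limit
// tau_max. Their total is tau_max * (n^2 + n) / 3, so a few cycles cover a
// large diffusion (i.e. blur) time cheaply.
fn fed_tau(n: usize, tau_max: f64) -> Vec<f64> {
    (0..n)
        .map(|i| {
            let c = (PI * (2 * i + 1) as f64 / (4 * n + 2) as f64).cos();
            tau_max / (2.0 * c * c)
        })
        .collect()
}
```

The steps start near `tau_max / 2` and grow toward the end of the cycle, which matches the "start small and grow bigger" behaviour described above.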

@RReverser (Contributor) commented Mar 16, 2021

Also noticed this while comparing Wasm version of image::imageops::blur and JS implementation of Gaussian blur from https://github.com/nodeca/glur.

At high radius values the difference becomes ridiculous: e.g. for a 3872x2592 image and a 300px radius, the image crate's version executes in 94 seconds, while the JS version executes in 0.6 seconds (the same as for any other radius).

Having linear implementation in image crate would help a lot. fastblur from @Shnatsel's link looks promising, but it would be nice to integrate it into image or imageproc instead of relying on a 3rd-party dependency that is somewhat harder to discover.

@torfmaster mentioned this issue Aug 2, 2024
@torfmaster (Contributor)

If anyone is interested: I created a first draft to resolve this issue in #2302 and I am willing to finalize this.

@Shnatsel Shnatsel added the kind: slow Not wrong, but still unusable label Sep 14, 2024
@Shnatsel (Contributor) commented Oct 7, 2024

FWIW a new crate with multiple fast blur implementations was just released: https://github.com/awxkee/libblur

It relies on unsafe SIMD intrinsics, so it might not be usable as-is for image due to the no-unsafe policy, but we may be able to uplift parts of it or perhaps reuse the fallback implementations.

@awxkee (Contributor) commented Oct 21, 2024

> FWIW a new crate with multiple fast blur implementations was just released: https://github.com/awxkee/libblur
>
> It relies on unsafe SIMD intrinsics, so it might not be usable as-is for image due to the no-unsafe policy, but we may be able to uplift parts of it or perhaps reuse the fallback implementations.

I can extract part of the Gaussian code just for the image crate, with unsafe forbidden, if that makes sense.
I reworked the convolution part without unsafe, and the situation looks like this for a 5000x4000 RGB image:

fast_blur sigma 5: 793.836375ms
pure gaussian_blur 5 kernel size: 30.273625ms

fast_blur sigma 15: 803.334958ms
pure gaussian_blur 15 kernel size: 60.557ms

fast_blur sigma 35: 807.447583ms
pure gaussian_blur 35 kernel size: 133.914708ms

fast_blur sigma 151: 811.069792ms
pure gaussian_blur 151 kernel size: 495.628208ms

fast_blur sigma 251: 847.268125ms
pure gaussian_blur 251 kernel size: 875.879ms

pure gaussian_blur 251 kernel size with rayon: 125.079042ms

Parallelizing the pure Gaussian blur works pretty well: adding rayon instantly gives a ~10x performance improvement, so multi-threading is preferred for blurs.
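The parallelization is natural because each pass of a separable blur processes rows independently. A dependency-free sketch of the idea using std scoped threads (rayon's `par_chunks_mut` does the same with a work-stealing pool; `blur_row` is a stand-in for the real per-row kernel):

```rust
use std::thread;

// Stand-in per-row kernel; real code would run a 1-D blur pass here.
fn blur_row(row: &mut [f32]) {
    for v in row.iter_mut() {
        *v *= 0.5;
    }
}

// Split the image into bands of whole rows and hand each band to its own
// thread. Rows are disjoint, so no locking is needed.
fn blur_rows_parallel(pixels: &mut [f32], width: usize, n_threads: usize) {
    let rows = pixels.len() / width;
    let rows_per_chunk = ((rows + n_threads - 1) / n_threads).max(1);
    thread::scope(|s| {
        for chunk in pixels.chunks_mut(rows_per_chunk * width) {
            s.spawn(move || {
                for row in chunk.chunks_mut(width) {
                    blur_row(row);
                }
            });
        }
    });
}
```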

Perhaps I can also add stack blur, which is O(1) per pixel with respect to radius; it would be roughly 5-10x faster than a blur based on the CLT (iterated box filtering), which is what fast_blur uses here as far as I can tell.
Also, CLT-based blurs converge very quickly and offer little control, if you have noticed: images become completely blurred quite abruptly.

Here is the stack blur time with SIMD and multi-threading disabled, for the same image:

stackblur radius 151: 168.549ms

Even with unsafe forbidden, I wouldn't expect it to slow down 8x.

@Shnatsel (Contributor)

Image's fast_blur was contributed very recently, and has not received a great deal of attention yet. I'd be happy to see a PR migrating it to a faster algorithm.

I am somewhat concerned about the growing implementation complexity, but I think we could take it on for common and performance-sensitive operations such as resizing and blurring.

@awxkee (Contributor) commented Oct 21, 2024

All of this actually sounds pretty unreasonable to me :)

I do not think blur is all that common; it may be somewhat widespread, but common it is not.

And as for a PR: the crate's pixel storage, organization, and access patterns are things I would definitely like to stay away from, and the amount of required code is not small, so complexity would grow noticeably.

@Shnatsel (Contributor)

The blur operation isn't that common, but it usually does become a bottleneck whenever it crops up, so I think optimizing it is still worthwhile.

@awxkee (Contributor) commented Oct 21, 2024

I may agree.

But the issues with pixel storage, the lack of truly random access via iterators, and the lack of generic traits remain problems.

I may publish a separate crate, but having to deal with the crate's inherent problems makes it immediately unreasonable.

@Shnatsel (Contributor)

If you could list, in #2358, the exact problems with the crate's API that prevent these kinds of algorithms from being written on top of it, that would be very helpful!

We're looking to improve the API and ship v1.0 sometime in the foreseeable future, so information on what exactly is missing right now is very welcome.

@awxkee (Contributor) commented Oct 21, 2024

Raw row data must be fully accessible, either per color component separately or as a full set, e.g. a whole RGBA pixel (this is optional; if each color component is accessible, that is completely fine), via an iterator without any checks, and the call must be #[inline(always)].

This can be done with a contiguously allocated vector, which is pretty common and very nice.

Or by rows, if necessary. This is very good sometimes, and sometimes not, and it might confuse those who are not familiar with it: it is an approach where, instead of contiguous memory, the layout may contain gaps, and only references to the start of each row are stored, so the real image is a slice of row references, &[&[T]].

This would at least make methods like this one performant, and it would actually prevent some people from doing bad things, because a newcomer searching for how to do this would find that approach.

For any real algorithms, all generic traits must be implemented as well. And "all" is a very vague requirement, because sometimes you need bitxor or shift-right, sometimes fma when the device has it, and sometimes a natural logarithm or π obtained from a trait.

At the very least, all basic arithmetic must be covered.
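To make the row-access request concrete, here is a hypothetical sketch of the kind of API being described: a trait exposing each row as a plain slice so that hot loops index raw memory instead of going through per-pixel bounds-checked accessors. None of these names exist in the image crate; they only illustrate the shape.

```rust
// Hypothetical row-access trait: rows come back as plain slices.
trait RowAccess {
    type Subpixel;
    fn row(&self, y: usize) -> &[Self::Subpixel];
    fn row_mut(&mut self, y: usize) -> &mut [Self::Subpixel];
}

// One contiguous, row-major channel plane (the "contiguously allocated
// vector" layout mentioned above).
struct PlaneF32 {
    width: usize,
    data: Vec<f32>,
}

impl RowAccess for PlaneF32 {
    type Subpixel = f32;

    #[inline(always)]
    fn row(&self, y: usize) -> &[f32] {
        &self.data[y * self.width..(y + 1) * self.width]
    }

    #[inline(always)]
    fn row_mut(&mut self, y: usize) -> &mut [f32] {
        let w = self.width;
        &mut self.data[y * w..(y + 1) * w]
    }
}
```

Inside a convolution loop, `row(y)` is fetched once and then indexed directly, so the compiler can hoist the bounds check out of the inner loop.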

@awxkee (Contributor) commented Oct 21, 2024

The thing is that for convolution, for example, you have to access each pixel many, many times: for a blur with kernel size 151, each pixel in the image requires access to at least 150 + 150 surrounding pixels. If you can't optimize that away, it's game over.
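The arithmetic behind that point, as a tiny sketch (function names are mine): for a kernel of width k, a dense 2-D convolution reads about k² neighbours per pixel, while a separable row-plus-column pass reads about 2k, and a running-sum box filter reads O(1). That is why per-access overhead in the inner loop dominates everything.

```rust
// Approximate neighbour reads per output pixel for a kernel of width k.
fn reads_dense(k: u64) -> u64 {
    k * k // full 2-D convolution window
}

fn reads_separable(k: u64) -> u64 {
    2 * k // one horizontal pass + one vertical pass
}
```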

@Shnatsel (Contributor)

Yes, GenericImageView not allowing viewing rows is a known issue: #2300

@Shnatsel (Contributor)

Thank you for the input! I've linked your message from the relevant issues on the bug tracker. We'll see what we can do about these limitations.

@awxkee (Contributor) commented Oct 21, 2024

Yup. If that seems reasonable, I can still do the blur in a separate crate, because fixing all these limitations could take years :)

@Shnatsel (Contributor)

We're hoping to start a concerted effort on fixing the API sometime around January. But we won't know if they're any good until someone builds a complex image processing algorithm on top of them. Would you be available then to kick the tires of the new APIs and try to port stack blur to them?

@awxkee (Contributor) commented Oct 22, 2024

Yes, sure. Once the API looks at least worth trying, I can try to build something on top of it.
I wouldn't say stack blur is complex, but OK :)

Let me know if I can help.
