blur function is too slow #986
Comments
Just to clarify, PIL uses a C version indirectly: https://github.com/python-pillow/Pillow/blob/ab9a25d623fdd7f8de3e724b538f5660eac589ae/src/libImaging/BoxBlur.c#L294 It's still somewhat embarrassingly slow. |
@HeroicKatora Yes, maybe it would be a good idea to port it to Rust? |
Pillow appears to use (a variant on) iterated box filtering, as defined in http://www.mia.uni-saarland.de/Publications/gwosdek-ssvm11.pdf. There's an open imageproc ticket to implement this: image-rs/imageproc#93 |
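As a rough illustration of the iterated box filtering idea mentioned above, here is a minimal 1D sketch. It is not Pillow's or imageproc's code: the function names `box_blur_1d` and `approx_gaussian_1d` are hypothetical, the edge handling (repeating the border sample) is an assumption, and it ignores how a radius should be derived from a Gaussian sigma. Each pass keeps a running sum, so the per-sample cost does not grow with the radius.

```rust
/// One pass of a 1D box blur over `src` with a window of `2 * radius + 1`.
/// A running sum is updated as the window slides, so each output sample
/// costs one addition and one subtraction regardless of the radius.
/// Assumption: samples outside the slice repeat the nearest border sample.
fn box_blur_1d(src: &[f32], radius: usize) -> Vec<f32> {
    let n = src.len();
    if n == 0 || radius == 0 {
        return src.to_vec();
    }
    let window = (2 * radius + 1) as f32;
    let r = radius as isize;
    // Clamped read: indices past either end map to the border sample.
    let at = |i: isize| -> f32 { src[i.clamp(0, n as isize - 1) as usize] };

    // Sum of the window centred on index 0 (one-off setup cost).
    let mut sum: f32 = (-r..=r).map(|i| at(i)).sum();
    let mut out = Vec::with_capacity(n);
    for i in 0..n as isize {
        out.push(sum / window);
        // Slide right: add the sample entering, drop the one leaving.
        sum += at(i + r + 1) - at(i - r);
    }
    out
}

/// Repeats the box pass `passes` times. Repeated box filtering tends
/// towards a Gaussian-like response; three passes is a common choice.
fn approx_gaussian_1d(src: &[f32], radius: usize, passes: usize) -> Vec<f32> {
    let mut cur = src.to_vec();
    for _ in 0..passes {
        cur = box_blur_1d(&cur, radius);
    }
    cur
}
```

For a 2D image the same pass would be run over every row and then over every column. The running sum is kept in `f32` here for brevity; a real implementation would need to consider accumulated rounding error on long rows.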
I'll have a look at implementing this over the coming weekend. |
There is actually a linear-complexity blur already implemented in Rust (https://github.com/fschutt/fastblur), where execution time is independent of blur radius. A copy of it was incorporated into the resvg codebase and polished to match Chrome's blur. You can find it in this file, search for |
You may want to take a look at the code here. It was originally based on a different paper, written by the author of AKAZE, ported to Rust by someone else, and then modified later by me (so it has been touched by many hands). It makes use of the following papers:
It is actually not used for Gaussian blur in AKAZE itself. I am also not 100% sure it can help speed up Gaussian blur, but I think it can, based on what I have seen of its use in AKAZE. I am no expert. The way this algorithm works is that it determines a number of diffusion steps that start small and grow bigger as the stability increases (which may not apply to Gaussian blur?). See this file for how the specialized blur is being applied (with ndarray). I was able to get pretty good speed by writing my filters with ndarray like that. My situation is a bit more specific to this library, but the code is free for you to take. |
Also noticed this while comparing Wasm version of At high radius values the difference becomes ridiculous - e.g. for a 3872x2592 image and a radius of 300px, the image crate's version executes in 94 seconds, while JS executes in 0.6 seconds (same as for any other radius). Having a linear-time implementation in the image crate would help a lot. |
If anyone is interested: I created a first draft to resolve this issue in #2302 and I am willing to finalize this. |
FWIW a new crate with multiple fast blur implementations was just released: https://github.com/awxkee/libblur It relies on unsafe SIMD intrinsics, so it might not be usable as-is for |
I can extract part of gaussian just for
Pure Gaussian blur parallelization works pretty well, so adding Perhaps I can also add stack blur, which is O(1) per pixel with respect to the radius and would be roughly 5-10x faster than the blur based on CLT, which is used here on Here is the stack blur time with SIMD and multi-threading disabled for the same image.
Even with |
Image's I am somewhat concerned about the growing implementation complexity, but I think we could take it on for common and performance-sensitive operations such as resizing and blurring. |
All of this actually sounds pretty unreasonable :) I do not think blur is that common; it might be fairly widespread, but I would not call it common. As for a PR that has to deal with how this crate stores, organizes, and accesses pixels, that is something I'd definitely like to stay away from. The amount of code required is not small, so complexity will grow noticeably. |
The blur operation isn't that common, but it usually does become a bottleneck whenever it crops up, so I think optimizing it is still worthwhile. |
I may agree. But the issues with pixel storage, the lack of truly random access through iterators, and the lack of generic traits remain. I may publish a separate crate, but having to deal with the crate's inherent problems makes this immediately unreasonable. |
If you could list in #2358 the exact problems with the crate's API that prevent this kind of algorithm from being written on top of it, that would be very helpful! We're looking to improve the API and ship v1.0 sometime in the foreseeable future, so info on what exactly is missing right now would be very helpful. |
Raw row data must be fully accessible, either by each color component separately or as a set, e.g. full RGBA (this is optional; if each color component is accessible, that is completely fine), through an iterator without any checks, and this call must be This can be done with a contiguously allocated vector, which is pretty common and very nice. Or by This at least will make performant methods like this one possible and will actually prevent some people from doing bad things, because someone who is starting out will search for how to do so and will find this one. For any real algorithms, all generic traits must also be implemented. And At the very least, all basic arithmetic must conform. |
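The contiguous-vector layout described in the comment above might look something like the following sketch. The `Rgba8Image` type and its methods are hypothetical and are not the image crate's API; the sketch only assumes tightly packed RGBA8 data with no row padding. Rows are handed out as plain slices, so the inner loop iterates over a slice instead of indexing pixel by pixel.

```rust
/// Hypothetical image type (not from the image crate): tightly packed
/// RGBA8 samples stored row by row in one contiguous vector.
struct Rgba8Image {
    width: usize,
    height: usize,
    data: Vec<u8>,
}

impl Rgba8Image {
    const CHANNELS: usize = 4;

    /// Wraps an existing buffer; returns `None` if the length does not
    /// match `width * height * 4` or if either dimension is zero.
    fn from_raw(width: usize, height: usize, data: Vec<u8>) -> Option<Self> {
        let expected = width.checked_mul(height)?.checked_mul(Self::CHANNELS)?;
        if width == 0 || height == 0 || data.len() != expected {
            return None;
        }
        Some(Self { width, height, data })
    }

    /// Iterates over rows as raw sample slices of length `width * 4`.
    fn rows(&self) -> impl Iterator<Item = &[u8]> + '_ {
        self.data.chunks_exact(self.width * Self::CHANNELS)
    }

    /// Mutable counterpart of `rows`, e.g. for in-place horizontal passes.
    fn rows_mut(&mut self) -> impl Iterator<Item = &mut [u8]> + '_ {
        let stride = self.width * Self::CHANNELS;
        self.data.chunks_exact_mut(stride)
    }

    /// Iterates over a single color component (0 = R .. 3 = A) of row `y`.
    /// Panics if `y` or `channel` is out of range.
    fn channel_row(&self, y: usize, channel: usize) -> impl Iterator<Item = u8> + '_ {
        assert!(y < self.height && channel < Self::CHANNELS);
        let stride = self.width * Self::CHANNELS;
        self.data[y * stride..(y + 1) * stride]
            .iter()
            .skip(channel)
            .step_by(Self::CHANNELS)
            .copied()
    }
}
```

With this shape the range check happens once per row when the slice is taken, rather than once per pixel access. Whether that matches what the commenter had in mind for unchecked access is an assumption.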
The thing is that for convolution, for example, you'll have to access each pixel many, many times. For instance, when blurring with kernel size = 151, for each pixel in the image you'll have to access at least 150+150 pixels around it; if you can't optimize this away, that is the end. |
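To put rough numbers on the access counts in the comment above, here is a small back-of-the-envelope sketch. The function names are hypothetical, and the counts are idealized: they ignore caching, border handling, and the multiple color channels of a real image.

```rust
/// Samples read per output pixel by a direct 2D convolution with a
/// square `kernel_size` x `kernel_size` kernel.
fn reads_per_pixel_direct_2d(kernel_size: u64) -> u64 {
    kernel_size * kernel_size
}

/// Samples read per output pixel by a separable convolution: one
/// horizontal pass plus one vertical pass, each `kernel_size` wide.
fn reads_per_pixel_separable(kernel_size: u64) -> u64 {
    2 * kernel_size
}

/// Samples read per output pixel by `passes` running-sum box passes along
/// both axes: each pass reads one entering and one leaving sample per
/// axis, independent of the radius.
fn reads_per_pixel_running_box(passes: u64) -> u64 {
    2 * 2 * passes
}
```

For kernel size 151 the separable count is 302, which lines up with the "150+150 pixels around" figure once the center sample of each pass is included, while the direct 2D count is 151 * 151 = 22801. The running-sum count stays fixed as the radius grows, which is why the radius-independent approaches discussed in this thread are attractive.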
Yes, |
Thank you for the input! I've linked your message from the relevant issues on the bug tracker. We'll see what we can do about these limitations. |
Yup, if this is reasonable, I can still do blur in a separate crate, because fixing all these limitations could take years :) |
We're hoping to start a concerted effort on fixing the API sometime around January. But we won't know if they're any good until someone builds a complex image processing algorithm on top of them. Would you be available then to kick the tires of the new APIs and try to port stack blur to them? |
Yes, sure. Once the API looks like it is at least worth trying, I can try to build something on top of it. Let me know if I can help. |
blur is too slow, imageproc::filter::gaussian_blur_f32 is faster than it, but still slow.
Reproduction steps
result is
and Pillow
result is
0.1433885097503662s
Here is the final image
rust_1.png (it's a little strange...
rust_2.png
python.png