-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Image filter methods, size of output #106
Comments
Some design decision is needed there, I also noticed an inconsistency while making this example : #92 It's not clear what need to be followed, the original filters output a smaller image, but the sobel filter doesn't. Users will expect an output of the same size, but it's not trivial (specially for large radius) :
In hi-quality filters what is done is : It make sense on x86, but might be inefficient on simple processor with no cache. |
I think that the low-level methods should only focus on the fastest/lightest code, and so output a smaller image. Example : Pal can propose additional higher level functions for the filters, that output the expected image size, by internally pre-resizing the input and call the lower level function. |
Agree 100%
|
@anael-seghezzi: I think you're wrong. It can't be faster to create another input buffer (whose size depends on the number of filters to be executed!!!), when most current systems have the memory bus as a bottleneck. A better choice here would be to compute some garbage values for the pixels that are on the border, and offer a post-processing function for those who also need the border to be correct ( and offer a number of possibilities in extending the image by 1 - clone neighbor, fill with value, and such). |
I agree with @anael-seghezzi. Filtering done on certain nodes should be as fast as possible. That implies no extraordinary possibilities (border pixels, post-processing, etc.). |
Thomas,
|
Andreas, The currently implemented image processing algorithms treat the output as a vector, and after computing 1 element, they advance by 1 element size in memory. This means that a simple 2 pixel enlargement won't suffice for running multiple filters, since they lack the understanding of the line stride, and will just output a smaller image, contiguously in memory. So either after each filter you resize, or you transfer the image with the padding logic encoded ( the DMA controller will probably love this, I foresee alignment issues here, or byte as transfer size). And additionally, before each DMA transfer you need to zero the buffer in order to have the correct values in the padding pixels. @mateunho Nope. My way of thinking also takes into consideration some restrictions on the hardware side, it doesn't only look at the algorithm to be implemented. |
@tcptomato Can you clarify what the problem is? Can you clarify further what problem you are concerned about? Please keep in mind that these low level functions are not meant to be "perfect functions". Perhaps some of your concerns could be addressed in a higher level function? Let's take the following use case: Please describe your proposal for addressing the above example. |
Assumption: For your example: For the next issue allow me to replace Sobel by the Gaussian blur ( since it's more likely to be ran multiple times on the same image, and has the same halo issue). In order to apply the blur filter 2 times on the image, after the first run the halo needs to be recreated either by transferring the image to DRAM and copying back with the stride ( or does the DMA controller support local to local transfer ? even if it does it's another memory transfer), or the local processor needs to add it. To recap, biggest bottleneck here is DMA running at 1/4 speed and needing to prepare the buffer for each new run. How I see it. Both options discussed here have issues in stitching back the tiles, but we'll discuss them when we get there. |
Please explain assumption. How can we ignore border artifacts? On Mon, Jun 15, 2015 at 1:37 PM, Thomas Böhm [email protected]
Andreas Olofsson, CEO/Founder at Adapteva Linkedin: linkedin.com/in/andreasolofsson |
The artifact appears for the border pixels, who need values in other slices to be computed correctly. Ignoring this artifact ( for the current discussion), we use the value in the halo ( in your case ), or some other one ( in mine). |
Well, there is definitively some taught to have about memory, I honestly have no good insight about paralella memory transfer cost vs computation power. What I know (I tested it on intel), is that, in the case of a "correct" gaussian filter (meaning it will output clean filtered border) it is slower to take care of the border inside the processing loop than to prepare an enlarged image and then execute a simpler processing loop. Also, tile-based filtering will have to make a copy of the input data anyway, and will have to be larger than the output tile also anyway (or the tile-borders will not match each other). |
Do you have by any chance the code pushed somewhere, so I can extend it and we can talk about the same thing ? I didn't say to take care of the border in the processing loop ( that would be counterproductive). If correct borders are needed, a helper function could correct them. If not, processing is done. |
The "intel" code ? It's on my github account > https://github.com/anael-seghezzi/Maratis-Tiny-C-library I think it's better to have correctness, even if the resulting image is smaller, in my view, if the output has to be the same size as input, the corner has to be correct. Also, I don't see how a helper function can simply correct a border in a tile-based system, because you will still need the extra input data to correct the border of the tile. That said, I still agree that the memory transfer on the platforms targeted by pal needs to be optimal, I don't have a parallela so I'm not sure about that. |
The helper function is for the border of the complete frame, where you have to assign values to the non existent pixels when computing the value of the pixels on the border. OpenCV has here 5 different options ( reflect, constant, wrap , ... ), and I was saying that the default should be don't care ( but due to the way it's implemented in pointer arithmetic it will be a kind of wrap for most of the image ). And the way I was proposing to split the image, you would have to bring 2 more lines ( 256 pixels ) at full speed. For the tile-based one you need 132 more pixels, but it will be at 1/4th of the speed due to memory alignment. |
If you are saying that transferring horizontal blocs (128x8) aligned in memory will remove a bottleneck on this platform versus a tile-based system (32x32), it should be considered and tested. I'm not sure how this will react with larger radius where you cannot parse the radius-source pixels continuously. But larger radius have other winning strategies, large blur radius are usually processed in two passes horizontal blur then vertical blur, how is this regarding memory transfert on paralella ? |
I'm saying a contiguous block transfer will remove a bottleneck that a halo based approach would incur. And I'm not sure I understand what you say by not being able to parse the source pixels continuously. |
I mean that, the way you propose to split the image, if we have to deal with a larger blur radius, or if the image width is very large, you won't be able to fit the extra data in memory if you want contiguous pixels. Or maybe I misunderstood a step. How do you deal with a large image width for example ? You will have to tile the image anyway no ? |
An interesting link about this subject : https://www.youtube.com/watch?t=549&v=3uiEyEKji0M |
In OpenCV, the output image is the same size as the input image. The indices for x,y in the output image corresponds to the x,y in the input.
But in Pal, the 3x3 methods have output sizes that are two pixels smaller in width and height than the input size. This is more economical on memory but does not have corresponding x,y locations. Is this the preferred method?
The size of the output images should be added to the comments and documentation for these methods.
The text was updated successfully, but these errors were encountered: