ScalePlaneDown2 should be efficient for odd widths #314

Open
GoogleCodeExporter opened this issue Mar 25, 2015 · 3 comments

@GoogleCodeExporter

Especially for arm vs. NEON, it would be significantly cheaper to do 
ScaleRowDown2 with a properly aligned dst_stride (and operate over junk pixels) 
than to limit exactly to an unaligned dst_width. A caller that knew that could 
lie about the dst_width and set it to dst_stride in calling the function, but 
that shouldn't be necessary.

Original issue reported on code.google.com by [email protected] on 14 Feb 2014 at 7:54

@GoogleCodeExporter

To know if a libyuv function will be fast path, a rule of thumb is the width 
needs to be a multiple of 16, and the image pointer/stride should also be 
aligned to 16.
The pointer/stride alignment is less of a concern on Neon and AVX2, but most 
functions are optimized for aligned width.

Your suggested solution is overread/overwrite.  libyuv will allow you to do 
that, and it's a good solution.
Allocate extra pad bytes for rows and/or images.
Conversion functions will check if width == stride and treat the image as one 
large row.  You can do that yourself, and pad the total size out to 
(width * height + 15) & ~15.
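
A minimal sketch of the row coalescing idea, assuming a plain byte copy as the per-row operation. The function name `CopyPlaneCoalesced` and its return value (the number of row calls made) are hypothetical and exist only to make the effect visible; this is not libyuv's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical illustration of row coalescing: when both strides equal the
// width, the image is contiguous in memory and can be handled as one long row.
// Returns the number of row operations performed.
// Note: width * height is computed in int, so very large images could overflow.
int CopyPlaneCoalesced(const uint8_t* src, int src_stride,
                       uint8_t* dst, int dst_stride,
                       int width, int height) {
  assert(width > 0 && height > 0);
  if (src_stride == width && dst_stride == width) {
    width *= height;  // one row covering the whole image
    height = 1;
    src_stride = dst_stride = 0;
  }
  int row_calls = 0;
  for (int y = 0; y < height; ++y) {
    std::memcpy(dst, src, static_cast<size_t>(width));  // the "row function"
    src += src_stride;
    dst += dst_stride;
    ++row_calls;
  }
  return row_calls;
}
```

With a contiguous 4x3 image the loop runs once over 12 bytes; with a padded stride it falls back to one call per row.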

Scaling can't be row coalesced, but you can allocate aligned rows.
Allocate buffers with stride = (width + 15) & ~15; and image_size = stride * 
height;
In the case of scaling to 1/2, the destination needs to be a multiple of 16, so 
source would be a multiple of 32.
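
The stride arithmetic above can be sketched as follows. The helper names (`AlignedStride`, `PaddedPlane`, `AllocatePaddedPlane`) are hypothetical, and `std::vector` is used only for brevity: it does not guarantee a 16-byte aligned base pointer, so real code would likely pair this with an aligned allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round a width up to the next multiple of 16.
inline int AlignedStride(int width) {
  assert(width >= 0);
  return (width + 15) & ~15;
}

// Hypothetical container for an 8-bit plane with padded rows.
struct PaddedPlane {
  int width;
  int height;
  int stride;                 // multiple of 16, >= width
  std::vector<uint8_t> data;  // stride * height bytes
};

inline PaddedPlane AllocatePaddedPlane(int width, int height) {
  PaddedPlane p;
  p.width = width;
  p.height = height;
  p.stride = AlignedStride(width);
  p.data.assign(static_cast<size_t>(p.stride) * static_cast<size_t>(height), 0);
  return p;
}

// For a 1/2 scale, a destination stride that is a multiple of 16 implies a
// source stride that is a multiple of 32.
inline int SourceStrideForHalfScale(int dst_width) {
  return 2 * AlignedStride(dst_width);
}
```

For example, a destination width of 51 gets a stride of 64, and the matching source rows are padded to 128 bytes.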

I did experiment with overreads/overwrites, but it was deemed unsafe.  So the 2 
solutions I've come up with are 'any' functions and row coalescing.
The 'any' functions on Intel still prefer an aligned pointer, but handle 'any' 
width by processing the largest multiple of 16 and then handling the remainder.  
Most handle the remainder using C code, but some functions redo work on the 
'last16' pixels, which is an overread/overwrite of data already processed, but 
within the row.
This is supported for conversions, but not scaling.
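
A rough sketch of what an 'any' wrapper for a 1/2 scale row could look like, using the "multiple of 16, then C remainder" approach. All three function names are hypothetical, `RowDown2_Fast16` is a plain-C++ stand-in for a SIMD kernel, and the choice of which source pixel to keep is an assumption of this sketch rather than libyuv's behavior.

```cpp
#include <cassert>
#include <cstdint>

// Portable reference: keeps every second source pixel. Works for any width.
void RowDown2_C(const uint8_t* src, uint8_t* dst, int dst_width) {
  for (int x = 0; x < dst_width; ++x) {
    dst[x] = src[2 * x];
  }
}

// Stand-in for a SIMD kernel that only accepts dst_width % 16 == 0.
void RowDown2_Fast16(const uint8_t* src, uint8_t* dst, int dst_width) {
  assert((dst_width & 15) == 0);
  for (int x = 0; x < dst_width; x += 16) {
    for (int i = 0; i < 16; ++i) {
      dst[x + i] = src[2 * (x + i)];
    }
  }
}

// 'Any' wrapper: fast path for the largest multiple of 16, C code for the
// remaining 0..15 pixels. Never reads or writes beyond dst_width pixels.
void RowDown2_Any(const uint8_t* src, uint8_t* dst, int dst_width) {
  const int aligned = dst_width & ~15;
  const int remainder = dst_width & 15;
  if (aligned > 0) {
    RowDown2_Fast16(src, dst, aligned);
  }
  if (remainder > 0) {
    RowDown2_C(src + 2 * aligned, dst + aligned, remainder);
  }
}
```

The 'last16' variant mentioned above would instead rerun the fast kernel on the final 16 destination pixels of the row, which avoids the C loop but requires dst_width to be at least 16.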

The unittests check for overread/write by allocating images at the end of a 
page, and are run thru valgrind.

So the action item here is to implement scale_any.cc which has a wrapper for 
each scale row function that handles odd sizes.
It's not hard, and it may even exist already for 1/2 size, since that comes up 
in conversions/effects.

Original comment by [email protected] on 21 Feb 2014 at 11:27


@GoogleCodeExporter

The best long-term solution will be to allow pointers to be unaligned, albeit 
slower, and allow width and/or stride to be 'any'.

Another user suggested an 'overread' mode, which was tried in the past.  It's 
efficient but dangerous, so the 'any' approach is preferred.
Also row coalescing was added to allow contiguous images to be handled 
efficiently.

Changing nature of this bug to efficient odd width scaling support.

Original comment by [email protected] on 28 Jul 2014 at 10:01

  • Changed title: ScalePlaneDown2 should be efficient for odd widths

@GoogleCodeExporter

ScalePlaneDown2 is also the highest on profiles for the scaler, and should be 
AVX2 optimized.

Original comment by [email protected] on 27 Nov 2014 at 1:49

