ScalePlaneDown2 should be efficient for odd widths #314

GoogleCodeExporter · 2015-03-25T11:06:39Z

Especially on ARM, where the choice is between the C path and NEON, it would be 
significantly cheaper to run ScaleRowDown2 over a properly aligned dst_stride 
(and operate over junk pixels) than to limit it exactly to an unaligned 
dst_width. A caller who knew that could lie about dst_width and pass dst_stride 
instead when calling the function, but that shouldn't be necessary.

Original issue reported on code.google.com by [email protected] on 14 Feb 2014 at 7:54


GoogleCodeExporter · 2015-03-25T11:06:39Z

To know whether a libyuv function will take the fast path, a rule of thumb is 
that the width needs to be a multiple of 16, and the image pointer/stride 
should also be aligned to 16.
The pointer/stride alignment is less of a concern on Neon and AVX2, but most 
functions are optimized for aligned width.

Your suggested solution is overread/overwrite.  libyuv will allow you to do 
that, and it's a good solution.
Allocate extra pad bytes for rows and/or images.
Conversion functions will check if width == stride and treat the image as one 
large row.  You can do that yourself, and pad the total size out to 
(width * height + 15) & ~15.
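As a rough illustration of the row coalescing described above, here is a minimal C sketch. `CopyRow_C`, `CopyPlaneCoalesced` and `PaddedSize` are hypothetical names made up for this example, not libyuv functions; the copy kernel just stands in for any per-row conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-row kernel: stands in for any conversion row function. */
static void CopyRow_C(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[x] = src[x];
  }
}

/* Processes a plane row by row. When both strides equal the width, the rows
   are contiguous, so the image is treated as one row of width * height.
   Returns the number of row-kernel calls so the effect is observable. */
static int CopyPlaneCoalesced(const uint8_t* src, int src_stride,
                              uint8_t* dst, int dst_stride,
                              int width, int height) {
  assert(width > 0 && height > 0);
  if (src_stride == width && dst_stride == width) {
    width *= height;
    height = 1;
    src_stride = dst_stride = 0;
  }
  int calls = 0;
  for (int y = 0; y < height; ++y) {
    CopyRow_C(src, dst, width);
    src += src_stride;
    dst += dst_stride;
    ++calls;
  }
  return calls;
}

/* Allocation size padded as in the comment: (width * height + 15) & ~15. */
static size_t PaddedSize(int width, int height) {
  return ((size_t)width * (size_t)height + 15) & ~(size_t)15;
}
```

With width 15 and height 3 stored contiguously, this makes a single kernel call over 45 pixels, and `PaddedSize(15, 3)` gives 48, so a kernel that rounds the coalesced row up to 16 stays inside the allocation.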

Scaling can't be row coalesced, but you can allocate aligned rows.
Allocate buffers with stride = (width + 15) & ~15; and image_size = stride * 
height;
In the case of scaling to 1/2, the destination needs to be a multiple of 16, so 
source would be a multiple of 32.
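The stride arithmetic above can be sketched in C as follows. The `Plane` struct and the `AlignUp` and `PlaneAlloc` helpers are assumptions for illustration only, and `aligned_alloc` is the C11 allocator, which may need replacing on older toolchains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounds v up to a multiple of a; a must be a power of two. */
static int AlignUp(int v, int a) {
  return (v + a - 1) & ~(a - 1);
}

/* Hypothetical plane descriptor used only in this sketch. */
typedef struct {
  uint8_t* data;
  int width;
  int height;
  int stride;
} Plane;

/* Allocates a plane whose rows start on 16-byte boundaries.
   stride = (width + 15) & ~15 and image_size = stride * height.
   Returns 1 on success, 0 on allocation failure. */
static int PlaneAlloc(Plane* p, int width, int height) {
  assert(width > 0 && height > 0);
  p->width = width;
  p->height = height;
  p->stride = AlignUp(width, 16);
  size_t image_size = (size_t)p->stride * (size_t)height;
  /* image_size is a multiple of 16, as aligned_alloc requires. */
  p->data = (uint8_t*)aligned_alloc(16, image_size);
  return p->data != NULL;
}
```

For a 1/2 scale of a 37-pixel-wide source, `AlignUp(37, 32)` gives a source stride of 64, and half of that, 32, is a destination stride that is itself a multiple of 16.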

I did experiment with overreads/overwrites, but it was deemed unsafe.  So the 2 
solutions I've come up with are 'any' functions and row coalescing.
The 'any' functions on Intel still prefer an aligned pointer, but handle 'any' 
width by processing the largest multiple of 16 and then handling the remainder.  
Most handle the remainder using C code, but some functions redo work on the 
'last16' pixels, which is an overread/overwrite of data already processed, but 
within the row.
This is supported for conversions, but not scaling.
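To make the 'last16' idea concrete, here is a small C sketch under stated assumptions: `InvertRow_16` is a made-up stand-in for a SIMD kernel that only accepts widths that are multiples of 16, and `InvertRow_Last16` is a hypothetical wrapper, not a libyuv function.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a SIMD kernel: only accepts widths that are multiples of 16. */
static void InvertRow_16(const uint8_t* src, uint8_t* dst, int width) {
  assert((width & 15) == 0);
  for (int x = 0; x < width; ++x) {
    dst[x] = (uint8_t)(255 - src[x]);
  }
}

/* Handles any width >= 16 without a C remainder loop: the aligned part is
   processed first, then the final 16 pixels of the row are processed again,
   overlapping pixels that were already written. All accesses stay inside
   [0, width). src and dst must not alias, otherwise the overlapped pixels
   would be inverted twice. */
static void InvertRow_Last16(const uint8_t* src, uint8_t* dst, int width) {
  assert(width >= 16);
  int aligned = width & ~15;
  InvertRow_16(src, dst, aligned);
  if (width & 15) {
    InvertRow_16(src + width - 16, dst + width - 16, 16);
  }
}
```

For a width of 37, the kernel runs over pixels 0..31 and then over pixels 21..36, so pixels 21..31 are computed twice but nothing outside the row is touched.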

The unittests check for overread/overwrite by allocating images at the end of a 
page, and are run through valgrind.
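One way to place a buffer at the end of a page is sketched below for POSIX systems. This is an assumption about how such a test could be set up, not the actual libyuv unittest code; `AllocAtPageEnd` is a hypothetical helper, and the guard page after the buffer is an addition of this sketch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps enough pages for `size` bytes plus one inaccessible guard page, and
   returns a pointer positioned so the buffer ends exactly at the guard page.
   Any read or write past the end of the buffer then faults immediately.
   *base and *map_len receive the values to pass to munmap. Returns NULL on
   failure. */
static uint8_t* AllocAtPageEnd(size_t size, void** base, size_t* map_len) {
  assert(size > 0);
  size_t page = (size_t)sysconf(_SC_PAGESIZE);
  size_t data_len = (size + page - 1) / page * page;
  size_t total = data_len + page;
  uint8_t* m = (uint8_t*)mmap(NULL, total, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (m == (uint8_t*)MAP_FAILED) {
    return NULL;
  }
  if (mprotect(m + data_len, page, PROT_NONE) != 0) {
    munmap(m, total);
    return NULL;
  }
  *base = m;
  *map_len = total;
  return m + data_len - size;
}
```

Note that a buffer placed this way is generally not 16-byte aligned unless its size is a multiple of 16, which also makes it useful for exercising unaligned pointers.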

So the action item here is to implement scale_any.cc, which has a wrapper for 
each scale row function that handles odd sizes.
It's not hard, and it may even exist already for 1/2 size, since that comes up 
in conversions/effects.
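A wrapper of that kind might look roughly like the C sketch below. The `HalveRow_*` names are hypothetical and the kernels simply point-sample every second pixel; a real scale_any.cc would wrap the actual SIMD row functions and may differ in signature.

```c
#include <assert.h>
#include <stdint.h>

/* Plain C 1/2 point sampler: keeps every second source pixel. */
static void HalveRow_C(const uint8_t* src, uint8_t* dst, int dst_width) {
  for (int x = 0; x < dst_width; ++x) {
    dst[x] = src[2 * x + 1];
  }
}

/* Stand-in for a SIMD kernel that needs dst_width to be a multiple of 16. */
static void HalveRow_16(const uint8_t* src, uint8_t* dst, int dst_width) {
  assert((dst_width & 15) == 0);
  HalveRow_C(src, dst, dst_width);
}

/* 'Any' wrapper: fast kernel for the largest multiple of 16 output pixels,
   C code for the remainder. Unlike a conversion, the source pointer advances
   by two pixels for every output pixel. */
static void HalveRow_Any(const uint8_t* src, uint8_t* dst, int dst_width) {
  assert(dst_width >= 0);
  int aligned = dst_width & ~15;
  if (aligned > 0) {
    HalveRow_16(src, dst, aligned);
  }
  HalveRow_C(src + aligned * 2, dst + aligned, dst_width & 15);
}
```

For a destination width of 37, the fast kernel produces 32 pixels from the first 64 source pixels and the C loop produces the remaining 5, with no access outside either row.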

Original comment by [email protected] on 21 Feb 2014 at 11:27

Added labels: ****
Removed labels: ****

GoogleCodeExporter · 2015-03-25T11:06:39Z

The best long-term solution will be to allow pointers to be unaligned, albeit 
slower, and to allow width and/or stride to be 'any'.

Another user suggested an 'overread' mode, which was tried in the past.  It's 
efficient but dangerous, so the 'any' approach is preferred.
Row coalescing was also added to allow contiguous images to be handled 
efficiently.

Changing the nature of this bug to efficient odd-width scaling support.

Original comment by [email protected] on 28 Jul 2014 at 10:01

Changed title: ScalePlaneDown2 should be efficient for odd widths
Added labels: ****
Removed labels: ****

GoogleCodeExporter · 2015-03-25T11:06:40Z

ScalePlaneDown2 is also the highest entry on profiles for the scaler, and 
should be AVX2 optimized.

Original comment by [email protected] on 27 Nov 2014 at 1:49

Added labels: ****
Removed labels: ****

GoogleCodeExporter added Priority-Medium Type-Defect auto-migrated labels Mar 25, 2015
