Optimizing the holes compared with Intel SSE in scale module for ARM #406

Open
GoogleCodeExporter opened this issue Mar 25, 2015 · 8 comments

Comments

@GoogleCodeExporter

Issue 319 (64 bit ARMv8 support for libyuv) showed that the scale module on 
ARM has holes compared with the Intel SSE implementation.

The following functions need to be implemented with ARM NEON:
ScaleRowDown2Linear
ScaleAddRows
ScaleFilterCols
ScaleColsUp2
ScaleARGBRowDown2Linear
ScaleARGBCols
ScaleARGBColsUp2
ScaleARGBFilterCols

Original issue reported on code.google.com by [email protected] on 25 Feb 2015 at 9:17

@GoogleCodeExporter

I have completed three functions for ARM32/64 as follows:
ScaleRowDown2Linear
ScaleAddRows
ScaleARGBRowDown2Linear
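
For readers unfamiliar with these row functions, here is a minimal scalar sketch of what a "down2 linear" row scaler computes, assuming it averages each pair of adjacent pixels with rounding. The name `scale_row_down2_linear` is hypothetical and this is not the patch itself.

```c
#include <stdint.h>

/* Hypothetical scalar reference, not the submitted NEON patch.
 * Halves a row horizontally: each output pixel is the rounded
 * average of two adjacent source pixels. */
static void scale_row_down2_linear(const uint8_t *src, uint8_t *dst,
                                   int dst_width) {
  for (int x = 0; x < dst_width; ++x) {
    /* +1 before the shift rounds to nearest instead of truncating. */
    dst[x] = (uint8_t)((src[2 * x] + src[2 * x + 1] + 1) >> 1);
  }
}
```

A loop of this shape maps naturally onto NEON, for example with a de-interleaving load (`vld2q_u8`) followed by a rounding halving add (`vrhaddq_u8`), which may be why these functions were the first to be completed.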

For the other 5 functions:
3 functions (ScaleFilterCols, ScaleARGBCols and ScaleARGBFilterCols) are not 
well suited to NEON SIMD.
2 functions (ScaleColsUp2 and ScaleARGBColsUp2) are not exercised by any test case.

So I would like to know whether it is necessary to implement these 5 functions 
with ARM NEON.


Original comment by [email protected] on 25 Feb 2015 at 9:23


@GoogleCodeExporter

Original comment by [email protected] on 25 Feb 2015 at 9:25


@GoogleCodeExporter

ScaleFilterCols and ScaleARGBFilterCols give about a 3x speedup for general 
purpose bilinear filtering on x86.
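
As a point of reference, bilinear column filtering can be sketched in scalar C as below. This assumes `x` and `dx` are 16.16 fixed-point source positions and that the blend truncates; the function name is hypothetical and the exact rounding in libyuv may differ.

```c
#include <stdint.h>

/* Hypothetical scalar sketch of horizontal bilinear filtering.
 * x and dx are assumed to be 16.16 fixed point: the high bits pick
 * the left source pixel, the low 16 bits are the blend fraction.
 * The source row must have one readable pixel past the last one used. */
static void scale_filter_cols_sketch(uint8_t *dst, const uint8_t *src,
                                     int dst_width, int x, int dx) {
  for (int j = 0; j < dst_width; ++j) {
    int xi = x >> 16;                    /* integer source index */
    uint32_t f = (uint32_t)(x & 0xffff); /* fraction toward src[xi + 1] */
    uint32_t a = src[xi];
    uint32_t b = src[xi + 1];
    /* Weighted average of the two neighbours, truncated. */
    dst[j] = (uint8_t)((a * (65536u - f) + b * f) >> 16);
    x += dx;
  }
}
```

Each output pixel reads from a data-dependent source index, so a SIMD version has to gather its inputs before it can blend them.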

FYI: on Intel there appears to be a rounding issue. I'll first try to 
reproduce it with a unit test, and then tweak how the filtering works.

Original comment by [email protected] on 25 Feb 2015 at 8:38


@GoogleCodeExporter

After checking the SSE version of ScaleFilterCols, it looks like two sets of 
data are processed in each loop iteration. But for NEON, unrolling the loop by 
two isn't very efficient.

I will try unrolling by a larger factor, such as 8, based on the dx 
variable.

Original comment by [email protected] on 26 Feb 2015 at 8:05


@GoogleCodeExporter

Conceptually, the column filter tries to work like the row filter, but first 
rearranges the data so that adjacent pixels are put into different registers.
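
One way to picture that rearrangement is the scalar sketch below. It is an illustration under assumptions, not the SSE or NEON code: the arrays `left`, `right` and `frac` stand in for vector registers, `kLanes` is a hypothetical lane count, and `x`/`dx` are assumed to be 16.16 fixed point.

```c
#include <stdint.h>

enum { kLanes = 8 }; /* hypothetical number of pixels per iteration */

/* Hypothetical two-phase column filter for one group of kLanes pixels. */
static void filter_cols_gather8(uint8_t *dst, const uint8_t *src,
                                int x, int dx) {
  uint8_t left[kLanes], right[kLanes]; /* stand-ins for two registers */
  uint16_t frac[kLanes];

  /* Phase 1: gather. Adjacent source pixels go to different arrays. */
  for (int i = 0; i < kLanes; ++i) {
    int xi = x >> 16;
    left[i] = src[xi];
    right[i] = src[xi + 1];
    frac[i] = (uint16_t)(x & 0xffff);
    x += dx;
  }

  /* Phase 2: blend lane by lane, the same shape as blending two rows. */
  for (int i = 0; i < kLanes; ++i) {
    uint32_t f = frac[i];
    dst[i] = (uint8_t)((left[i] * (65536u - f) + right[i] * f) >> 16);
  }
}
```

Once the data is split this way, the second loop has no data-dependent addressing, so it is the part that vectorizes cleanly; the gather in the first loop is where the cost usually sits.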

Original comment by [email protected] on 26 Feb 2015 at 6:40


@GoogleCodeExporter

See bug 407 for a new function: ARGBToRGB565Dither()
Before shifting 8 bit RGB values down to 5 or 6 bits, add values from the 
dither matrix:

// Ordered 4x4 dither for 888 to 565.  Values from 0 to 7.
static const uint8 kDither565_4x4[16] = {
  0, 4, 1, 5,
  6, 2, 7, 3,
  1, 5, 0, 4,
  7, 3, 6, 2,
};

Do you have time to adapt the ARGBToRGB565 to add dither support?
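
The dither step described above could be sketched for a single pixel as follows. This is a hedged illustration: the function name, the `(x, y)` indexing into the matrix, and the saturating add are assumptions, and `uint8_t` is used in place of libyuv's `uint8` typedef so the snippet stands alone.

```c
#include <stdint.h>

/* Ordered 4x4 dither for 888 to 565, copied from the comment above. */
static const uint8_t kDither565_4x4[16] = {
  0, 4, 1, 5,
  6, 2, 7, 3,
  1, 5, 0, 4,
  7, 3, 6, 2,
};

/* Clamp to the 8-bit range after adding the dither value (assumed). */
static uint8_t clamp255(int v) { return (uint8_t)(v > 255 ? 255 : v); }

/* Hypothetical per-pixel conversion: add the dither value for this
 * position, then shift each channel down to 5, 6 and 5 bits. */
static uint16_t argb_to_rgb565_dither(uint8_t r, uint8_t g, uint8_t b,
                                      int x, int y) {
  int d = kDither565_4x4[((y & 3) << 2) | (x & 3)];
  int r5 = clamp255(r + d) >> 3;
  int g6 = clamp255(g + d) >> 2;
  int b5 = clamp255(b + d) >> 3;
  return (uint16_t)((r5 << 11) | (g6 << 5) | b5);
}
```

Because the dither value depends only on the pixel position modulo 4, a row-based version can load one 4-entry row of the matrix per scanline and reuse it across the whole row.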

Original comment by [email protected] on 10 Mar 2015 at 10:58


@GoogleCodeExporter

Do you mean adding NEON support for ARGBToRGB565Dither?

Currently, I'm working on:
1. ScaleFilterCols
About 2x improvement. The patch is ready; it depends on the review of the 
ScaleAddRows patch.

2. ScaleARGBCols
About 1.1x improvement. The patch is ready.

3. ScaleARGBFilterCols
In progress.

Once I complete these patches, I think I can handle this function.

Original comment by [email protected] on 11 Mar 2015 at 6:11


@GoogleCodeExporter

I have completed the patch of ARGBToRGB565Dither for ARM32/64 NEON.
Please check:
https://webrtc-codereview.appspot.com/49409004/

Original comment by [email protected] on 16 Mar 2015 at 6:25

