Vectorize (AVX2) JPEG Color Converter #1411

Merged
merged 14 commits into from
Nov 7, 2020

tkp1n
Contributor

@tkp1n tkp1n commented Nov 3, 2020

This is an initial implementation of the remaining (not yet vectorized) JPEG color converters using AVX(2) runtime intrinsics as well as Vector<float>.

I'm opening this as a draft PR to get initial feedback before investing additional time in polishing, tests and benchmarks.

See also #809

@CLAassistant

CLAassistant commented Nov 3, 2020

CLA assistant check
All committers have signed the CLA.

Member

@antonfirsov antonfirsov left a comment

EPIC!

Unsafe.Add(ref destination, 2) = Avx.Shuffle(cmHi, yoHi, 0b01_00_01_00);
Unsafe.Add(ref destination, 3) = Avx.Shuffle(cmHi, yoHi, 0b11_10_11_10);
}
#else
Member

I didn't have a chance to react, but I'm not entirely happy with the practice established in #1402. I would try putting the different paths into different classes (FromCmykVector8 vs. FromCmykVectorAvx2).
FromCmykVectorAvx2 can derive from FromCmykVector8 for convenience.

Member

Yeah, that was a stopgap. I’d like to revisit it.

Contributor Author

If we don't shy away from inheritance here, I would suggest a hierarchy like this:

abstract class JpegColorConverter
{
    public abstract bool IsAvailable { get; } // new on this class
}

abstract class BasicJpegColorConverter : JpegColorConverter // new base class for all non-vectorized color converters
{
    public override bool IsAvailable => true;
}

abstract class VectorizedJpegColorConverter : JpegColorConverter // new base class for all vectorized color converters
{
    private readonly int vectorSize;

    protected VectorizedJpegColorConverter(JpegColorSpace colorSpace, int precision, int vectorSize)
        : base(colorSpace, precision)
    {
        this.vectorSize = vectorSize;
    }

    public override void ConvertToRgba(in ComponentValues values, Span<Vector4> result)
    {
        int remainder = result.Length % vectorSize;
        int simdCount = result.Length - remainder;
        if (simdCount > 0)
        {
            ConvertCoreVectorized(values.Slice(0, simdCount), result.Slice(0, simdCount));
        }

        ConvertCore(values.Slice(simdCount, remainder), result.Slice(simdCount, remainder));
    }

    protected abstract void ConvertCoreVectorized(in ComponentValues values, Span<Vector4> result);

    protected abstract void ConvertCore(in ComponentValues values, Span<Vector4> result);
}

abstract class Avx2JpegColorConverter : VectorizedJpegColorConverter // new base class for all AVX2-based converters
{
    protected Avx2JpegColorConverter(JpegColorSpace colorSpace, int precision)
        : base(colorSpace, precision, 8)
    {
    }

    public override bool IsAvailable
    {
        get
        {
#if SUPPORTS_RUNTIME_INTRINSICS
            return Avx2.IsSupported;
#else
            return false;
#endif
        }
    }
}

// another base class for Vector<float>-based converters
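
As a worked example of the split performed in ConvertToRgba above, here is a minimal standalone sketch. The SimdSplit helper is hypothetical and only mirrors the remainder arithmetic; it is not part of the proposed class hierarchy.

```csharp
using System;

public static class SimdSplit
{
    // Returns how many leading elements the vectorized path handles
    // and how many trailing elements are left for the scalar path.
    public static (int SimdCount, int Remainder) Split(int length, int vectorSize)
    {
        int remainder = length % vectorSize;
        return (length - remainder, remainder);
    }
}
```

For instance, 21 pixels with a vector size of 8 give 16 elements for the vectorized path and 5 for the scalar tail.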

We could then instantiate all converters initially into a static array and determine which one to use based on ColorSpace, Precision and IsAvailable. This would address your point and maximize code re-use at the cost of some virtual method calls.

Let me know what you think. I didn't want to refactor too much for the initial draft PR, so I just added the vectorized logic into the existing code structure.
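
The selection idea (pick a converter by ColorSpace, Precision and IsAvailable) could look roughly like the sketch below. All types here are simplified, hypothetical stand-ins rather than the actual ImageSharp classes, and the AVX2 flag is injected through the constructor so the sketch runs on any machine; real code would presumably query Avx2.IsSupported instead.

```csharp
using System;
using System.Linq;

// Hypothetical, simplified stand-ins for the real types.
public enum JpegColorSpace { Cmyk, YCbCr }

public abstract class JpegColorConverter
{
    protected JpegColorConverter(JpegColorSpace colorSpace, int precision)
    {
        this.ColorSpace = colorSpace;
        this.Precision = precision;
    }

    public JpegColorSpace ColorSpace { get; }

    public int Precision { get; }

    public abstract bool IsAvailable { get; }
}

public sealed class FromCmykBasic : JpegColorConverter
{
    public FromCmykBasic(int precision)
        : base(JpegColorSpace.Cmyk, precision)
    {
    }

    // The scalar fallback is always usable.
    public override bool IsAvailable => true;
}

public sealed class FromCmykAvx2 : JpegColorConverter
{
    private readonly bool avx2Supported;

    // Assumption: the flag is passed in for illustration only.
    public FromCmykAvx2(int precision, bool avx2Supported)
        : base(JpegColorSpace.Cmyk, precision)
    {
        this.avx2Supported = avx2Supported;
    }

    public override bool IsAvailable => this.avx2Supported;
}

public static class ConverterRegistry
{
    // Candidates are assumed to be ordered fastest-first, so the first available match wins.
    public static JpegColorConverter Select(JpegColorConverter[] candidates, JpegColorSpace colorSpace, int precision)
        => candidates.First(c => c.IsAvailable && c.ColorSpace == colorSpace && c.Precision == precision);
}
```

With the AVX2 converter listed first and reported as unavailable, Select falls through to the basic converter for the same color space and precision.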

Member

@tkp1n I like the concept, go for it! Hopefully the runtime costs of the virtual dispatches will still be negligible.

Contributor Author

I refactored as discussed above, PTAL...

Next up: Benchmarks 🚀

@codecov

codecov bot commented Nov 4, 2020

Codecov Report

Merging #1411 (0aa3ba5) into master (9f51a92) will decrease coverage by 0.05%.
The diff coverage is 90.54%.

@@            Coverage Diff             @@
##           master    #1411      +/-   ##
==========================================
- Coverage   83.14%   83.08%   -0.06%     
==========================================
  Files         695      707      +12     
  Lines       31484    31839     +355     
  Branches     3586     3590       +4     
==========================================
+ Hits        26176    26454     +278     
- Misses       4585     4668      +83     
+ Partials      723      717       -6     
Flag Coverage Δ
unittests 83.08% <90.54%> (-0.06%) ⬇️

Impacted Files Coverage Δ
...lorConverters/JpegColorConverter.FromYCbCrBasic.cs 100.00% <ø> (ø)
...rConverters/JpegColorConverter.FromYCbCrVector4.cs 7.14% <25.00%> (ø)
...ents/Decoder/ColorConverters/JpegColorConverter.cs 74.82% <52.85%> (-18.37%) ⬇️
src/ImageSharp/Common/Helpers/SimdUtils.cs 65.95% <66.66%> (+0.04%) ⬆️
...JpegColorConverter.VectorizedJpegColorConverter.cs 75.00% <75.00%> (ø)
...rters/JpegColorConverter.Avx2JpegColorConverter.cs 100.00% <100.00%> (ø)
...ters/JpegColorConverter.BasicJpegColorConverter.cs 100.00% <100.00%> (ø)
...ColorConverters/JpegColorConverter.FromCmykAvx2.cs 100.00% <100.00%> (ø)
...olorConverters/JpegColorConverter.FromCmykBasic.cs 100.00% <100.00%> (ø)
...orConverters/JpegColorConverter.FromCmykVector8.cs 100.00% <100.00%> (ø)
... and 27 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@tkp1n
Contributor Author

tkp1n commented Nov 4, 2020

The numbers are in 💯 Looks like some nice improvements across the board.

👍 YCCK is the big winner.
👎 Vectorizing the gray scale conversion does not seem profitable.

BenchmarkDotNet=v0.12.1, OS=Windows 10.0.19042
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.100-rc.2.20479.15
  [Host]     : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT
  Job-ZMVTJV : .NET Framework 4.8 (4.8.4250.0), X64 RyuJIT
  Job-VQWAVA : .NET Core 2.1.23 (CoreCLR 4.6.29321.03, CoreFX 4.6.29321.01), X64 RyuJIT
  Job-ETAOPA : .NET Core 3.1.9 (CoreCLR 4.700.20.47201, CoreFX 4.700.20.47203), X64 RyuJIT

IterationCount=3  LaunchCount=1  WarmupCount=3  

CMYK

Method Job Runtime Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Scalar Job-ZMVTJV .NET 4.7.2 790.7 ns 373.12 ns 20.45 ns 1.00 0.00 0.0048 - - 32 B
SimdVector8 Job-ZMVTJV .NET 4.7.2 441.6 ns 69.54 ns 3.81 ns 0.56 0.01 0.0062 - - 40 B
SimdVectorAvx2 Job-ZMVTJV .NET 4.7.2 NA NA NA ? ? - - - -
Scalar Job-VQWAVA .NET Core 2.1 376.8 ns 125.78 ns 6.89 ns 1.00 0.00 0.0048 - - 32 B
SimdVector8 Job-VQWAVA .NET Core 2.1 320.1 ns 18.45 ns 1.01 ns 0.85 0.01 0.0062 - - 40 B
SimdVectorAvx2 Job-VQWAVA .NET Core 2.1 NA NA NA ? ? - - - -
Scalar Job-ETAOPA .NET Core 3.1 395.4 ns 89.28 ns 4.89 ns 1.00 0.00 0.0048 - - 32 B
SimdVector8 Job-ETAOPA .NET Core 3.1 303.2 ns 28.49 ns 1.56 ns 0.77 0.01 0.0062 - - 40 B
SimdVectorAvx2 Job-ETAOPA .NET Core 3.1 195.4 ns 18.34 ns 1.01 ns 0.49 0.01 0.0062 - - 40 B

YCCK

Method Job Runtime Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Scalar Job-ZMVTJV .NET 4.7.2 6,590.0 ns 2,572.12 ns 140.99 ns 1.00 0.00 - - - 32 B
SimdVector8 Job-ZMVTJV .NET 4.7.2 484.9 ns 177.17 ns 9.71 ns 0.07 0.00 0.0057 - - 40 B
SimdVectorAvx2 Job-ZMVTJV .NET 4.7.2 NA NA NA ? ? - - - -
Scalar Job-VQWAVA .NET Core 2.1 4,880.5 ns 296.83 ns 16.27 ns 1.00 0.00 - - - 32 B
SimdVector8 Job-VQWAVA .NET Core 2.1 380.2 ns 162.87 ns 8.93 ns 0.08 0.00 0.0062 - - 40 B
SimdVectorAvx2 Job-VQWAVA .NET Core 2.1 NA NA NA ? ? - - - -
Scalar Job-ETAOPA .NET Core 3.1 3,655.0 ns 966.89 ns 53.00 ns 1.00 0.00 0.0038 - - 32 B
SimdVector8 Job-ETAOPA .NET Core 3.1 346.4 ns 21.49 ns 1.18 ns 0.09 0.00 0.0062 - - 40 B
SimdVectorAvx2 Job-ETAOPA .NET Core 3.1 229.5 ns 55.78 ns 3.06 ns 0.06 0.00 0.0062 - - 40 B

RGB

Method Job Runtime Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Scalar Job-ZMVTJV .NET 4.7.2 507.3 ns 47.95 ns 2.63 ns 1.00 0.00 0.0048 - - 32 B
SimdVector8 Job-ZMVTJV .NET 4.7.2 386.7 ns 217.45 ns 11.92 ns 0.76 0.02 0.0062 - - 40 B
SimdVectorAvx2 Job-ZMVTJV .NET 4.7.2 NA NA NA ? ? - - - -
Scalar Job-VQWAVA .NET Core 2.1 298.8 ns 43.56 ns 2.39 ns 1.00 0.00 0.0048 - - 32 B
SimdVector8 Job-VQWAVA .NET Core 2.1 297.8 ns 53.72 ns 2.94 ns 1.00 0.01 0.0062 - - 40 B
SimdVectorAvx2 Job-VQWAVA .NET Core 2.1 NA NA NA ? ? - - - -
Scalar Job-ETAOPA .NET Core 3.1 293.0 ns 28.07 ns 1.54 ns 1.00 0.00 0.0048 - - 32 B
SimdVector8 Job-ETAOPA .NET Core 3.1 287.9 ns 22.87 ns 1.25 ns 0.98 0.01 0.0062 - - 40 B
SimdVectorAvx2 Job-ETAOPA .NET Core 3.1 178.3 ns 155.38 ns 8.52 ns 0.61 0.03 0.0062 - - 40 B

Grayscale

Method Job Runtime Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Scalar Job-ZMVTJV .NET 4.7.2 212.1 ns 70.83 ns 3.88 ns 1.00 0.00 0.0050 - - 32 B
SimdVector8 Job-ZMVTJV .NET 4.7.2 321.9 ns 32.69 ns 1.79 ns 1.52 0.03 0.0062 - - 40 B
SimdVectorAvx2 Job-ZMVTJV .NET 4.7.2 NA NA NA ? ? - - - -
Scalar Job-VQWAVA .NET Core 2.1 215.7 ns 81.12 ns 4.45 ns 1.00 0.00 0.0050 - - 32 B
SimdVector8 Job-VQWAVA .NET Core 2.1 258.6 ns 68.91 ns 3.78 ns 1.20 0.02 0.0062 - - 40 B
SimdVectorAvx2 Job-VQWAVA .NET Core 2.1 NA NA NA ? ? - - - -
Scalar Job-ETAOPA .NET Core 3.1 145.3 ns 28.68 ns 1.57 ns 1.00 0.00 0.0050 - - 32 B
SimdVector8 Job-ETAOPA .NET Core 3.1 249.5 ns 9.49 ns 0.52 ns 1.72 0.02 0.0062 - - 40 B
SimdVectorAvx2 Job-ETAOPA .NET Core 3.1 141.3 ns 2.46 ns 0.13 ns 0.97 0.01 0.0062 - - 40 B

YCbCr

Method Job Runtime Mean Error StdDev Ratio RatioSD Gen 0 Gen 1 Gen 2 Allocated
Scalar Job-ZMVTJV .NET 4.7.2 6,087.3 ns 1,964.34 ns 107.67 ns 9.08 0.24 - - - 32 B
SimdVector Job-ZMVTJV .NET 4.7.2 670.4 ns 121.26 ns 6.65 ns 1.00 0.00 0.0057 - - 40 B
SimdVector8 Job-ZMVTJV .NET 4.7.2 430.5 ns 216.63 ns 11.87 ns 0.64 0.02 0.0062 - - 40 B
SimdVectorAvx2 Job-ZMVTJV .NET 4.7.2 NA NA NA ? ? - - - -
Scalar Job-VQWAVA .NET Core 2.1 4,533.9 ns 1,935.28 ns 106.08 ns 7.83 0.22 - - - 32 B
SimdVector Job-VQWAVA .NET Core 2.1 579.2 ns 79.18 ns 4.34 ns 1.00 0.00 0.0057 - - 40 B
SimdVector8 Job-VQWAVA .NET Core 2.1 338.5 ns 45.89 ns 2.52 ns 0.58 0.01 0.0062 - - 40 B
SimdVectorAvx2 Job-VQWAVA .NET Core 2.1 NA NA NA ? ? - - - -
Scalar Job-ETAOPA .NET Core 3.1 3,442.3 ns 465.54 ns 25.52 ns 7.68 0.06 0.0038 - - 32 B
SimdVector Job-ETAOPA .NET Core 3.1 448.4 ns 116.31 ns 6.38 ns 1.00 0.00 0.0062 - - 40 B
SimdVector8 Job-ETAOPA .NET Core 3.1 323.5 ns 81.56 ns 4.47 ns 0.72 0.02 0.0062 - - 40 B
SimdVectorAvx2 Job-ETAOPA .NET Core 3.1 204.0 ns 62.41 ns 3.42 ns 0.45 0.01 0.0062 - - 40 B

@JimBobSquarePants
Member

Looking good @tkp1n !! I'll do a thorough review as soon as possible!

@antonfirsov
Member

antonfirsov commented Nov 6, 2020

So guys, here's my "offer":

I want to put together a PR for #1410 that also alters the color converters to pack into a 3-channel RGB float buffer, but I want to "go lazy" and spare solving AVX2 shuffling riddles, because every hour I'd spend there would delay SixLabors/ImageSharp.Drawing#96 further.

So the question is: @tkp1n, do you have any willingness and chance to spend a bit more time here, and finish such a refactor with @JimBobSquarePants's help?

If we can make it, this trick would deliver superior performance compared to any alternative, since we could spare a completely unnecessary alpha padding.

Let me know what you think. If you agree we'd figure out the "what PR follows what" details. Cheers!

@JimBobSquarePants
Member

@antonfirsov Isn't this essentially what #1242 should be doing?

@antonfirsov
Member

That corresponds to #1121 ("Optimized pipeline"), which is a more expensive refactor, probably not realistic for 1.1.

The late-night idea was that after all the "low hanging" refactors that happened, with a single extra step we could profit a lot now, and it's probably easier to do while this shuffling thing is fresh in your heads. (This would remove the need to implement a one-step Vector4 -> Rgb24 converter, and bring more wins.)

However, I'm not sure now, since what I'm asking might be tricky and time-consuming because of the way 3-component RGBs overlap Vector256<float>s. If it's too much, let's stop now and continue with #1121 later.

@JimBobSquarePants
Member

I don't think it's that much effort since the existing converters are all doing most of the work already. It's a case of moving most of it and filling out some gaps using Vector3 instead of Vector4 for the fallback packing and some offsetting for the HW Intrinsics version.

I'm happy to look at it after this is merged, for simplicity's sake.

@tkp1n tkp1n changed the title from "[WIP] Vectorize (AVX2) JPEG Color Converter" to "Vectorize (AVX2) JPEG Color Converter" Nov 6, 2020
@tkp1n tkp1n marked this pull request as ready for review November 6, 2020 14:38
@tkp1n
Contributor Author

tkp1n commented Nov 6, 2020

Starting next Monday, I'll be away from "work" like this for 4 weeks with very limited access to the web. 😢
While I can likely address PR feedback, I won't be able to contribute to any larger-scale refactoring in this area.

I've removed the "[WIP]" and marked this PR as ready for review. Feel free to let me know what you need to be fixed before this can be merged.

If the timing of this PR doesn't suit you due to other planned or ongoing refactoring, feel free to set this one aside, and we'll reconsider it when the timing is right.

@antonfirsov
Member

@tkp1n no worries, enjoy your vacation, and thanks for the contribution! Your PR should be the first to be merged, we'll review it ASAP.

@JimBobSquarePants
Member

I can't see any issues with this. It all looks fantastic and the performance is consistent with master for the existing optimized conversion. 👍 🚀

If we drop FromGrayscaleVector8 I'd be very happy for this to go in.

@tkp1n
Contributor Author

tkp1n commented Nov 6, 2020

> I can't see any issues with this. It all looks fantastic and the performance is consistent with master for the existing optimized conversion. 👍 🚀

Thanks a lot 😃

> If we drop FromGrayscaleVector8 I'd be very happy for this to go in.

Done, along with some final polishing of the benchmarks.

Member

@antonfirsov antonfirsov left a comment

One major concern about FromYCbCrVector, and a few simplification ideas.

Test coverage seems to prove that everything works 100% fine; however, it might be worth having @JimBobSquarePants do a quick sanity check on the shuffling bits.

{
ConvertCore(values.Slice(0, simdCount), result.Slice(0, simdCount), this.MaximumValue, this.HalfValue);
}
protected override bool IsAvailable => true;
Member

Suggested change
- protected override bool IsAvailable => true;
+ protected override bool IsAvailable => Vector.IsHardwareAccelerated && Vector<float>.Count == 4;

This is essentially an SSE path based on Vector4. The converter is not safe to run if the above conditions are not met, because of the RoundAndDownscalePreVector8 call.

@tannergooding is there an environment variable to enforce a (Vector.IsHardwareAccelerated && Vector<float>.Count == 4) == true configuration for testing?

If not, I would suggest removing this converter.

Contributor

@tannergooding tannergooding Nov 6, 2020

If you set COMPlus_EnableAVX2=0, it will force Vector<T> to be 16 bytes. We don't have a switch (at least not one that ships in release builds) that explicitly controls the size of Vector<T> today.
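
To observe the effect of that switch, a small probe along these lines can be used. This is only a sketch: the VectorInfo class and its member names are hypothetical, while Vector.IsHardwareAccelerated and Vector<float>.Count are the standard System.Numerics properties referenced in the suggested guard.

```csharp
using System;
using System.Numerics;

public static class VectorInfo
{
    // Mirrors the suggested guard: hardware-accelerated Vector<T>
    // with four float lanes, i.e. a 16-byte (128-bit) vector.
    public static bool IsSseSizedVector()
        => Vector.IsHardwareAccelerated && Vector<float>.Count == 4;

    // Human-readable summary of the current Vector<T> configuration.
    public static string Describe()
        => $"IsHardwareAccelerated={Vector.IsHardwareAccelerated}, Vector<float>.Count={Vector<float>.Count}";
}
```

Printing VectorInfo.Describe() once normally and once with COMPlus_EnableAVX2=0 set in the environment should show whether the lane count changes on the machine at hand.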



protected override void ConvertCoreVectorized(in ComponentValues values, Span<Vector4> result)
{
#if SUPPORTS_RUNTIME_INTRINSICS
Member

Can we hide the whole file behind the condition instead, and add this guard to GetYCbCrConverters:

#if SUPPORTS_RUNTIME_INTRINSICS
yield return new FromYCbCrAvx2(precision);
#endif

Same for the other color spaces.

Member

If we add the conditionals to the files, we have to refactor the tests, as I don't want to pepper the tests with conditionals, and we don't yet have a cross-platform remote executor available to test everything on .NET Core 2.1 on Windows.

Adding to the yield though is fine.

Member

OK, let's skip it for now. A comment could be useful.

Vector256<float> g = HwIntrinsics.MultiplyAdd(HwIntrinsics.MultiplyAdd(y, cb, gCbMult), cr, gCrMult);
Vector256<float> b = HwIntrinsics.MultiplyAdd(y, cb, bCbMult);

// TODO: We should be saving to RGBA not Vector4
Member

@JimBobSquarePants we should either do this or #1121, but let's not keep around contradicting plans in our notes!

Member

I wrote that a while back and forgot about it. I still believe there's merit in cutting out the FromVector4 call though. I need to make some diagrams and think things through.

Member

> I still believe there's merit in cutting out the FromVector4

I think #1121 is the most efficient way to do that with the float pipelines.

Member

Sorry, I wasn't clear. I don't mean changing the pipeline; what I mean is packing triplets so we can avoid Vector4 => Rgba32 => Rgb24. That comment should have said RGB.
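
To illustrate what "packing triplets" means in this context, here is a minimal scalar sketch that interleaves planar R, G and B values into a packed 3-channel float buffer without alpha padding. The RgbPacking class and PackRgb method are hypothetical names, and this is not the AVX2 shuffle-based version discussed in the thread.

```csharp
using System;

public static class RgbPacking
{
    // Interleaves planar R, G, B rows into a packed RGBRGB... float buffer.
    public static void PackRgb(ReadOnlySpan<float> r, ReadOnlySpan<float> g, ReadOnlySpan<float> b, Span<float> destination)
    {
        if (g.Length != r.Length || b.Length != r.Length || destination.Length < r.Length * 3)
        {
            throw new ArgumentException("Channel lengths must match and destination must hold 3 floats per pixel.");
        }

        for (int i = 0; i < r.Length; i++)
        {
            destination[(i * 3) + 0] = r[i];
            destination[(i * 3) + 1] = g[i];
            destination[(i * 3) + 2] = b[i];
        }
    }
}
```

A vectorized variant would have to deal with the fact that 3-float pixels do not align with 8-float registers, which is the overlap problem mentioned earlier in the conversation.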

Member

@antonfirsov antonfirsov left a comment

private static void ValidateYCbCr(in JpegColorConverter.ComponentValues values, Vector4[] result, int i)
[Theory]
[MemberData(nameof(CommonConversionData))]
public void FromYCbCrVector(int inputBufferLength, int resultBufferLength, int seed)
Member

Doesn't have to happen in this PR, but for 1.1 we need a variant of this test that works with:

FeatureTestRunner.RunWithHwIntrinsicsFeature(
                RunTest,
                args,
                HwIntrinsics.DisableAVX2);

Member

What I would like to do is a single test which I can then pass through the runner. That will test all the versions then.

Member

Can you parameterize it to also exercise the Vector8, the Vector4, and the scalar paths?

Member

Yep, since the call changes environment variables like COMPlus_EnableAVX2=0, we can do the following.

FeatureTestRunner.RunWithHwIntrinsicsFeature(
                RunTest,
                args,
                HwIntrinsics.AllowAll | HwIntrinsics.DisableAVX | HwIntrinsics.DisableSSE );

Member

@antonfirsov antonfirsov Nov 6, 2020

How do we get Vector.IsHardwareAccelerated == false? Is HwIntrinsics.DisableSSE enough?

Contributor

COMPlus_FeatureSIMD=0 should do it

Member

Yep that's it. We have HwIntrinsics.DisableSIMD for that.

/// Gets a value indicating whether <see cref="Vector{T}"/> code is being JIT-ed to SSE instructions
/// where float and integer registers are of size 128 byte.
/// </summary>
public static bool HasVector4 { get; } =
Member

I find this name misleading; one may think it has something to do with Vector4.

Contributor Author

Any suggestions?

Member

I would just inline it; there would be only one usage if we address my other suggestion.

Member

@antonfirsov antonfirsov left a comment

LGTM

Will open an issue later for the test gap on FromYCbCrVector4.

@JimBobSquarePants JimBobSquarePants merged commit b37044f into SixLabors:master Nov 7, 2020
@tkp1n tkp1n deleted the tkp1n/avx2-color-converter branch November 7, 2020 21:37
@tkp1n
Contributor Author

tkp1n commented Nov 7, 2020

Thank you @JimBobSquarePants and @antonfirsov for your inputs and hands-on time to get this merged so quickly 🚀


6 participants