
[TOPI][CUDA] Fix Winograd Kernel Size Support #4276

Merged 2 commits on Nov 8, 2019

Conversation

comaniac (Contributor) commented Nov 7, 2019

The merged PR #4260 fixed the padding issue when building Winograd conv2d for CUDA, but we found that the kernel size is still a problem.

The constraints on using Winograd on CUDA were relaxed in the previous PR #3553. Specifically, the original Winograd implementation required kernel size 3x3, padding 1x1, and strides 1x1. PR #3553 relaxed these to any square kernel size (e.g., 3x3, 5x5, 7x7) with strides 1x1 and arbitrary padding.

However, even though PR #4260 fixes the miscalculation caused by the padding size, the miscalculation caused by the kernel size is still there. Here is a code snippet from conv2d_winograd.py after PR #4260:

    if not pre_computed: # kernel tensor is raw tensor, do strict check
        if dilation_h != 1 or dilation_w != 1:
            kernel = dilation(kernel, (1, 1, dilation_h, dilation_w))
        CO, CI, KH, KW = get_const_tuple(kernel.shape)
        assert HSTR == 1 and WSTR == 1 and KH == KW
    else:
        # kernel tensor is pre-transformed. this op is created by alter op layout.
        # dilation is not supported
        _, _, CI, CO = get_const_tuple(kernel.shape)
        KH = KW = 3
        assert HSTR == 1 and WSTR == 1 and dilation_h == 1 and dilation_w == 1

As can be seen, the kernel size is forced to 3x3 in the pre-computed case, but it could be any size according to PR #3553. This PR recovers the kernel size from the transformed kernel shape as follows:

    alpha, _, CI, CO = get_const_tuple(kernel.shape)
    KH = KW = alpha + 1 - tile_size
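The recovery works because, for Winograd F(m x m, r x r), the transformed tile has alpha = m + r - 1 points per dimension, where m is the output tile size and r is the kernel size; solving for r gives r = alpha + 1 - m. A minimal sketch of this arithmetic (the function name is illustrative, not part of the PR):

```python
def recover_kernel_size(alpha, tile_size):
    # Winograd F(m, r): alpha = m + r - 1, so r = alpha + 1 - m.
    # alpha is the first dimension of the pre-transformed kernel tensor,
    # tile_size is the output tile size m used by the schedule.
    return alpha + 1 - tile_size

# A 3x3 kernel with tile size 2 is transformed into a 4x4 tile (F(2, 3)).
assert recover_kernel_size(4, 2) == 3
# A 7x7 kernel with tile size 2 gives an 8x8 tile (F(2, 7)).
assert recover_kernel_size(8, 2) == 7
```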

There is another pending issue that I haven't resolved in this PR: the errors fixed in #4260 as well as in this PR should have been detected by the unit tests, but they weren't. The reason is that the unit test doesn't cover the pre_computed=True path, and I have no idea how to cover it.

@cbalint13 @vinx13 could you review and suggest how to improve the unit test? Thanks.

@comaniac comaniac force-pushed the fix_winograd_cuda_kernel_size branch from 58921e0 to c87d509 on November 7, 2019 22:59
vinx13 (Member) commented Nov 7, 2019

To test the pre_computed case, you can add a Relay unit test that runs a single conv layer model under opt_level=3. You may need to use the Winograd fallback context.

@comaniac comaniac force-pushed the fix_winograd_cuda_kernel_size branch from 31cb655 to 15e096d on November 8, 2019 00:59
@comaniac comaniac force-pushed the fix_winograd_cuda_kernel_size branch from 15e096d to 361c361 on November 8, 2019 01:01
comaniac (Contributor, Author) commented Nov 8, 2019

Thanks for the suggestion. The unit test has been added, and I've confirmed that it covers the desired part.

One minor issue is that I had to relax the error tolerance to 1e-3. Here are the max absolute differences for different shapes:

Kernel 3x3, Padding 1: ~1.6e-5
Kernel 3x3, Padding 0: ~5e-5
Kernel 7x7, Padding 2: ~1.6e-4

I assume this is caused by floating point error, because the reference numpy implementation is not based on the Winograd algorithm but on a normal conv2d. Please help confirm.
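Such small discrepancies are expected: Winograd evaluates the same convolution through a different sequence of floating point operations, so the two results agree only up to rounding error, and larger transforms (bigger kernels) tend to amplify it. A self-contained 1-D F(2, 3) sketch, independent of the TVM code, illustrates the kind of comparison the unit test performs:

```python
def winograd_f23(d, g):
    """1-D Winograd F(2, 3): 2 outputs from a 4-point input d and 3-tap filter g."""
    # Input transform B^T d
    m = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Filter transform G g
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Elementwise product, then output transform A^T
    t = [mi * ui for mi, ui in zip(m, u)]
    return [t[0] + t[1] + t[2], t[1] - t[2] - t[3]]

def direct_corr(d, g):
    """Direct sliding-window computation of the same 2 outputs."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

d = [0.1, 1.7, -2.3, 0.9]
g = [0.5, -1.1, 0.7]
# The two methods agree only up to floating point rounding,
# so the comparison uses a tolerance rather than exact equality.
assert all(abs(a - b) < 1e-6 for a, b in zip(winograd_f23(d, g), direct_corr(d, g)))
```

This mirrors the situation in the PR: the reference numpy conv2d plays the role of direct_corr, and a tolerance (here 1e-3 in the actual test) absorbs the rounding differences.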

@vinx13 vinx13 merged commit 76b7967 into apache:master Nov 8, 2019
cbalint13 (Contributor)

> I assume this was caused by floating point error, because the reference numpy implementation is not based on Winograd algorithm but a normal conv2d. Please help confirm.

It is normal to have increased error when the tile size goes beyond 3x3 in the case of Winograd.

@comaniac comaniac deleted the fix_winograd_cuda_kernel_size branch November 8, 2019 08:33
comaniac (Contributor, Author) commented Nov 8, 2019

> It is normal to have increased error when the tile size goes beyond 3x3 in the case of Winograd.

Thanks for the confirmation. Just wanted to make sure it's expected.

zxy844288792 pushed a commit to neo-ai/tvm that referenced this pull request Nov 13, 2019
* fix_winograd_cuda_kernel_size

* add unit test