
[TOPI][CUDA] Fix Winograd Kernel Size Support #4276

Merged 2 commits on Nov 8, 2019

Conversation

comaniac (Contributor) commented Nov 7, 2019

The merged PR #4260 fixed the padding issue when building Winograd conv2d for CUDA, but we found that the kernel size is still a problem.

The constraints on using Winograd on CUDA were relaxed in the previous PR #3553. Specifically, the original Winograd implementation required kernel size 3x3, padding 1x1, and strides 1x1. PR #3553 relaxed these to any square kernel size (e.g., 3x3, 5x5, 7x7) with strides 1x1 and arbitrary padding.

However, even though PR #4260 fixes the miscalculation caused by the padding size, the miscalculation caused by the kernel size is still there. Here is a code snippet from conv2d_winograd.py after PR #4260:

    if not pre_computed: # kernel tensor is raw tensor, do strict check
        if dilation_h != 1 or dilation_w != 1:
            kernel = dilation(kernel, (1, 1, dilation_h, dilation_w))
        CO, CI, KH, KW = get_const_tuple(kernel.shape)
        assert HSTR == 1 and WSTR == 1 and KH == KW
    else:
        # kernel tensor is pre-transformed. this op is created by alter op layout.
        # dilation is not supported
        _, _, CI, CO = get_const_tuple(kernel.shape)
        KH = KW = 3
        assert HSTR == 1 and WSTR == 1 and dilation_h == 1 and dilation_w == 1

As can be seen, the kernel size is forced to 3x3 in the pre-computed case, but it could be any size according to PR #3553. This PR recovers the kernel size from the transformed kernel shape as follows:

    alpha, _, CI, CO = get_const_tuple(kernel.shape)
    KH = KW = alpha + 1 - tile_size
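The recovery works because, for Winograd F(m x m, r x r), the transformed tile has alpha = m + r - 1 points per dimension, where m is the output tile size and r is the kernel size; solving for r gives r = alpha + 1 - m. A minimal sketch of this arithmetic (the function name is illustrative, not part of the PR):

```python
def recover_kernel_size(alpha, tile_size):
    # Winograd F(m, r): alpha = m + r - 1, so r = alpha + 1 - m.
    # alpha is the first dimension of the pre-transformed kernel tensor,
    # tile_size is the output tile size m used by the schedule.
    return alpha + 1 - tile_size

# A 3x3 kernel with tile size 2 is transformed into a 4x4 tile (F(2, 3)).
assert recover_kernel_size(4, 2) == 3
# A 7x7 kernel with tile size 2 gives an 8x8 tile (F(2, 7)).
assert recover_kernel_size(8, 2) == 7
```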

There is another pending issue that I haven't resolved in this PR: the errors fixed in #4260 as well as in this PR should have been detected by the unit tests, but they weren't. The reason is that the unit test doesn't cover the pre_computed=True path, and I have no idea how to cover it.

@cbalint13 @vinx13 could you review and suggest how to improve the unit test? Thanks.

@comaniac comaniac force-pushed the fix_winograd_cuda_kernel_size branch from 58921e0 to c87d509 on November 7, 2019 22:59
vinx13 (Member) commented Nov 7, 2019

To test the pre_computed case, you can add a Relay unit test that runs a single conv layer model under opt_level=3. You may need to use the Winograd fallback context.

@comaniac comaniac force-pushed the fix_winograd_cuda_kernel_size branch from 31cb655 to 15e096d on November 8, 2019 00:59
@comaniac comaniac force-pushed the fix_winograd_cuda_kernel_size branch from 15e096d to 361c361 on November 8, 2019 01:01
comaniac (Contributor, Author) commented Nov 8, 2019

Thanks for the suggestion. The unit test has been added, and I've confirmed that it covers the desired part.

One minor issue is that I had to relax the error tolerance to 1e-3. Here are the max absolute differences for different shapes:

Kernel 3x3, Padding 1: ~1.6e-5
Kernel 3x3, Padding 0: ~5e-5
Kernel 7x7, Padding 2: ~1.6e-4

I assume this is caused by floating point error, because the reference numpy implementation is not based on the Winograd algorithm but on a normal conv2d. Please help confirm.
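Such small discrepancies are expected: Winograd evaluates the same convolution through a different sequence of floating point operations, so the two results agree only up to rounding error, and larger transforms (bigger kernels) tend to amplify it. A self-contained 1-D F(2, 3) sketch, independent of the TVM code, illustrates the kind of comparison the unit test performs:

```python
def winograd_f23(d, g):
    """1-D Winograd F(2, 3): 2 outputs from a 4-point input d and 3-tap filter g."""
    # Input transform B^T d
    m = [d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]]
    # Filter transform G g
    u = [g[0], (g[0] + g[1] + g[2]) / 2, (g[0] - g[1] + g[2]) / 2, g[2]]
    # Elementwise product, then output transform A^T
    t = [mi * ui for mi, ui in zip(m, u)]
    return [t[0] + t[1] + t[2], t[1] - t[2] - t[3]]

def direct_corr(d, g):
    """Direct sliding-window computation of the same 2 outputs."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]

d = [0.1, 1.7, -2.3, 0.9]
g = [0.5, -1.1, 0.7]
# The two methods agree only up to floating point rounding,
# so the comparison uses a tolerance rather than exact equality.
assert all(abs(a - b) < 1e-6 for a, b in zip(winograd_f23(d, g), direct_corr(d, g)))
```

This mirrors the situation in the PR: the reference numpy conv2d plays the role of direct_corr, and a tolerance (here 1e-3 in the actual test) absorbs the rounding differences.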

@vinx13 vinx13 merged commit 76b7967 into apache:master Nov 8, 2019
cbalint13 (Contributor)

> I assume this was caused by floating point error, because the reference numpy implementation is not based on Winograd algorithm but a normal conv2d. Please help confirm.

It is normal to have increased error when the tile size goes beyond 3x3 in the case of Winograd.

@comaniac comaniac deleted the fix_winograd_cuda_kernel_size branch November 8, 2019 08:33
comaniac (Contributor, Author) commented Nov 8, 2019

> It is normal to have increased error when the tile size goes beyond 3x3 in the case of Winograd.

Thanks for the confirmation. Just wanted to make sure it's expected.

zxy844288792 pushed a commit to neo-ai/tvm that referenced this pull request Nov 13, 2019
* fix_winograd_cuda_kernel_size

* add unit test