8 bit Winograd Convolution? #16
Comments
There are a couple of ways to think about this. I will assume you are using 8-bit integers and not 8-bit floating point numbers.

For deployment, the network weights are constant, so the Winograd components can be computed offline in high precision, then quantized to 8 bits and stored. Because the Winograd components over-determine the raw weights, they actually contain more information than the raw weights. The downside is that the Winograd components use more memory than the raw weights: the F(2x2, 3x3) filter transform expands the raw weights by a factor of 1.78X, and F(4x4, 3x3) by 4X.

Typically, 8-bit activations are computed using full-precision multiplication and 32-bit accumulation, so that there is no precision loss during the computation. The 32-bit results are then quantized to 8 bits before the next stage of computation. So you could also apply the Winograd transform to the 32-bit activations before quantizing to 8 bits. If you were to do this, you would probably fuse the multiplication stage, inverse Winograd transform, bias, activation, forward Winograd transform, and quantization stages into a single operation. The downside of this approach is that activations are stored in the Winograd domain, which represents an expansion of the raw activations. The smaller the tile size, the bigger the expansion: F(2x2, 3x3) expands raw activations by 4X, F(4x4, 3x3) by 2.25X.

Another possibility is to quantize the activations to even less than 8-bit precision, so that when you perform the Winograd transform, the result uses no more than 8 bits. This probably works well in at least some applications, as there are research results showing accurate classification using low-precision activations.

Another possibility is to use a 1-D Winograd transform, call it F(2x1, 3x3) or F(4x1, 3x3). This effectively turns a 2-D direct convolution into a 1-D direct convolution nested inside a 1-D Winograd transform. The arithmetic complexity reduction is smaller, but so are the precision loss and the activation and weight expansion. The computational intensity is also higher, because the multiplications can be computed as matrix multiplications nested inside a 1-D direct convolution. Additionally, this might map to tensor-core-style arithmetic better than even 2-D direct convolution does. The 1-D Winograd transforms also have even better data locality than the 2-D Winograd transforms, which are in turn better than the large-tile FFT method.

As an aside, I would like to point out that the effect of locality on the minimum workspace size is missing from recent analyses of fast algorithms for convnets, even though our original publication exploited Winograd locality to fit the entire working set in the GPU's small shared memory space. Obviously, small-tile convolutions make possible instruction schedules that have fewer cache misses than large-tile (FFT) convolution algorithms do.

I hope this gives you some ideas!
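To make the offline weight path above concrete, here is a minimal sketch of the idea, assuming the standard F(2x2, 3x3) filter-transform matrix G and a simple symmetric per-tensor scale; `transform_filter` and `quantize_int8` are illustrative names, not anything defined in this repo:

```python
import numpy as np

# Filter-transform matrix G for F(2x2, 3x3) (the standard 4x3 matrix).
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)

def transform_filter(g):
    """Winograd filter components U = G g G^T, computed in full precision."""
    return G @ g @ G.T                      # 3x3 filter -> 4x4 components

def quantize_int8(x):
    """Simple symmetric per-tensor int8 quantization; returns (values, scale)."""
    scale = max(np.abs(x).max() / 127.0, 1e-12)
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Offline: the weights are constant, so transform in float, quantize once, store.
g = np.random.randn(3, 3).astype(np.float32)    # one 3x3 filter slice
U_q, U_scale = quantize_int8(transform_filter(g))
print(U_q.size / g.size)                        # 16 / 9 ~= 1.78x storage expansion
```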
@andravin @manojrohit
Thank you for sharing the code @BUG1989. Any comments about accuracy degradation?
@manojrohit
Another thing to try with int8 winograd is to quantize each of the winograd components separately. This might be especially helpful when the input to the convolutional layer is the output of a ReLU activation. In that case, the input is nonnegative, so the winograd component whose input transform row contains only nonnegative entries is also nonnegative. You probably capture an extra bit of dynamic range if you map that component onto the unsigned 8-bit range instead of the signed one.
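Here is a rough sketch of what that per-component quantization might look like for F(2x2, 3x3), assuming simple max-based scales; in that case the nonnegative component is the one at position (1, 1), whose transform row is [0, 1, 1, 0]:

```python
import numpy as np

# Standard F(2x2, 3x3) input-transform matrix B^T (4x4).
Bt = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float32)

# A batch of post-ReLU (nonnegative) 4x4 input tiles.
tiles = np.abs(np.random.randn(1000, 4, 4)).astype(np.float32)

# V[n] = B^T d[n] B : the 16 Winograd components of each tile.
V = np.einsum('ik,nkl,jl->nij', Bt, tiles, Bt)

# Quantize each of the 16 component positions with its own scale.
# Position (1, 1) uses transform row [0, 1, 1, 0] in both dimensions, so it is a
# sum of nonnegative inputs; mapping it to 0..255 keeps an extra bit of dynamic
# range compared with the signed -128..127 range.
V_q = np.empty_like(V, dtype=np.int16)   # int16 only so one array holds both ranges
scales = np.empty((4, 4), dtype=np.float32)
for i in range(4):
    for j in range(4):
        comp = V[:, i, j]
        if (i, j) == (1, 1):                       # nonnegative component -> unsigned
            scales[i, j] = comp.max() / 255.0
            V_q[:, i, j] = np.clip(np.round(comp / scales[i, j]), 0, 255)
        else:                                      # signed components
            scales[i, j] = np.abs(comp).max() / 127.0
            V_q[:, i, j] = np.clip(np.round(comp / scales[i, j]), -127, 127)
```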
Is it possible to implement Winograd convolution with 8-bit weights and activations? The intermediate transformations cause overflows, which result in a loss of accuracy in the overall CNN. Is anyone aware of research implementing Winograd in low-precision domains?
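As one way to see the overflow concretely: applying the standard F(2x2, 3x3) input transform B^T d B directly to int8 data already produces intermediate values that do not fit in 8 bits (the worst-case tile below is only illustrative):

```python
import numpy as np

# Standard F(2x2, 3x3) input-transform matrix B^T.
Bt = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.int32)

# Worst-case int8 input tile: every element at the top of the signed range.
d = np.full((4, 4), 127, dtype=np.int32)

V = Bt @ d @ Bt.T
# The (1, 1) component is d[1,1] + d[1,2] + d[2,1] + d[2,2] = 508, well outside
# int8's [-128, 127] range.
print(V.min(), V.max())
```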