Question about implementation of gradient compression strategies #18447
-
Hi guys, I am currently studying gradient compression for reducing distributed communication. I found that some strategies have already been implemented in MXNet, such as Signum and 2-bit quantization. However, these methods are implemented at different levels: Signum is an optimizer, while 2-bit quantization only works in the kvstore. So I am confused about the best way to implement such gradient compression strategies. How do we decide where (Python frontend, kvstore, or elsewhere) to implement them?
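For context, here is a minimal sketch of how the two existing mechanisms are exposed today, assuming a distributed kvstore launch (e.g. via tools/launch.py); the threshold and hyperparameter values are only illustrative:

```python
import mxnet as mx
from mxnet import gluon

net = gluon.nn.Dense(10)
net.initialize()

# (a) 2-bit quantization is a kvstore feature: gradients are quantized before
#     being pushed to the parameter servers. This assumes the job is started
#     with a distributed kvstore.
kv = mx.kv.create('dist_sync')
kv.set_gradient_compression({'type': '2bit', 'threshold': 0.5})

# With Gluon, the same setting can be passed straight to the Trainer:
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.1},
                        kvstore='dist_sync',
                        compression_params={'type': '2bit', 'threshold': 0.5})

# (b) Signum is just an optimizer: the compression (keeping only the sign of
#     the momentum-corrected gradient) happens inside the update rule, so it
#     is selected like any other optimizer and never touches the kvstore.
signum_trainer = gluon.Trainer(net.collect_params(),
                               mx.optimizer.Signum(learning_rate=0.01,
                                                   momentum=0.9))
```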
Replies: 2 comments 1 reply
-
@mxnet-label-bot add [Question]
-
Recently, we provided a gradient compression API based on BytePS. See bytedance/byteps#225 and https://github.com/bytedance/byteps/blob/master/docs/gradient-compression.md. Currently, it supports 1-bit, top-k, random-k, and dithering. An example of training on ImageNet with MXNet is provided in https://github.com/bytedance/byteps/blob/master/example/mxnet/train_gluon_imagenet_byteps_gc.py
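To give a rough idea of the usage: compression is configured through the `DistributedTrainer`'s `compression_params` dict. The keys and values below are only illustrative and follow my reading of the gradient-compression.md doc linked above; please check that doc for the exact names and supported combinations.

```python
import byteps.mxnet as bps
from mxnet import gluon

bps.init()

net = gluon.nn.Dense(10)
net.initialize()

# Illustrative compression settings; key names ('compressor', 'ef',
# 'momentum') are taken from the gradient-compression.md doc and may
# need adjusting to match the released API.
compression_params = {
    'compressor': 'onebit',   # or 'topk', 'randomk', 'dithering'
    'ef': 'vanilla',          # error-feedback variant
    'momentum': 'nesterov',   # compressor-side momentum
}

trainer = bps.DistributedTrainer(net.collect_params(), 'sgd',
                                 {'learning_rate': 0.1},
                                 compression_params=compression_params)
```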