Question about batch_size in paper #22

blgimagineb · 2024-12-04T06:43:53Z

The paper says: "given the same batch size, GEAR significantly reduces the peak memory compared to FP16 baseline, increasing the maximum serving number (i.e., batch size) from 3 to 18"

Why does GEAR's peak memory grow with batch size at only 1/4 to 1/3 of the FP16 slope? Is it due to the quantization of activations?
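
For reference, here is a rough back-of-the-envelope sketch of how the per-sequence memory slope could be estimated if the KV cache dominates the growth with batch size. The model dimensions (32 layers, 32 heads, head dimension 128), the sequence length of 2048, and the 4-bit width are assumptions chosen for illustration, not figures from the paper, and `kv_cache_bytes` and `slope_per_sequence` are hypothetical helper names.

```python
def kv_cache_bytes(batch_size, seq_len, n_layers=32, n_heads=32, head_dim=128, bits=16):
    """Approximate KV-cache size in bytes.

    Counts one key and one value vector per layer, head, and token.
    All default dimensions are illustrative assumptions.
    """
    elems = 2 * n_layers * n_heads * head_dim * seq_len * batch_size
    return elems * bits / 8


def slope_per_sequence(seq_len, bits):
    """Extra bytes needed when the batch grows by one sequence."""
    return kv_cache_bytes(2, seq_len, bits=bits) - kv_cache_bytes(1, seq_len, bits=bits)


fp16_slope = slope_per_sequence(2048, bits=16)  # FP16 cache
q4_slope = slope_per_sequence(2048, bits=4)     # assumed 4-bit quantized cache
ratio = q4_slope / fp16_slope

print(f"FP16 slope: {fp16_slope / 2**30:.2f} GiB per sequence")
print(f"4-bit slope: {q4_slope / 2**30:.2f} GiB per sequence")
print(f"ratio: {ratio:.2f}")
```

Under these assumptions the ratio is exactly 1/4. Any extra per-token storage, such as quantization scales or residual components, would push it above 1/4; whether that accounts for the observed 1/4 to 1/3 is what I would like to confirm.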
