Add bfloat16 support to avoid transforming to fp32 when using Llama3 #31
Llama3 is trained in bfloat16, so we should not transform bfloat16 to fp32 during inference. I therefore changed the two precision transformations in transformer_lens: I added `torch.bfloat16` to the precision check list so bfloat16 inputs are not cast to fp32, in `abstract_attention.py` and in `rms_norm.py` (because Llama3 uses RMSNorm). A sketch of the change is shown below.

Note: I did not change the precision check list in `layernorm_pre.py` and `layernorm.py` because Llama3 doesn't use them. Should I also modify these two files?
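For context, here is a minimal sketch of the kind of change, assuming the precision check in these components looks roughly like the commented pattern below; the exact variable names and surrounding code in TransformerLens may differ, and `maybe_upcast` is a hypothetical helper used only for illustration:

```python
import torch

# Hypothetical illustration of the precision check BEFORE the change:
# bfloat16 activations would fall through to the fp32 cast.
#
# if x.dtype not in [torch.float32, torch.float64]:
#     x = x.to(torch.float32)

def maybe_upcast(x: torch.Tensor) -> torch.Tensor:
    # AFTER adding torch.bfloat16 to the check list, bfloat16 inputs
    # are left as-is instead of being upcast to fp32.
    if x.dtype not in [torch.float32, torch.float64, torch.bfloat16]:
        x = x.to(torch.float32)
    return x

# Example: a bfloat16 residual-stream tensor stays in bfloat16.
resid = torch.randn(1, 8, 4096, dtype=torch.bfloat16)
assert maybe_upcast(resid).dtype == torch.bfloat16
```

With this change, a model loaded with `dtype=torch.bfloat16` should stay in bfloat16 through the attention and RMSNorm computations instead of being silently upcast to fp32.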