Commit
Update fp8_e5m2_kvcache.rst
fzyzcjy authored Nov 25, 2024
1 parent ed46f14 commit f5be655
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/source/quantization/fp8_e5m2_kvcache.rst
Expand Up @@ -4,7 +4,7 @@ FP8 E5M2 KV Cache
==================

The int8/int4 quantization scheme requires additional scale GPU memory storage, which reduces the expected GPU memory benefits.
- The FP8 data format retains 2~3 mantissa bits and can convert float/fp16/bflaot16 and fp8 to each other.
+ The FP8 data format retains 2~3 mantissa bits and can convert float/fp16/bfloat16 and fp8 to each other.

Here is an example of how to enable this feature:

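As an aside on the conversion the doc paragraph above describes (this sketch is not part of the vLLM docs or the diff): float16 and FP8 E5M2 share the same 5-bit exponent, so a float16 value can be demoted to an E5M2-representable value simply by truncating the mantissa from 10 bits to 2. A minimal pure-Python illustration, using truncation rather than the round-to-nearest that real hardware conversions typically apply:

```python
import struct


def f16_bits(x: float) -> int:
    """Return the 16-bit IEEE half-precision pattern of x."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def to_e5m2(x: float) -> float:
    """Demote x to the nearest-below FP8 E5M2 value, returned as a float.

    float16 layout is 1 sign / 5 exponent / 10 mantissa bits; E5M2 is
    1 sign / 5 exponent / 2 mantissa bits with the same exponent width,
    so zeroing the low 8 mantissa bits of the float16 pattern yields a
    value exactly representable in E5M2 (truncation, not rounding).
    """
    bits = f16_bits(x) & 0xFF00  # keep sign, exponent, top 2 mantissa bits
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

For example, `to_e5m2(3.14159)` truncates to `3.0`, while values whose mantissa already fits in 2 bits, such as `1.0` or `-0.5`, pass through unchanged. This is only meant to show why no per-tensor scale storage is needed for E5M2, in contrast to the int8/int4 schemes mentioned above.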
