From 2b0879bfc273a08d339b952890c4e88e77f0a014 Mon Sep 17 00:00:00 2001 From: fzyzcjy <5236035+fzyzcjy@users.noreply.github.com> Date: Mon, 25 Nov 2024 21:08:30 +0800 Subject: [PATCH] Super tiny little typo fix (#10633) --- docs/source/quantization/fp8_e5m2_kvcache.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/source/quantization/fp8_e5m2_kvcache.rst b/docs/source/quantization/fp8_e5m2_kvcache.rst index 9ae07bcd3b991..b2d824427f786 100644 --- a/docs/source/quantization/fp8_e5m2_kvcache.rst +++ b/docs/source/quantization/fp8_e5m2_kvcache.rst @@ -4,7 +4,7 @@ FP8 E5M2 KV Cache ================== The int8/int4 quantization scheme requires additional scale GPU memory storage, which reduces the expected GPU memory benefits. -The FP8 data format retains 2~3 mantissa bits and can convert float/fp16/bflaot16 and fp8 to each other. +The FP8 data format retains 2~3 mantissa bits and can convert float/fp16/bfloat16 and fp8 to each other. Here is an example of how to enable this feature: