Commit b5f49ee
Update README.md (vllm-project#6847)
gurpreet-dhami authored Jul 27, 2024
1 parent 150a1ff · commit b5f49ee
Showing 1 changed file with 1 addition and 1 deletion.
examples/fp8/quantizer/README.md: 1 addition & 1 deletion

@@ -16,7 +16,7 @@
 #### Run on H100 system for speed if FP8; number of GPUs depends on the model size
 
 #### Example: quantize Llama2-7b model from HF to FP8 with FP8 KV Cache:
-`python quantize.py --model_dir ./ll2-7b --dtype float16 --qformat fp8 --kv_cache_dtype fp8 --output_dir ./ll2_7b_fp8 --calib_size 512 --tp_size 1`
+`python quantize.py --model-dir ./ll2-7b --dtype float16 --qformat fp8 --kv-cache-dtype fp8 --output-dir ./ll2_7b_fp8 --calib-size 512 --tp-size 1`
 
 Outputs: model structure, quantized model & parameters (with scaling factors) are in JSON and Safetensors (npz is generated only for the reference)
 ```