
AutoGPTQ from https://github.com/PanQiWei/AutoGPTQ (installed under /opt/AutoGPTQ)

Inference Benchmark

Substitute the GPTQ model from the HuggingFace Hub (or a local model path) that you want to run:

./run.sh --workdir=/opt/AutoGPTQ/examples/benchmark/ $(./autotag auto_gptq) \
   python3 generation_speed.py --model_name_or_path TheBloke/LLaMA-7b-GPTQ --use_safetensors --max_new_tokens=128

If you get the error `Exllama kernel does not support query/key/value fusion with act-order`, try adding `--no_inject_fused_attention`
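
For example, the full command with that workaround applied might look like this (a sketch assuming the same model and container tag as above, with the flag passed through to generation_speed.py):

```shell
# Same benchmark invocation, with fused attention disabled to avoid the
# Exllama act-order fusion error
./run.sh --workdir=/opt/AutoGPTQ/examples/benchmark/ $(./autotag auto_gptq) \
   python3 generation_speed.py --model_name_or_path TheBloke/LLaMA-7b-GPTQ \
      --use_safetensors --max_new_tokens=128 --no_inject_fused_attention
```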