Adding torchao apis to gpt-fast #208

HDCharles · 2024-10-17T07:49:14Z

Summary:

Adds torchao APIs to gpt-fast, plus some minor tweaks.

Test Plan:

(in progress)
export MODEL_REPO=meta-llama/Meta-Llama-3-8B

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode torchao-int8
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int8.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int8.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int8.pth --tasks wikitext --compile

For model checkpoints/meta-llama/Meta-Llama-3-8B/model_torchao-int8.pth
wikitext: {'word_perplexity,none': 7.900496793735154, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4718578218273202, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.5576383170121927, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
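The PR does not show how the torchao-int8 mode quantizes the weights. Assuming it follows the same scheme as gpt-fast's existing int8 mode (weight-only, one symmetric scale per output channel, activations left in floating point), the arithmetic can be sketched in pure Python as below. The function names are hypothetical, and this is not torchao's implementation, which works on tensors rather than Python lists.

```python
# Hypothetical sketch: per-output-channel symmetric int8 weight-only quantization.
# Pure Python for illustration; not torchao's implementation.
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_int8_per_channel(weight: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Quantize each row (output channel) with its own scale: q = round(w / s)."""
    q_rows, scales = [], []
    for row in weight:
        max_abs = max(abs(x) for x in row)
        # Map the largest magnitude in the row to 127; guard all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-128, min(127, round(x / scale))) for x in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: List[List[int]], scales: List[float]) -> Matrix:
    """Recover approximate float weights: w ~= q * s, row by row."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def linear(x: List[float], weight: Matrix) -> List[float]:
    """y = W x for a single input vector (activations stay in float)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [2.0, 0.0, -3.0]]
    q, s = quantize_int8_per_channel(w)
    x = [1.0, 2.0, 3.0]
    print(linear(x, w))                  # full-precision output
    print(linear(x, dequantize(q, s)))   # int8 weight-only output (close, not exact)
```

Because each row carries its own scale, an outlier in one output channel only coarsens the quantization grid of that channel, not of the whole matrix.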

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode torchao-int4-hqq
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4-hqq.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4-hqq.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4-hqq.pth --tasks wikitext --compile

For model checkpoints/meta-llama/Meta-Llama-3-8B/model_torchao-int4-hqq.pth
wikitext: {'word_perplexity,none': 8.44187872159186, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4902143610748824, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.575519871235033, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
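Int4 weight quantization is typically done in groups: each run of consecutive weights in a row gets its own scale and offset. gpt-fast's existing int4 checkpoint is named model_int4.g32.pth, which suggests a group size of 32 there; the group size used by the torchao modes is not stated in this PR. The sketch below takes the scale and offset directly from each group's min/max. HQQ (half-quadratic quantization) differs in that it optimizes the scale and zero point instead, and that optimization is not shown. All names are hypothetical, and the 16-bit storage of scale and offset in `bits_per_weight` is an assumption.

```python
# Hypothetical sketch: group-wise asymmetric int4 quantization of one weight row.
# Uses plain min/max per group; HQQ would optimize scale/zero point instead.
from typing import List, Tuple


def quantize_int4_groupwise(
    row: List[float], group_size: int
) -> Tuple[List[int], List[float], List[float]]:
    """Return 4-bit codes in [0, 15] plus one scale and one offset per group.

    Each weight is approximated as w ~= q * scale + offset for its group.
    """
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    codes, scales, offsets = [], [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        lo, hi = min(group), max(group)
        # 16 levels means 15 steps between the group's min and max.
        scale = (hi - lo) / 15 if hi > lo else 1.0
        codes.extend(min(15, max(0, round((w - lo) / scale))) for w in group)
        scales.append(scale)
        offsets.append(lo)
    return codes, scales, offsets


def dequantize_int4_groupwise(
    codes: List[int], scales: List[float], offsets: List[float], group_size: int
) -> List[float]:
    """Map each 4-bit code back to a float using its group's parameters."""
    return [
        q * scales[i // group_size] + offsets[i // group_size]
        for i, q in enumerate(codes)
    ]


def bits_per_weight(group_size: int, param_bits: int = 16) -> float:
    """Storage per weight: the 4-bit code plus a share of one scale and one offset.

    Assumes scale and offset are each stored in `param_bits` bits.
    """
    return 4 + 2 * param_bits / group_size


if __name__ == "__main__":
    row = [0.1, -0.4, 0.3, 0.05, 1.2, 0.9, -0.7, 0.0]
    codes, scales, offsets = quantize_int4_groupwise(row, group_size=4)
    print(codes)
    print(dequantize_int4_groupwise(codes, scales, offsets, group_size=4))
    print(bits_per_weight(32), bits_per_weight(128))
```

The overhead term shows the trade-off behind the group size: smaller groups track the weight distribution more closely but store more parameters, e.g. 5 bits per weight at group size 32 versus 4.25 at group size 128 under the 16-bit assumption.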

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode torchao-int4
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4.pth --tasks wikitext --compile

For model checkpoints/meta-llama/Meta-Llama-3-8B/model_torchao-int4.pth
wikitext: {'word_perplexity,none': 8.59031159441983, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.4950796712267396, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.5802223661766339, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}
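The three wikitext metrics are not independent. As far as their definitions go, lm-evaluation-harness derives each from the same total log-likelihood, normalized by either a word count or a byte count. So bits_per_byte should equal log2(byte_perplexity), and log(word_perplexity) / log(byte_perplexity) should equal the dataset's bytes-per-word ratio, which is the same for every model. The snippet below checks both relations on the numbers reported above.

```python
import math

# wikitext results reported above for Meta-Llama-3-8B.
results = {
    "torchao-int8": {
        "word_perplexity": 7.900496793735154,
        "byte_perplexity": 1.4718578218273202,
        "bits_per_byte": 0.5576383170121927,
    },
    "torchao-int4-hqq": {
        "word_perplexity": 8.44187872159186,
        "byte_perplexity": 1.4902143610748824,
        "bits_per_byte": 0.575519871235033,
    },
    "torchao-int4": {
        "word_perplexity": 8.59031159441983,
        "byte_perplexity": 1.4950796712267396,
        "bits_per_byte": 0.5802223661766339,
    },
}

for mode, r in results.items():
    # bits_per_byte should equal log2(byte_perplexity).
    bpb_from_ppl = math.log2(r["byte_perplexity"])
    # Same log-likelihood, different normalizer: the ratio is bytes / words.
    bytes_per_word = math.log(r["word_perplexity"]) / math.log(r["byte_perplexity"])
    print(f"{mode}: log2(byte_ppl)={bpb_from_ppl:.6f} "
          f"reported bits_per_byte={r['bits_per_byte']:.6f} "
          f"bytes/word={bytes_per_word:.4f}")
```

The ratio comes out at about 5.35 bytes per word for all three checkpoints, so any one metric ranks the modes the same way: torchao-int8 has the lowest perplexity, followed by torchao-int4-hqq and then torchao-int4.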


The commit message repeats the summary and test plan above, and additionally runs baselines with gpt-fast's existing int8 and int4 modes:

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int8
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth --tasks wikitext --compile
python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4.g32.pth --compile
python generate.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4.g32.pth
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4.g32.pth --tasks wikitext --compile

HDCharles · 2024-10-17T14:01:12Z

@Chillee should I add info to the README, or implement this differently?


gchlebus · 2024-11-08T14:31:45Z

Is the plan for this PR also to add the fp8 support available in torchao?

jerryzh168 · 2024-12-20T04:29:14Z

It seems that this one does not work with tensor parallelism (TP) yet:

ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=2 generate.py --compile --checkpoint_path checkpoints/$MODEL_REPO/model_torchao-int4.pth

[rank1]: NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten.split_with_sizes', overload='default')>, types=(<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>,), arg_types=(<class 'torchao.dtypes.affine_quantized_tensor.AffineQuantizedTensor'>, <class 'list'>), kwarg_types={}

To support this, we will need to either implement this op in AffineQuantizedTensor (AQT) or change the TP implementation to use DTensor, I guess.
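The NotImplementedError above means AffineQuantizedTensor has no handler for aten.split_with_sizes. Presumably the call comes from gpt-fast's tensor-parallel code splitting the fused wqkv weight into its q, k and v parts before sharding them, though the PR does not say so. Whatever the caller, a handler for that op has to split the quantization metadata together with the integer data. The hypothetical pure-Python class below sketches this for a weight with one scale per output row; packed layouts, such as int4 weights stored in a kernel-specific tiled format, would additionally need layout-aware slicing, which is not shown.

```python
# Hypothetical sketch: what a split_with_sizes handler must do for a weight
# quantized with one scale per output row. Not torchao code.
from dataclasses import dataclass
from typing import List


@dataclass
class RowQuantizedWeight:
    data: List[List[int]]   # integer values, one list per output row
    scales: List[float]     # one scale per output row

    def split_with_sizes(self, sizes: List[int]) -> List["RowQuantizedWeight"]:
        """Split along dim 0, keeping each row's scale attached to its row."""
        if sum(sizes) != len(self.data):
            raise ValueError("split sizes must sum to the number of rows")
        parts, start = [], 0
        for size in sizes:
            end = start + size
            parts.append(RowQuantizedWeight(self.data[start:end], self.scales[start:end]))
            start = end
        return parts


if __name__ == "__main__":
    # Toy fused weight: 2 rows for q, 1 for k, 1 for v (sizes are illustrative).
    wqkv = RowQuantizedWeight(
        data=[[1, -2], [3, 4], [-5, 6], [7, -8]],
        scales=[0.1, 0.2, 0.3, 0.4],
    )
    q, k, v = wqkv.split_with_sizes([2, 1, 1])
    print(q, k, v, sep="\n")
```

Splitting the integer rows without the matching scales would silently pair rows with the wrong scale, which is why the op cannot simply fall through to the plain-tensor implementation.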

facebook-github-bot added the CLA Signed label Oct 17, 2024

HDCharles requested a review from Chillee October 17, 2024 14:00

Commit 7144ffb: Adding info to readme

jerryzh168 mentioned this pull request Dec 20, 2024 in #217, "int4 quant broken right now?" (open)
