Are you using an environment where you have installed the transformers library from either the "gradients" or "deployment" folder? I believe this issue is caused by a mismatch between the quantization code and the modified transformers library for gradient computation / deployment. At the moment, the quantization code isn't compatible with these environments, so to run simulated quantization you need to install transformers using pip into the kvquant conda environment.
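A minimal sketch of that suggested setup, assuming the conda environment is named kvquant and that no particular transformers version is required beyond what the repo's requirements specify (adjust names and versions to your local setup):

```bash
# Work inside the kvquant conda environment (name assumed here)
conda activate kvquant

# Remove any transformers build that was installed from the repo's
# gradients/ or deployment/ folders
pip uninstall -y transformers

# Install the stock transformers package from PyPI; pin the version the
# repo's requirements file expects, if one is listed there
pip install transformers
```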
When I try:
CUDA_VISIBLE_DEVICES=0 python llama_simquant.py --abits 4 --nsamples 16 --seqlen 2048 --nuq --fisher --quantize --include_sparse --sparsity-threshold 0.99 --quantizer_path quantizers.pickle ;
I get this error:
AttributeError: 'LlamaModel' object has no attribute 'split_gpus'
What is the problem?
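One way to narrow this down, given the mismatch described above, is to confirm which transformers build the environment is actually resolving. These are generic Python/pip checks, not commands from this repo:

```bash
# Print the version and the install path of the transformers module that
# Python imports; a path under site-packages suggests a stock pip install,
# while a path inside the repo suggests one of the modified builds
python -c "import transformers; print(transformers.__version__, transformers.__file__)"

# pip's record of the installed package, including its location
pip show transformers
```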