
InvalidHeaderDeserialization #6

Open
ticlazau opened this issue Nov 20, 2023 · 2 comments

Comments

@ticlazau

ticlazau commented Nov 20, 2023

Hello,

The fine-tuning process completed successfully. However, when I try to run inference separately by loading the model with the following code:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, AutoTokenizer

base_model = "codellama/CodeLlama-7b-hf"
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("codellama/CodeLlama-7b-hf")

from peft import PeftModel
model = PeftModel.from_pretrained(model, "/home/fm/codellama/sql-code-llama/checkpoint-400")

eval_prompt = """You are a powerful text-to-SQL model. Your job is to answer questions about a database. You are given a question and context regarding one or more tables.

You must output the SQL query that answers the question.
### Input:
Which Class has a Frequency MHz larger than 91.5, and a City of license of hyannis, nebraska?

### Context:
CREATE TABLE table_name_12 (class VARCHAR, frequency_mhz VARCHAR, city_of_license VARCHAR)

### Response:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))

I am getting this error:

Loading checkpoint shards: 100%|██████████| 2/2 [00:02<00:00, 1.19s/it]
Traceback (most recent call last):
  File "/home/fm/codellama/evaluate.py", line 14, in <module>
    model = PeftModel.from_pretrained(model, "/home/florin.manaila/codellama/sql-code-llama/checkpoint-400")
  File "/home/fm/anaconda3/envs/genai/lib/python3.10/site-packages/peft/peft_model.py", line 332, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/home/fm/anaconda3/envs/genai/lib/python3.10/site-packages/peft/peft_model.py", line 629, in load_adapter
    adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
  File "/home/fm/anaconda3/envs/genai/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 222, in load_peft_weights
    adapters_weights = safe_load_file(filename, device=device)
  File "/home/fm/anaconda3/envs/genai/lib/python3.10/site-packages/safetensors/torch.py", line 308, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

Am I doing something wrong?

Thank you,
Florin

@zhuol

zhuol commented Nov 27, 2023

Your safetensors file is probably corrupted.
Check its size; if it is very small, that is the problem (a quick check is sketched below).
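Something like this will tell you quickly whether the file is even readable (the path is the checkpoint directory from your traceback, and adapter_model.safetensors is the filename PEFT normally writes):

import os
from safetensors import safe_open

adapter_file = "/home/fm/codellama/sql-code-llama/checkpoint-400/adapter_model.safetensors"
print(os.path.getsize(adapter_file), "bytes")  # a healthy LoRA adapter for a 7B model is usually tens of MB

try:
    with safe_open(adapter_file, framework="pt") as f:  # only parses the header, so this is cheap
        print("tensor keys:", list(f.keys())[:3])
except Exception as e:
    print("broken file:", e)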

My suggestion is to comment out the following code so the model is not wrapped by torch.compile. It may speed things up, but in my experience it interferes with checkpoint generation.

model.config.use_cache = False

# monkey-patch state_dict so that saving returns only the LoRA adapter weights
old_state_dict = model.state_dict
model.state_dict = (lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())).__get__(
    model, type(model)
)
# wrap the model with torch.compile (the part that interferes with checkpoint saving)
if torch.__version__ >= "2" and sys.platform != "win32":
    print("compiling the model")
    model = torch.compile(model)
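For context (my reading of why this happens, not something verified in this thread): torch.compile wraps the model in an OptimizedModule, and the wrapped model's state_dict keys pick up an _orig_mod. prefix, which can confuse the state_dict patch above and leave you with an empty or malformed adapter file. A tiny illustration:

import torch
import torch.nn as nn

m = nn.Linear(4, 4)
cm = torch.compile(m)
print(list(m.state_dict().keys()))   # ['weight', 'bias']
print(list(cm.state_dict().keys()))  # ['_orig_mod.weight', '_orig_mod.bias']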

After that, just rerun the training. It converges in 40-60 steps, so there is no need to run it for ~400. Once the new training is done, you should be good to go to load the adapter model with PEFT.

@strobel1x

For me, the problem was solved once I added model.save_pretrained(output_dir) after trainer.train() finished:

trainer.train()
model.save_pretrained(output_dir)
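For completeness, a minimal sketch (not from the thread) of verifying that the saved adapter deserializes cleanly, by reloading it onto a freshly loaded base model; output_dir is the same directory passed to save_pretrained above:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
# raises safetensors_rust.SafetensorError (InvalidHeaderDeserialization) if the adapter file is still broken
model = PeftModel.from_pretrained(base, output_dir)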
