How to run inference after finetuning? #48
Comments
May I ask, which version of transformers are you using?
Hi, please consult this doc regarding generation.
@kriskrisliu Are you able to generate text normally? I got something like "bashselectionièmeRemove₃ahractory Normdateiering declaringйской autom ін annual (+ групў"
In my case the model generates good results that make sense and are better than Meta's official model.
@kriskrisliu Thanks for the reply! I was fine-tuning using the dataset they provided, and this is the output I got:

Instruction: {How are you doing}
{{\ cincoapk Sorry Past estimationerca foreverтелейcred suo surely Kriegsœ Toulclihmгуnamespaceifecycle generateRES Insteadorter fosse lect exclus Bowl IX Нью proves monarchBU Liguedefn pedigclean Pok Japon Okayун primit "- educational sorte movie folders communicate Quando italiana Ej Турнва Étatsonn3a....

When I was training the model, it threw an error after finishing the last epoch:

packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2224: UserWarning: Failed to clone() tensor with name _fsdp_wrapped_module._fpw_module.model.layers.29.mlp.up_proj.weight. This may mean that this state_dict entry could point to invalid memory regions after returning from state_dict() call if this parameter is managed by FSDP. Please check clone implementation of _fsdp_wrapped_module._fpw_module.model.layers.29.mlp.up_proj.weight.

Error: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 39.59 GiB total capacity; 36.73 GiB already allocated; 148.19 MiB free; 37.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

But it looks like the model is still saved after this error.
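For the fragmentation hint in that error message, here is a minimal sketch of what setting PYTORCH_CUDA_ALLOC_CONF looks like. The 128 MiB split size is an arbitrary example value I picked, not something recommended by this repo:

# Hedged sketch: ease allocator fragmentation as the OOM message suggests.
# The split size is an arbitrary example, not a recommendation from this repo.
# Equivalently: export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 before launching.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # the setting only takes effect if it is in place before CUDA is initialized

Another thing people try when FSDP runs out of memory while materializing the full state_dict is gathering it with CPU offload (FullStateDictConfig(offload_to_cpu=True, rank0_only=True)), but I haven't checked how that interacts with this repo's training script.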
And I was using this command:
I didn't run into these errors.
Compare with the non-finetuned model (you can see that the model sometimes repeats the question and gives low-quality responses):
@kriskrisliu Thanks for sharing the result. Your results look very good! I was using the following code to perform inference. Is it similar to yours?

# Load model
print("Loading model...")

# Use FP16
model = model.half()

# Move to GPU
print("Moving model to gpu...")

tokenized_text = tokenizer(
    "Instruction: List all Canadian provinces in alphabetical order.",
    return_tensors="pt",
    padding="longest",
    max_length=tokenizer.model_max_length,
    truncation=True,
)

# Generate result
full_completion = model.generate(inputs=tokenized_text["input_ids"].to("cuda"), ...)
decoded_text = tokenizer.decode(full_completion[0])
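Not the author's script, but in case it helps others in this thread, here is a hedged sketch of how the missing pieces above could be filled in. Everything here is an assumption on my side: the checkpoint path is a placeholder, the generation arguments are values I picked, and depending on your transformers version you may need LlamaForCausalLM/LlamaTokenizer instead of the Auto classes.

import torch
import transformers

ckpt_dir = "path/to/finetuned_checkpoint"  # assumption: directory written by the training run

# Load model and tokenizer from the fine-tuned checkpoint
print("Loading model...")
tokenizer = transformers.AutoTokenizer.from_pretrained(ckpt_dir, use_fast=False)
model = transformers.AutoModelForCausalLM.from_pretrained(ckpt_dir)

# Use FP16 and move to GPU
print("Moving model to gpu...")
model = model.half().cuda()
model.eval()

tokenized_text = tokenizer(
    "Instruction: List all Canadian provinces in alphabetical order.",
    return_tensors="pt",
    padding="longest",
    max_length=tokenizer.model_max_length,
    truncation=True,
)

# Generate result
with torch.no_grad():
    full_completion = model.generate(
        inputs=tokenized_text["input_ids"].cuda(),
        attention_mask=tokenized_text["attention_mask"].cuda(),
        max_new_tokens=256,   # placeholder cap on generated length
        do_sample=False,      # greedy decoding; swap in sampling if preferred
    )

decoded_text = tokenizer.decode(full_completion[0], skip_special_tokens=True)
print(decoded_text)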
@kriskrisliu Are you using the main branch of HuggingFace Transformers for fine-tuning (Llama)?
@kriskrisliu It would be really nice if you could share your generation script.
# Load tokenizer like train.py does.
# smart_tokenizer_and_embedding_resize and the DEFAULT_* constants are defined in train.py.
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained(
    origin_tokenizer,
    padding_side="right",
    use_fast=False,
)
if tokenizer.pad_token is None:
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
        tokenizer=tokenizer,
        model=model,
    )
tokenizer.add_special_tokens(
    {
        "eos_token": DEFAULT_EOS_TOKEN,
        "bos_token": DEFAULT_BOS_TOKEN,
        "unk_token": DEFAULT_UNK_TOKEN,
    }
)
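One related note: at inference time the instruction should be wrapped in the same prompt template used during training, otherwise the model tends to give noticeably worse completions. Below is a hedged sketch that pairs the tokenizer setup above with the no-input template from train.py's PROMPT_DICT, quoted from memory; the generation arguments are placeholders I picked, and tokenizer/model are assumed to be set up as in the snippets above.

# Assumed to match PROMPT_DICT["prompt_no_input"] in train.py (quoted from memory).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

prompt = PROMPT_TEMPLATE.format(instruction="List all Canadian provinces in alphabetical order.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)  # placeholder length cap
response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)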
@kriskrisliu I think many folks here are running into the same issue with inference. It would be great if you could help share the inference code. Thanks!
Could you share the checkpoint you trained? Or has anyone found an existing checkpoint for generation?
@kriskrisliu could you share the inference code?
@wade3han
I just use
How do I fix it?
I have the same problem, but I don't know why it shows up. The problem goes away when I comment out line 3.
I encountered the same issue when I used different instructions. Actually, the lines at _alpaca/blob/64100b4139545818b578cde7410b0ea66a62de9e/inference.py#L93-L103 are not necessary for inference (they are just for debugging); you can simply comment them out.
Thanks for sharing the training code. I've finished a 3-epoch finetuning.
However, I can't find the inference code.
Could you please give some advice on it, or share the inference code?
Thanks again!