
How to inference after finetuning ? #48

Closed
kriskrisliu opened this issue Mar 16, 2023 · 20 comments

Comments

@kriskrisliu

Thanks for sharing the training code. I've finished a 3-epoch fine-tuning run.
However, I can't find the inference code.
Could you give some advice on it, or share the inference code?
Thanks again!

@cxj01

cxj01 commented Mar 16, 2023

May I ask which version of transformers you are using?

@lxuechen
Collaborator

Hi, please consult this doc regarding generation.
https://huggingface.co/docs/transformers/main_classes/text_generation
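
For example, a minimal sketch along those lines (the checkpoint path, prompt wording, and decoding settings below are placeholders; the prompt format mirrors the Alpaca-style template seen later in this thread, so adjust if yours differs):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Directory produced by train.py; "./output" is just a placeholder.
ckpt_dir = "./output"

tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = AutoModelForCausalLM.from_pretrained(ckpt_dir, torch_dtype=torch.float16).to("cuda")
model.eval()

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nList all Canadian provinces in alphabetical order.\n\n### Response:"
)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        max_new_tokens=256,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))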

@puyuanOT

@kriskrisliu Are you able to generate text normally? I got something like "bashselectionièmeRemove₃ahractory Normdateiering declaringйской autom ін annual (+ групў "

@kriskrisliu
Author

@kriskrisliu Are you able to generate text normally? I got something like "bashselectionièmeRemove₃ahractory Normdateiering declaringйской autom ін annual (+ групў "

In my case it works fine; the generations make sense and are better than Meta's official model.
Are you fine-tuning on English data?

@puyuanliu

@kriskrisliu Thanks for the reply! I was fine-tuning using the dataset they provided.

And this is the output I got:
Below is an instruction that describes a task. Write a response that appropriately completes the request.

Instruction:

{How are you doing}

{{\ cincoapk Sorry Past estimationerca foreverтелейcred suo surely Kriegsœ Toulclihmгуnamespaceifecycle generateRES Insteadorter fosse lect exclus Bowl IX Нью proves monarchBU Liguedefn pedigclean Pok Japon Okayун primit "- educational sorte movie folders communicate Quando italiana Ej Турнва Étatsonn3a....

When I was training the model, it threw an error after finishing the last epoch:

packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2224: UserWarning: Failed to clone() tensor with name _fsdp_wrapped_module._fpw_module.model.layers.29.mlp.up_proj.weight. This may mean that this state_dict entry could point to invalid memory regions after returning from state_dict() call if this parameter is managed by FSDP. Please check clone implementation of _fsdp_wrapped_module._fpw_module.model.layers.29.mlp.up_proj.weight. Error: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 39.59 GiB total capacity; 36.73 GiB already allocated; 148.19 MiB free; 37.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

But it looks like the model was still saved after this error.

@puyuanliu

@kriskrisliu

And I was using this command:
full_completion = model.generate(
    inputs=tokenized_text["input_ids"].to("cuda"),
    attention_mask=tokenized_text["attention_mask"].to("cuda"),
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_beams=1,
    max_new_tokens=600,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)

@kriskrisliu
Author

I didn't run into these errors.
Results from my fine-tuned model look like this:

Instruction: List all Canadian provinces in alphabetical order.
Response: Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Nova Scotia, Ontario, Prince Edward Island, Quebec, Saskatchewan, Yukon.</s>

Instruction: Write a Python program that prints the first 10 Fibonacci numbers.
Response: def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)</s>

Compare with the non-fine-tuned model (you can see that it sometimes repeats the question and gives low-quality responses):

Instruction: List all Canadian provinces in alphabetical order.
Response: Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Northwest Territories, Nova Scotia, Nunavut, Ontario, Prince Edward Island, Quebec, Saskatchewan, Yukon
### Instruction:
List all Canadian provinces in order of population.
Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Northwest Territories, Nova Scotia, Nunavut, Ontario, Prince Edward Island, Quebec, Saskatchewan, Yukon
### Instruction:
List all Canadian provinces in order of area.
Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Northwest Territories, Nova Scotia, Nunavut, Ontario, Prince Edward Island, Quebec, Saskatchewan, Yukon
### Instruction:
List all Canadian provinces in order of population and area.
Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Northwest Territories, Nova Scotia, Nunavut, Ontario, Prince

Instruction: Write a Python program that prints the first 10 Fibonacci numbers.
Response: The first 10 Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, and 34.
The first 10 Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, and 34. The first 10 Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, and 34. The first 10 Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, and 34. The first 10 Fibonacci numbers are 0, 1, 1, 2, 3, 5, 8, 13, 21, and 34. The first 10 Fibonacci numbers are 0, 1, 1, 2, 3, 5,

@puyuanliu

@kriskrisliu Thanks for sharing the result. Your results look very good!

I was using the following code to perform inference. Is it similar to yours?

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
print("Loading model...")
model = AutoModelForCausalLM.from_pretrained(directory)
tokenizer = AutoTokenizer.from_pretrained(directory)

# Use FP16
model = model.half()

# Move to GPU
print("Moving model to gpu...")
model = model.to("cuda")

tokenized_text = tokenizer(
    "Instruction: List all Canadian provinces in alphabetical order.",
    return_tensors="pt",
    padding="longest",
    max_length=tokenizer.model_max_length,
    truncation=True,
)

# Generate result
full_completion = model.generate(
    inputs=tokenized_text["input_ids"].to("cuda"),
    attention_mask=tokenized_text["attention_mask"].to("cuda"),
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_beams=1,
    max_new_tokens=600,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)

decoded_text = tokenizer.decode(full_completion[0])
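
One difference worth noting: the generations quoted earlier in this thread use the full training prompt (the "Below is an instruction..." header plus the "### Instruction:" / "### Response:" markers), while the snippet above feeds only the bare instruction. A small sketch that wraps the instruction before tokenizing (PROMPT_TEMPLATE here is a hand-written approximation of that format, not copied from the repo):

# Hypothetical helper that wraps a bare instruction in the Alpaca-style prompt
# format seen in the outputs above; adjust if your training prompt differed.
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

prompt = PROMPT_TEMPLATE.format(instruction="List all Canadian provinces in alphabetical order.")
tokenized_text = tokenizer(
    prompt,
    return_tensors="pt",
    padding="longest",
    max_length=tokenizer.model_max_length,
    truncation=True,
)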

@huyphan168

@kriskrisliu Are you using the main branch of HuggingFace Transformers for fine-tuning (Llama)?

@puyuanOT

@kriskrisliu It would be really nice if you could share your generation script.

@GanjinZero

(Quoting @puyuanliu's earlier comment: the garbled generation output and the FSDP clone() / CUDA OOM warning after the last epoch.)

Load the tokenizer the same way train.py does:

import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained(
    origin_tokenizer,
    padding_side="right",
    use_fast=False,
)
if tokenizer.pad_token is None:
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
        tokenizer=tokenizer,
        model=model,
    )
tokenizer.add_special_tokens(
    {
        "eos_token": DEFAULT_EOS_TOKEN,
        "bos_token": DEFAULT_BOS_TOKEN,
        "unk_token": DEFAULT_UNK_TOKEN,
    }
)
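
For completeness, the special-token constants referenced above are defined in the repo's train.py; the values below are what I believe that script uses, but double-check against your own checkout:

# Special-token defaults as used by the Stanford Alpaca train.py
# (values assumed from that script; verify against your copy).
DEFAULT_PAD_TOKEN = "[PAD]"
DEFAULT_EOS_TOKEN = "</s>"
DEFAULT_BOS_TOKEN = "<s>"
DEFAULT_UNK_TOKEN = "<unk>"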

@XinliYu

XinliYu commented Mar 27, 2023

(Quoting @kriskrisliu's earlier comment comparing the fine-tuned and non-fine-tuned outputs.)

@kriskrisliu I think many folks here have hit the same issue with inference. It would be great if you could share the inference code. Thanks!

@lzy37ld

lzy37ld commented Mar 29, 2023

Could you share the ckpt you trained? Or has anyone found an existing ckpt for generation?

@MrRace

MrRace commented Apr 9, 2023

@kriskrisliu could you share the inference code?

@wade3han

@MrRace @XinliYu #199 is the PR with the inference code.

@MrRace

MrRace commented Apr 10, 2023

@wade3han
Thanks a lot for your work. I hit an error like the one below when doing inference:

    transition_scores = model.compute_transition_scores(
  File "/opt/python3.10.11/lib/python3.10/site-packages/transformers/generation/utils.py", line 1051, in compute_transition_scores
    indices = sequences[:, cut_idx:] + beam_sequence_indices
RuntimeError: The size of tensor a (114) must match the size of tensor b (259) at non-singleton dimension 1

I just use this instruction:

instructions = [
    # Roughly: "In the style of Lu Xun, complain about the recent rise in canteen food prices."
    "模仿鲁迅的风格, 吐槽一下最近食堂饭菜涨价",
]

How can I fix it?

@MrRace

MrRace commented Apr 11, 2023

@MrRace @XinliYu #199 is the PR with the inference code.

@wade3han Does it support multi-turn dialogue? Thanks a lot!

@FeiWard

FeiWard commented Apr 14, 2023

(Quoting @MrRace's compute_transition_scores size-mismatch error above.)

I have the same problem, but I don't know why it showed up. The problem goes away when I comment out line 3.

@wade3han

I encountered the same issue when I used different instructions.

Actually, the lines at _alpaca/blob/64100b4139545818b578cde7410b0ea66a62de9e/inference.py#L93-L103 are not necessary for inference (they're just for debugging); you can simply comment them out.
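
In other words, once that debug block is commented out, generation plus decoding is all that's needed. A minimal sketch (variable names follow the snippets earlier in this thread):

# After model.generate(...) as above, decoding alone is enough for inference;
# compute_transition_scores was only used to inspect per-token scores.
decoded_text = tokenizer.decode(full_completion[0], skip_special_tokens=True)
print(decoded_text)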
