How to run inference after finetuning? #48
Comments
May I ask, which version of transformers are you using?
Hi, please consult this doc regarding generation.
@kriskrisliu Are you able to generate text normally? I got something like "bashselectionièmeRemove₃ahractory Normdateiering declaringйской autom ін annual (+ групў"
In my case the model generates good results that make sense and are better than Meta's official model.
@kriskrisliu Thanks for the reply! I was fine-tuning using the dataset they provided, and this is the output I got:

Instruction: {How are you doing}
{{\ cincoapk Sorry Past estimationerca foreverтелейcred suo surely Kriegsœ Toulclihmгуnamespaceifecycle generateRES Insteadorter fosse lect exclus Bowl IX Нью proves monarchBU Liguedefn pedigclean Pok Japon Okayун primit "- educational sorte movie folders communicate Quando italiana Ej Турнва Étatsonn3a....

When I was training the model, it threw an error after finishing the last epoch:

packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:2224: UserWarning: Failed to clone() tensor with name _fsdp_wrapped_module._fpw_module.model.layers.29.mlp.up_proj.weight. This may mean that this state_dict entry could point to invalid memory regions after returning from state_dict() call if this parameter is managed by FSDP. Please check clone implementation of _fsdp_wrapped_module._fpw_module.model.layers.29.mlp.up_proj.weight.

Error: CUDA out of memory. Tried to allocate 172.00 MiB (GPU 0; 39.59 GiB total capacity; 36.73 GiB already allocated; 148.19 MiB free; 37.48 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

But it looks like the model is still saved after this error.
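For the fragmentation hint in that error message, here is a minimal sketch of what setting PYTORCH_CUDA_ALLOC_CONF looks like. The 128 MiB split size is an arbitrary example value I picked, not something recommended by this repo:

# Hedged sketch: ease allocator fragmentation as the OOM message suggests.
# The split size is an arbitrary example, not a recommendation from this repo.
# Equivalently: export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 before launching.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # the setting only takes effect if it is in place before CUDA is initialized

Another thing people try when FSDP runs out of memory while materializing the full state_dict is gathering it with CPU offload (FullStateDictConfig(offload_to_cpu=True, rank0_only=True)), but I haven't checked how that interacts with this repo's training script.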
And I was using this command:
I didn't run into these errors.
Compare with the non-finetuned model (you can see that the model sometimes repeats the question and gives low-quality responses):
@kriskrisliu Thanks for sharing the result. Your results look very good! I was using the following code to perform inference. Is it similar to yours?

# Load model
print("Loading model...")

# Use FP16
model = model.half()

# Move to GPU
print("Moving model to gpu...")

tokenized_text = tokenizer(
    "Instruction: List all Canadian provinces in alphabetical order.",
    return_tensors="pt",
    padding="longest",
    max_length=tokenizer.model_max_length,
    truncation=True,
)

# Generate result
full_completion = model.generate(inputs=tokenized_text["input_ids"].to("cuda"), ...)
decoded_text = tokenizer.decode(full_completion[0])
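Not the author's script, but in case it helps others in this thread, here is a hedged sketch of how the missing pieces above could be filled in. Everything here is an assumption on my side: the checkpoint path is a placeholder, the generation arguments are values I picked, and depending on your transformers version you may need LlamaForCausalLM/LlamaTokenizer instead of the Auto classes.

import torch
import transformers

ckpt_dir = "path/to/finetuned_checkpoint"  # assumption: directory written by the training run

# Load model and tokenizer from the fine-tuned checkpoint
print("Loading model...")
tokenizer = transformers.AutoTokenizer.from_pretrained(ckpt_dir, use_fast=False)
model = transformers.AutoModelForCausalLM.from_pretrained(ckpt_dir)

# Use FP16 and move to GPU
print("Moving model to gpu...")
model = model.half().cuda()
model.eval()

tokenized_text = tokenizer(
    "Instruction: List all Canadian provinces in alphabetical order.",
    return_tensors="pt",
    padding="longest",
    max_length=tokenizer.model_max_length,
    truncation=True,
)

# Generate result
with torch.no_grad():
    full_completion = model.generate(
        inputs=tokenized_text["input_ids"].cuda(),
        attention_mask=tokenized_text["attention_mask"].cuda(),
        max_new_tokens=256,   # placeholder cap on generated length
        do_sample=False,      # greedy decoding; swap in sampling if preferred
    )

decoded_text = tokenizer.decode(full_completion[0], skip_special_tokens=True)
print(decoded_text)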
@kriskrisliu Are you using the main branch of HuggingFace Transformers for fine-tuning (Llama)?
@kriskrisliu It would be really nice if you could share your generation script.
# Load tokenizer like train.py does.
# smart_tokenizer_and_embedding_resize and the DEFAULT_* constants are defined in train.py.
import transformers

tokenizer = transformers.AutoTokenizer.from_pretrained(
    origin_tokenizer,
    padding_side="right",
    use_fast=False,
)
if tokenizer.pad_token is None:
    smart_tokenizer_and_embedding_resize(
        special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),
        tokenizer=tokenizer,
        model=model,
    )
tokenizer.add_special_tokens(
    {
        "eos_token": DEFAULT_EOS_TOKEN,
        "bos_token": DEFAULT_BOS_TOKEN,
        "unk_token": DEFAULT_UNK_TOKEN,
    }
)
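One related note: at inference time the instruction should be wrapped in the same prompt template used during training, otherwise the model tends to give noticeably worse completions. Below is a hedged sketch that pairs the tokenizer setup above with the no-input template from train.py's PROMPT_DICT, quoted from memory; the generation arguments are placeholders I picked, and tokenizer/model are assumed to be set up as in the snippets above.

# Assumed to match PROMPT_DICT["prompt_no_input"] in train.py (quoted from memory).
PROMPT_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:"
)

prompt = PROMPT_TEMPLATE.format(instruction="List all Canadian provinces in alphabetical order.")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)  # placeholder length cap
response = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(response)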
@kriskrisliu I think many folks here are running into the same issue with inference. It would be great if you could help share the inference code. Thanks!
Could you share the checkpoint you trained? Or has anyone found an existing checkpoint for generation?
@kriskrisliu could you share the inference code?
@wade3han
I just use
How do I fix it?
I have the same problem, but I don't know why it shows up. The problem goes away when I comment out line 3.
I encountered the same issue when I used different instructions. Actually, the lines at _alpaca/blob/64100b4139545818b578cde7410b0ea66a62de9e/inference.py#L93-L103 are not necessary for inference (they are just for debugging); you can simply comment them out.
Thanks for sharing the training code. I've finished a 3-epoch finetuning.
However, I can't find the inference code.
Could you please give some advice on it, or share the inference code?
Thanks again!