Request for Scripts to Merge QDoRA Adapters with Base Model for vLLM Inference #60
Comments
I modified the merge code, and then merged the QLoRA adapter with the base model:

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# PEFT_MODEL points to the saved adapter directory
config = PeftConfig.from_pretrained(PEFT_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    # quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    # trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, PEFT_MODEL)

# Merge the adapter into the base model
model = model.merge_and_unload()

# Save the merged model (safetensors format) to a "<PEFT_MODEL>-merged" directory
model.save_pretrained(PEFT_MODEL + "-merged", safe_serialization=True)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.save_pretrained(PEFT_MODEL + "-merged")

But the merged model gives repeated responses. I want to know where the problem occurs: in the fine-tuning or in the weight merge?
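One way to narrow this down is a quick A/B check: run the same prompt through the un-merged adapter (base model plus PeftModel) and through the merged checkpoint. If only the merged copy degenerates into repetition, the merge step is the likely culprit; if both do, the fine-tune itself is suspect. Below is a minimal sketch, assuming the PEFT_MODEL and "-merged" paths from the snippet above; the prompt is just a placeholder.

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

def greedy_generate(model, tokenizer, prompt="Explain QDoRA in one sentence."):
    # Greedy decoding makes degenerate repetition easy to spot
    device = next(model.parameters()).device
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)

config = PeftConfig.from_pretrained(PEFT_MODEL)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# 1) Base model + adapter, no merge
base = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
adapter_model = PeftModel.from_pretrained(base, PEFT_MODEL)
print(greedy_generate(adapter_model, tokenizer))

# 2) Merged checkpoint saved above
merged_model = AutoModelForCausalLM.from_pretrained(
    PEFT_MODEL + "-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(greedy_generate(merged_model, tokenizer))

On a single GPU you may need to free the first model before loading the second; the comparison itself is unchanged.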
See my #57 as well; it is a similar question/request.
This is kind of working for me. We need to convert the DoRA parameter names to LoRA names in the tensor dict.

import torch
from peft import LoraConfig, TaskType, get_peft_config, get_peft_model
from safetensors import safe_open
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    LlamaForCausalLM,
)

# Load the QDoRA state dict saved during training
tensors = {}
with safe_open(
    "model_state_dict.safetensors",
    framework="pt",
    device=0,
) as f:
    for k in f.keys():
        tensors[k] = f.get_tensor(k)  # loads the full tensor given a key
        # print(k, tensors[k].dtype, tensors[k].shape)  # Uncomment to view

# Rename the DoRA keys so they match the PEFT LoRA naming scheme
new_tensors = {}
for _k in tensors:
    if "dora" not in _k:
        continue
    else:
        k = "base_model.model." + _k
        k = k.replace(".dora_layer", "")
        k = k.replace(".weight", ".default.weight")
        new_tensors[k] = tensors[_k]
tensors = new_tensors

# Make sure the compute type, target modules, rank, alpha, etc. match!
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = LlamaForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    use_cache=False,
    quantization_config=bnb_config,
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Freeze the base model parameters
for param in model.parameters():
    param.requires_grad = False

# Add LoRA (make sure your rank (r) and alpha (lora_alpha) values match those used in training!)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    # target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]
    target_modules=["k_proj", "q_proj", "v_proj", "up_proj", "down_proj", "gate_proj"],
)
model = get_peft_model(model, peft_config)

# Check out the first few keys in the state dict:
print(list(model.state_dict().keys())[:10])

# Copy the renamed DoRA tensors into the LoRA slots and save the adapter
new_sd = model.state_dict()
for k in new_sd:
    if "lora" in k:
        new_sd[k] = tensors[k]
model.load_state_dict(new_sd, strict=False)
model.save_pretrained("lora_adapters")
tokenizer.save_pretrained("lora_adapters")
@lochuynh1412 How's the quality of the merged model?
Hello,
I've successfully finetuned Llama-3 8B with QDoRA and am now looking to perform inference using vLLM. Could you provide guidance or scripts on how to merge the QDoRA adapters with the original base model? Additionally, does this process involve quantization and dequantization of the base model?
Thank you!