Request for Scripts to Merge QDoRA Adapters with Base Model for vLLM Inference #60
Comments
I modified the merge code, and then merged the QLoRA adapter with the base model:

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# PEFT_MODEL points to the saved adapter directory
config = PeftConfig.from_pretrained(PEFT_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    # quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    # trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, PEFT_MODEL)

# Merge the adapter into the base model
model = model.merge_and_unload()

# Save the merged model (safetensors format) to a "<PEFT_MODEL>-merged" directory
model.save_pretrained(PEFT_MODEL + "-merged", safe_serialization=True)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.save_pretrained(PEFT_MODEL + "-merged")

But the merged model gives repeated responses. I want to know where the problem occurs: in the fine-tuning or in the weight merge?
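One way to narrow this down is a quick A/B check: run the same prompt through the un-merged adapter (base model plus PeftModel) and through the merged checkpoint. If only the merged copy degenerates into repetition, the merge step is the likely culprit; if both do, the fine-tune itself is suspect. Below is a minimal sketch, assuming the PEFT_MODEL and "-merged" paths from the snippet above; the prompt is just a placeholder.

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

def greedy_generate(model, tokenizer, prompt="Explain QDoRA in one sentence."):
    # Greedy decoding makes degenerate repetition easy to spot
    device = next(model.parameters()).device
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    return tokenizer.decode(out[0], skip_special_tokens=True)

config = PeftConfig.from_pretrained(PEFT_MODEL)
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# 1) Base model + adapter, no merge
base = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
adapter_model = PeftModel.from_pretrained(base, PEFT_MODEL)
print(greedy_generate(adapter_model, tokenizer))

# 2) Merged checkpoint saved above
merged_model = AutoModelForCausalLM.from_pretrained(
    PEFT_MODEL + "-merged",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
print(greedy_generate(merged_model, tokenizer))

On a single GPU you may need to free the first model before loading the second; the comparison itself is unchanged.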
See my #57 as well; it is a similar question/request.
This is kind of working for me. We need to convert the DoRA parameter names to LoRA names in the tensor dict.

import torch
from peft import LoraConfig, TaskType, get_peft_config, get_peft_model
from safetensors import safe_open
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    LlamaForCausalLM,
)

# Load the QDoRA state dict saved during training
tensors = {}
with safe_open(
    "model_state_dict.safetensors",
    framework="pt",
    device=0,
) as f:
    for k in f.keys():
        tensors[k] = f.get_tensor(k)  # loads the full tensor given a key
        # print(k, tensors[k].dtype, tensors[k].shape)  # Uncomment to view

# Rename the DoRA keys so they match the PEFT LoRA naming scheme
new_tensors = {}
for _k in tensors:
    if "dora" not in _k:
        continue
    else:
        k = "base_model.model." + _k
        k = k.replace(".dora_layer", "")
        k = k.replace(".weight", ".default.weight")
        new_tensors[k] = tensors[_k]
tensors = new_tensors

# Make sure the compute type, target modules, rank, alpha, etc. match!
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = LlamaForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.3",
    use_cache=False,
    quantization_config=bnb_config,
)
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

# Freeze the base model parameters
for param in model.parameters():
    param.requires_grad = False

# Add LoRA (make sure your rank (r) and alpha (lora_alpha) values match those used in training!)
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    # target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]
    target_modules=["k_proj", "q_proj", "v_proj", "up_proj", "down_proj", "gate_proj"],
)
model = get_peft_model(model, peft_config)

# Check out the first few keys in the state dict:
print(list(model.state_dict().keys())[:10])

# Copy the renamed DoRA tensors into the LoRA slots and save the adapter
new_sd = model.state_dict()
for k in new_sd:
    if "lora" in k:
        new_sd[k] = tensors[k]
model.load_state_dict(new_sd, strict=False)
model.save_pretrained("lora_adapters")
tokenizer.save_pretrained("lora_adapters")
@lochuynh1412 How's the quality of the merged model?
Hello,
I've successfully finetuned Llama-3 8B with QDoRA and am now looking to perform inference using vLLM. Could you provide guidance or scripts on how to merge the QDoRA adapters with the original base model? Additionally, does this process involve quantization and dequantization of the base model?
Thank you!