Hi,
Whenever a saved `RewardModel` is loaded via `RewardModel.from_pretrained(model_path, flash_attn=True, fp16=False, bf16=True, low_cpu_mem_usage=True)`, it downloads the entire sharded checkpoint (https://github.com/huggingface/transformers/blob/v4.33.2/src/transformers/modeling_utils.py#L2876), which is already ~30GB because it contains all the weights of the reward model, including both the backbone model and the reward head. It then calls `RewardModel.__init__()` (via this line: https://github.com/huggingface/transformers/blob/v4.33.2/src/transformers/modeling_utils.py#L2966), which loads all the weights of the backbone model (SFT10K, another ~30GB). Surely loading a pretrained model shouldn't require loading the backbone model weights twice? A minimal sketch of the pattern is below.
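For reference, here is a minimal sketch of the pattern that produces the double load. This is not the actual alpaca_farm implementation; the class layout and the `backbone_model_name_or_path` argument are assumptions made for illustration.

```python
import torch
import transformers


class RewardModel(transformers.PreTrainedModel):
    """Sketch of a reward model whose __init__ eagerly loads the backbone."""

    def __init__(self, config, backbone_model_name_or_path=None, **kwargs):
        super().__init__(config)
        # from_pretrained() instantiates the model by calling this __init__,
        # so the backbone (e.g. SFT10K, ~30GB) is read from its own checkpoint
        # here ...
        self.backbone_model = transformers.AutoModelForCausalLM.from_pretrained(
            backbone_model_name_or_path
        )
        # ... with the reward head attached on top of it.
        self.reward_head = torch.nn.Linear(
            self.backbone_model.config.hidden_size, 1
        )
        # from_pretrained() then overwrites every parameter (backbone + head)
        # a second time from the ~30GB sharded reward-model checkpoint.
```

One way to avoid the second read (an assumption, not something the repository necessarily exposes) would be to build the backbone from its config alone inside `__init__`, e.g. `transformers.AutoModelForCausalLM.from_config(backbone_config)`, when loading from a full reward-model checkpoint, so that `from_pretrained()` fills in all weights from the reward-model shards exactly once.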
Thanks!