Converting models from GPT-NeoX to HuggingFace format #33

Open · sleekmike opened this issue Aug 9, 2022 · 9 comments

@sleekmike

Hello,

I am interested and volunteering to convert the models from GPT-NeoX to HuggingFace format.

@VHellendoorn
Owner

Hi, that's great to hear. The basic steps should be the following:

  1. Download a checkpoint and convert it to the HuggingFace format. This PR contains a file named convert_to_huggingface.py that did the job for the GPT-NeoX 20B model. It doesn't work out-of-the-box for ours, though, since some layers have different names and there are a few small architectural differences. I think it can be adapted by updating the tgt_state_dict and src_state_dict assignments based on the correct names of all layers in the PolyCoder models, which may involve a bit of trial and error (a rough sketch of such a remapping follows this list).
  2. At this point, you might be able to run the models directly using code very similar to the fragment for NeoX, just (a) pointing it to the converted checkpoint directory, and (b) providing a transformers.GPTNeoXConfig that matches the checkpoint. I'd suggest trying a small model first.
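
For concreteness, here is a minimal sketch of the kind of key remapping step 1 involves. The RENAMES table is hypothetical: the actual old/new name pairs have to be read off the PolyCoder checkpoint and the HuggingFace GPTNeoX implementation, and the file names are placeholders.

```python
import torch

# Hypothetical name map: left-hand sides are NeoX-style layer names,
# right-hand sides are their HF GPTNeoX equivalents. Determine the real
# pairs by comparing src_state_dict.keys() with the HF model's keys.
RENAMES = {
    "word_embeddings": "gpt_neox.embed_in",
    "final_linear": "embed_out",
    # ... remaining per-layer renames go here.
}

def convert(src_state_dict):
    """Copy tensors across, renaming keys to the HF GPTNeoX layout."""
    tgt_state_dict = {}
    for src_key, tensor in src_state_dict.items():
        tgt_key = src_key
        for old, new in RENAMES.items():
            tgt_key = tgt_key.replace(old, new)
        tgt_state_dict[tgt_key] = tensor
    return tgt_state_dict

src_state_dict = torch.load("checkpoint.pt", map_location="cpu")
torch.save(convert(src_state_dict), "pytorch_model.bin")
```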

Let me know if you have further questions or updates, and I hope it isn't too much trouble!

-Vincent

@sleekmike
Author

Thanks for the directions, Vincent. I will follow them, and if I run into issues or have questions, I will let you know.

@sleekmike
Author

Hello @VHellendoorn, what server spec did you use for training and inference on the model? Also, could you point me to the provider you rented it from? Thanks.

@urialon
Collaborator

urialon commented Aug 14, 2022

Hi @sleekmike ,

I'm sure that Vincent will respond soon. As far as I remember, the models were trained on 8 NVIDIA RTX 8000 GPUs on a single machine.

But I wonder why that is a concern. Wouldn't you rather start with our smaller models, which can fit on any GPU?

@sleekmike
Author

@urialon Thanks for your response. I just want to get an idea; I will start with a much smaller model, as you said.

@NinedayWang
Contributor

Hello everyone, I have converted the GPT-NeoX model to HuggingFace format and verified that it works properly on the 0.4B model; please refer to this PR.
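
As a quick smoke test of a converted checkpoint, something along these lines should produce a sensible completion. The local directory name and the prompt are placeholders, and the snippet assumes the converted folder contains the config, weights, and tokenizer files.

```python
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

# "./polycoder-0.4B-hf" is a placeholder for the directory holding the
# converted config.json, weights, and tokenizer files.
tokenizer = AutoTokenizer.from_pretrained("./polycoder-0.4B-hf")
model = GPTNeoXForCausalLM.from_pretrained("./polycoder-0.4B-hf")
model.eval()

prompt = "def binary_search(arr, target):"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0]))
```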

@urialon
Collaborator

urialon commented Aug 23, 2022

Thanks a lot, @NinedayWang!
Hopefully Vincent will be able to take a look soon.

@VHellendoorn, what are the next steps? Do we need to push code to HuggingFace, or are the weights usable with the GPTNeoXForCausalLM class?

@VHellendoorn
Owner

Yes, thanks @NinedayWang! I'll try it out as soon as I have some time.

In terms of next steps: if this just works with the HF classes, which it sounds like it does, the next step is to add a how-to to the README and clean up #34 a bit so it works for all published models (the vocabulary will be identical, and probably the script too, so only the distilled config file needs to be replicated per model). A sketch of producing such a config follows.
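
As an illustration of that per-model config, something like the following would write the config.json for one converted model. All values here are placeholders: the real numbers must be copied from the NeoX training config of each published PolyCoder model.

```python
from transformers import GPTNeoXConfig

# Placeholder hyperparameters; copy the real values from the NeoX
# training config of the model being converted.
config = GPTNeoXConfig(
    vocab_size=50304,
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
    max_position_embeddings=2048,
)
config.save_pretrained("./polycoder-0.4B-hf")  # writes config.json
```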

@urialon
Collaborator

urialon commented Sep 1, 2022

> if this just works with the HF classes

@VHellendoorn, note that this depends on @NinedayWang's PR to huggingface/transformers being merged.
