Converting models from GPT-NeoX to HuggingFace format #33

Open · sleekmike opened this issue Aug 9, 2022 · 9 comments

@sleekmike

Hello,

I am interested and volunteering to convert the models from GPT-NeoX to HuggingFace format.

@VHellendoorn
Owner

Hi, that's great to hear. The basic steps should be the following:

  1. Download a checkpoint and convert it to the HuggingFace format. This PR contains a file named convert_to_huggingface.py that did the job for the GPT-NeoX 20B model. It doesn't work out-of-the-box for ours, though, since some layers have different names and there are a few small architectural differences. I think it can be adapted by updating the tgt_state_dict and src_state_dict assignments based on the correct names of all layers in the PolyCoder models, which may involve a bit of trial and error (a rough sketch of such a remapping follows this list).
  2. At this point, you might be able to run the models directly using code very similar to the fragment for NeoX, just (a) pointing it to the converted checkpoint directory, and (b) providing a transformers.GPTNeoXConfig that matches the checkpoint. I'd suggest trying a small model first.
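
For concreteness, here is a minimal sketch of the kind of key remapping step 1 involves. The RENAMES table is hypothetical: the actual old/new name pairs have to be read off the PolyCoder checkpoint and the HuggingFace GPTNeoX implementation, and the file names are placeholders.

```python
import torch

# Hypothetical name map: left-hand sides are NeoX-style layer names,
# right-hand sides are their HF GPTNeoX equivalents. Determine the real
# pairs by comparing src_state_dict.keys() with the HF model's keys.
RENAMES = {
    "word_embeddings": "gpt_neox.embed_in",
    "final_linear": "embed_out",
    # ... remaining per-layer renames go here.
}

def convert(src_state_dict):
    """Copy tensors across, renaming keys to the HF GPTNeoX layout."""
    tgt_state_dict = {}
    for src_key, tensor in src_state_dict.items():
        tgt_key = src_key
        for old, new in RENAMES.items():
            tgt_key = tgt_key.replace(old, new)
        tgt_state_dict[tgt_key] = tensor
    return tgt_state_dict

src_state_dict = torch.load("checkpoint.pt", map_location="cpu")
torch.save(convert(src_state_dict), "pytorch_model.bin")
```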

Let me know if you have further questions or updates, and I hope it isn't too much trouble!

-Vincent

@sleekmike
Author

Thanks for the directions, Vincent. I will follow them, and if I run into issues or have questions, I will let you know.

@sleekmike
Author

Hello @VHellendoorn, what server spec did you use for training and inference on the model? Also, could you point me to the provider you rented it from? Thanks.

@urialon
Collaborator

urialon commented Aug 14, 2022

Hi @sleekmike ,

I'm sure that Vincent will respond soon. As far as I remember, the models were trained on 8 NVIDIA RTX 8000 GPUs on a single machine.

But I wonder why that is a concern. Wouldn't you rather start with our smaller models, which can fit on any GPU?

@sleekmike
Author

@urialon Thanks for your response. I just want to get an idea; I will start with a much smaller model, as you said.

@NinedayWang
Contributor

Hello everyone, I have converted the GPT-NeoX model to HuggingFace format and verified that it works properly on the 0.4B model; please refer to this PR.
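
As a quick smoke test of a converted checkpoint, something along these lines should produce a sensible completion. The local directory name and the prompt are placeholders, and the snippet assumes the converted folder contains the config, weights, and tokenizer files.

```python
import torch
from transformers import AutoTokenizer, GPTNeoXForCausalLM

# "./polycoder-0.4B-hf" is a placeholder for the directory holding the
# converted config.json, weights, and tokenizer files.
tokenizer = AutoTokenizer.from_pretrained("./polycoder-0.4B-hf")
model = GPTNeoXForCausalLM.from_pretrained("./polycoder-0.4B-hf")
model.eval()

prompt = "def binary_search(arr, target):"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0]))
```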

@urialon
Collaborator

urialon commented Aug 23, 2022

Thanks a lot, @NinedayWang!
Hopefully Vincent will be able to take a look soon.

@VHellendoorn, what are the next steps? Do we need to push code to HuggingFace, or are the weights usable with the GPTNeoXForCausalLM class?

@VHellendoorn
Owner

Yes, thanks @NinedayWang! I'll try it out as soon as I have some time.

In terms of next steps: if this just works with the HF classes, which it sounds like it does, the next step is to add a how-to to the README and clean up #34 a bit so it works for all published models (the vocabulary will be identical, and probably the script too, so only the distilled config file needs to be replicated per model). A sketch of producing such a config follows.
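
As an illustration of that per-model config, something like the following would write the config.json for one converted model. All values here are placeholders: the real numbers must be copied from the NeoX training config of each published PolyCoder model.

```python
from transformers import GPTNeoXConfig

# Placeholder hyperparameters; copy the real values from the NeoX
# training config of the model being converted.
config = GPTNeoXConfig(
    vocab_size=50304,
    hidden_size=1024,
    num_hidden_layers=24,
    num_attention_heads=16,
    max_position_embeddings=2048,
)
config.save_pretrained("./polycoder-0.4B-hf")  # writes config.json
```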

@urialon
Collaborator

urialon commented Sep 1, 2022

> if this just works with the HF classes

@VHellendoorn, note that this depends on @NinedayWang's PR to huggingface/transformers being merged.
