training #3

Open · armoreal opened this issue Feb 24, 2019 · 10 comments
@armoreal

Is there any way to train GPT-2 using my own text corpus?

@graykode
Owner

@armoreal Which language do you want? Is it English?

@armoreal
Author

In Russian.

@graykode
Owner

graykode commented Feb 24, 2019

@armoreal
First, the existing GPT-2 models only support English. openai/gpt-2#31
If you want to train on your own language, I recommend reading the original GPT and GPT-2 papers.
Please see Improving Language Understanding by Generative Pre-Training, section 3.1 (Unsupervised pre-training) and section 3.2 (Supervised fine-tuning)!
You can also find the GPT-2 WebText dataset here: https://github.com/eukaryote31/openwebtext
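
As a rough illustration of what section 3.1 means in practice, a minimal next-token-prediction training step in PyTorch might look like the sketch below. This is only an assumption-laden sketch: `model` stands for any causal LM that returns logits of shape (batch, seq_len, vocab), and `token_ids` is an already-tokenized batch; neither name comes from this repository or from OpenAI's code.

```python
import torch
import torch.nn.functional as F

def lm_loss(model, token_ids):
    # token_ids: LongTensor (batch, seq_len) of already-tokenized text (assumed)
    inputs = token_ids[:, :-1]    # u_1 .. u_{n-1}
    targets = token_ids[:, 1:]    # u_2 .. u_n, shifted by one position
    logits = model(inputs)        # assumed output shape: (batch, seq_len - 1, vocab)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
    )

# Training loop sketch (optimizer choice and learning rate are assumptions):
# optimizer = torch.optim.Adam(model.parameters(), lr=2.5e-4)
# for batch in dataloader:
#     loss = lm_loss(model, batch)
#     optimizer.zero_grad()
#     loss.backward()
#     optimizer.step()
```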

@armoreal
Author

Thanks for your reply.
As far as I understand, GPT-2 was trained on English and that's the reason it doesn't support other languages, but I'd like to try training it on other languages using my own dataset. OpenAI replied about training in openai/gpt-2#19, so it's possible, but they aren't planning to release the training code yet.

@graykode
Owner

graykode commented Feb 24, 2019

@armoreal I think this repository could be used for training: https://github.com/openai/finetune-transformer-lm
However, there is no dataset for your language, and compute resources will be a problem, I think.
In the GPT-2 paper, they explain how GPT-2 differs from GPT.
The main issues for training will be the dataset (including how they pre-process it) and compute power.
[image]

@graykode
Owner

graykode commented Feb 24, 2019

@armoreal
See the code and the paper for more detail:
[image]

  1. Text prediction, here: https://github.com/openai/finetune-transformer-lm/blob/master/train.py#L176 (section 3.1, Unsupervised pre-training)
  2. Task classification, here: https://github.com/openai/finetune-transformer-lm/blob/master/train.py#L193 (section 3.2, Supervised fine-tuning)

L3(C) = L2(C) + λ∗L1(C)
https://github.com/openai/finetune-transformer-lm/blob/master/train.py#L205
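
As a hedged sketch of how L3(C) = L2(C) + λ∗L1(C) combines the two losses (roughly what that train.py line computes, but not code copied from it): the fine-tuning objective adds the task-classification loss L2 and the auxiliary language-modeling loss L1 weighted by λ. The tensor names below are assumptions for illustration only.

```python
import torch.nn.functional as F

lm_coef = 0.5  # the lambda weight; the GPT paper reports using 0.5

def combined_loss(clf_logits, labels, lm_logits, token_ids):
    # L2(C): supervised task-classification loss
    clf_loss = F.cross_entropy(clf_logits, labels)
    # L1(C): auxiliary language-modeling (next-token) loss on the same tokens
    lm_loss = F.cross_entropy(
        lm_logits[:, :-1].reshape(-1, lm_logits.size(-1)),
        token_ids[:, 1:].reshape(-1),
    )
    # L3(C) = L2(C) + lambda * L1(C)
    return clf_loss + lm_coef * lm_loss
```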

@graykode
Owner

graykode commented Feb 24, 2019

Overall, there is code related to training, so you can train.
But the dataset and compute power may be a problem :(

Please do not close this issue, for everyone's benefit!

@guotong1988

Same question. Thank you.

@robertmacyiii

Is there a way to fine-tune this GPT-2 implementation on my own English corpus?

@radiodee1

radiodee1 commented Jun 1, 2019

I would like to fine-tune the PyTorch GPT-2 on an English corpus. Is the OpenAI code PyTorch or TF? Are there examples online in PyTorch?
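
For reference, OpenAI's released GPT/GPT-2 code is TensorFlow, while this repository is a PyTorch implementation. A fine-tuning loop in PyTorch could look roughly like the sketch below. It is only a sketch under assumptions: `model` stands for any causal LM returning logits of shape (batch, seq_len, vocab), `token_ids` is the whole corpus already encoded as a 1-D tensor, and the block size, batch size, and learning rate are placeholder values, not settings from this repository.

```python
import torch
import torch.nn.functional as F

BLOCK_SIZE = 512  # tokens of context per training example (assumed value)

def make_batches(token_ids, batch_size=4):
    # token_ids: 1-D LongTensor holding the entire encoded corpus (assumed)
    n_blocks = token_ids.size(0) // BLOCK_SIZE
    blocks = token_ids[: n_blocks * BLOCK_SIZE].view(n_blocks, BLOCK_SIZE)
    for i in range(0, n_blocks, batch_size):
        yield blocks[i : i + batch_size]

def finetune(model, token_ids, epochs=1, lr=5e-5, device="cpu"):
    model.to(device)
    model.train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for batch in make_batches(token_ids):
            batch = batch.to(device)
            logits = model(batch[:, :-1])   # assumed output: (B, T - 1, vocab)
            loss = F.cross_entropy(
                logits.reshape(-1, logits.size(-1)),
                batch[:, 1:].reshape(-1),
            )
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```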
