add dialoGPT-medium #65

AnneBeyer

Model

DialoGPT-medium extends GPT-2-medium by fine-tuning on Reddit data in order to model dialogue.
For this, the eos token is used to mark speaker changes (represented by the [SEP] token in the input), which requires some modifications to get_surprisals.py and tokenizer.py; a rough sketch of the mapping is shown below.
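A minimal sketch of how the [SEP]-to-eos substitution could look, assuming the public Hugging Face checkpoint microsoft/DialoGPT-medium; the function name and the exact way this would be wired into tokenizer.py / get_surprisals.py are illustrative, not the actual patch:

```python
# Illustrative only: the [SEP] -> eos_token substitution described above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")

def encode_dialogue(text: str):
    # DialoGPT marks speaker changes with its eos token ("<|endoftext|>"),
    # so the [SEP] placeholder in the input is replaced before encoding.
    text = text.replace("[SEP]", tokenizer.eos_token)
    return tokenizer(text, return_tensors="pt")

# Two turns separated by a speaker change:
batch = encode_dialogue("Does money buy happiness? [SEP] Depends how much money you spend on it.")
print(tokenizer.convert_ids_to_tokens(batch["input_ids"][0].tolist()))
```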

@inproceedings{zhang-etal-2020-dialogpt,
    title = "{DIALOGPT} : Large-Scale Generative Pre-training for Conversational Response Generation",
    author = "Zhang, Yizhe  and
      Sun, Siqi  and
      Galley, Michel  and
      Chen, Yen-Chun  and
      Brockett, Chris  and
      Gao, Xiang  and
      Gao, Jianfeng  and
      Liu, Jingjing  and
      Dolan, Bill",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-demos.30",
    doi = "10.18653/v1/2020.acl-demos.30",
    pages = "270--278"
}
  • Are you the creator/co-creator of this language model? No.
  • Are you the creator/co-creator of this implementation of this language model? No.
  • Is this implementation the official implementation of the language model? Yes (see https://huggingface.co/microsoft/DialoGPT-medium).
  • What licensing restrictions (if any) apply to this implementation of this language model? MIT License, Copyright (c) Microsoft Corporation.

Training

  • What corpus was this model trained on?
    147M conversation-like exchanges extracted from Reddit comment chains spanning 2005 through 2017, 1.8 billion words in total, with a vocabulary of 50,257 tokens (see Zhang et al. (2020) for details)

  • What task was this model trained on?
    It extends the Hugging Face PyTorch transformer with a next-word prediction objective and an additional Mutual Information Maximization objective (a rough sketch of the next-word prediction part appears after this list)

  • If possible, provide some standard performance measures (e.g. test perplexity)
    and complexity measures (e.g. parameter count, number of layers, etc.).

    Parameters: 345M, Layers: 24, Embedding size: 1024, Batch size: 64
    See Tables 2 & 3 in Zhang et al. (2020) for performance measures

Licensing

MIT License
