add dialoGPT-medium #65

AnneBeyer

Model

DialoGPT-medium extends GPT-2-medium by fine-tuning on Reddit data in order to model dialogue.
For this, the eos token is used to mark speaker changes (represented by the [SEP] token in the input), which requires some modifications to get_surprisals.py and tokenizer.py; a rough sketch of the mapping is shown below.
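A minimal sketch of how the [SEP]-to-eos substitution could look, assuming the public Hugging Face checkpoint microsoft/DialoGPT-medium; the function name and the exact way this would be wired into tokenizer.py / get_surprisals.py are illustrative, not the actual patch:

```python
# Illustrative only: the [SEP] -> eos_token substitution described above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")

def encode_dialogue(text: str):
    # DialoGPT marks speaker changes with its eos token ("<|endoftext|>"),
    # so the [SEP] placeholder in the input is replaced before encoding.
    text = text.replace("[SEP]", tokenizer.eos_token)
    return tokenizer(text, return_tensors="pt")

# Two turns separated by a speaker change:
batch = encode_dialogue("Does money buy happiness? [SEP] Depends how much money you spend on it.")
print(tokenizer.convert_ids_to_tokens(batch["input_ids"][0].tolist()))
```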

@inproceedings{zhang-etal-2020-dialogpt,
    title = "{DIALOGPT} : Large-Scale Generative Pre-training for Conversational Response Generation",
    author = "Zhang, Yizhe  and
      Sun, Siqi  and
      Galley, Michel  and
      Chen, Yen-Chun  and
      Brockett, Chris  and
      Gao, Xiang  and
      Gao, Jianfeng  and
      Liu, Jingjing  and
      Dolan, Bill",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-demos.30",
    doi = "10.18653/v1/2020.acl-demos.30",
    pages = "270--278"
}
  • Are you the creator/co-creator of this language model? No.
  • Are you the creator/co-creator of this implementation of this language model? No.
  • Is this implementation the official implementation of the language model? Yes (see https://huggingface.co/microsoft/DialoGPT-medium).
  • What licensing restrictions (if any) apply to this implementation of this language model? MIT License, Copyright (c) Microsoft Corporation.

Training

  • What corpus was this model trained on?
    147M conversation-like exchanges extracted from Reddit comment chains spanning 2005 through 2017, 1.8 billion words in total, with a vocabulary of 50,257 tokens (see Zhang et al. (2020) for details)

  • What task was this model trained on?
    It extends the Hugging Face PyTorch transformer with a next-word prediction objective and an additional Mutual Information Maximization objective (a rough sketch of the next-word prediction part appears after this list)

  • If possible, provide some standard performance measures (e.g. test perplexity)
    and complexity measures (e.g. parameter count, number of layers, etc.).

    Parameters: 345M, Layers: 24, Embedding size: 1024, Batch size: 64
    See Tables 2 & 3 in Zhang et al. (2020) for performance measures

Licensing

MIT License
