This repository has been archived by the owner on Jun 7, 2023. It is now read-only.
Model
DialoGPT-medium extends GPT-2-medium by fine-tuning on Reddit data in order to model dialogue. For this, the eos token is used to mark a speaker change (represented by the [SEP] token in the input, which requires some modifications to get_surprisals.py and tokenizer.py).
model? No.
language model? MIT License, Copyright (c) Microsoft Corporation.
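The speaker-change convention described above can be illustrated with a minimal sketch. It assumes that input turns are separated by a literal [SEP] marker and that the eos token is GPT-2's `<|endoftext|>` string; the function name `sep_to_eos` is hypothetical and is not taken from get_surprisals.py or tokenizer.py, whose actual modifications may differ.

```python
# Minimal sketch (hypothetical helper, not the repository's code):
# rewrite "[SEP]"-delimited dialogue turns so that each speaker change
# is marked by the eos token instead.

EOS_TOKEN = "<|endoftext|>"  # GPT-2's end-of-text token string
SEP_MARKER = "[SEP]"         # speaker-change marker assumed in the input


def sep_to_eos(text: str, sep: str = SEP_MARKER, eos: str = EOS_TOKEN) -> str:
    """Replace each speaker-change marker with the eos token.

    Whitespace around each turn is stripped and empty turns are dropped,
    so the eos token sits directly between consecutive turns. No trailing
    eos token is appended after the last turn.
    """
    turns = [turn.strip() for turn in text.split(sep)]
    return eos.join(turn for turn in turns if turn)


if __name__ == "__main__":
    print(sep_to_eos("Hi there [SEP] Hello, how are you?"))
```

Whether a trailing eos token should follow the final turn depends on how the surprisal script segments its input; the sketch leaves it out.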
Training
What corpus was this model trained on?
147M conversation-like exchanges extracted from Reddit comment chains spanning 2005 through 2017, totaling 1.8 billion words, with a vocabulary of 50,257 (see Zhang et al. (2020) for details).
What task was this model trained on?
It extends the Hugging Face PyTorch transformer, combining a next-word prediction objective with an additional Mutual Information Maximization objective.
If possible, provide some standard performance measures (e.g. test perplexity) and complexity measures (e.g. parameter count, number of layers, etc.).
Parameters: 345M, Layers: 24, Embedding size: 1024, Batch size: 64
See Tables 2 & 3 in Zhang et al. (2020) for performance measures
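As a rough sanity check on the complexity measures above, the parameter count can be estimated from the layer count, embedding size, and vocabulary size. This is a back-of-the-envelope sketch only: it uses the common approximation of 12·d² weights per transformer layer (attention plus feed-forward), ignores biases and layer-norm parameters, and assumes a context length of 1024 positions, which is not stated in this card.

```python
# Back-of-the-envelope parameter estimate for a GPT-2-style decoder.
# Assumptions (not from the model card): 12 * d^2 weights per layer,
# biases and layer norms ignored, context length of 1024 positions.

def estimate_parameters(layers: int, d_model: int, vocab_size: int,
                        n_positions: int = 1024) -> int:
    """Return an approximate parameter count.

    Per layer: 4*d^2 for the attention projections (Q, K, V, output)
    plus 8*d^2 for the feed-forward block with a 4x hidden expansion.
    Embeddings: one token table and one learned position table.
    """
    per_layer = 12 * d_model * d_model
    token_embeddings = vocab_size * d_model
    position_embeddings = n_positions * d_model
    return layers * per_layer + token_embeddings + position_embeddings


if __name__ == "__main__":
    total = estimate_parameters(layers=24, d_model=1024, vocab_size=50257)
    print(f"approx. {total / 1e6:.1f}M parameters")
```

With the values listed above, the estimate comes out at roughly 354M, the same order as the reported 345M; the gap reflects the approximation and how the published figure was counted.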
Licensing
MIT License