TinyBERT training: knowledge distillation vs training from scratch on MS-MARCO #85

Open
prasadkawthekar opened this issue Jul 17, 2020 · 0 comments

prasadkawthekar commented Jul 17, 2020

Hi, thank you for the amazing work with NBoost.

My question is about TinyBERT. According to the accompanying blog post, TinyBERT is obtained through knowledge distillation from a larger BERT architecture that is pre-trained on MS-MARCO.

How does this approach compare with training a TinyBERT architecture from scratch on the MS-MARCO dataset?
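For concreteness, here is a minimal sketch of the two objectives being compared, assuming a PyTorch-style binary relevance re-ranker; the function and variable names are illustrative and are not taken from NBoost's codebase or the blog post:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hinton-style distillation: soften both logit distributions with temperature T,
    # match the student to the teacher (KL term), and mix in the hard-label loss.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

def from_scratch_loss(student_logits, labels):
    # Training from scratch on MS-MARCO: only the hard relevance labels are used.
    return F.cross_entropy(student_logits, labels)

# Illustrative shapes: a batch of 4 query-passage pairs, 2 classes (relevant / not relevant).
teacher_logits = torch.randn(4, 2)                        # stand-in for the large BERT re-ranker's outputs
student_logits = torch.randn(4, 2, requires_grad=True)    # stand-in for TinyBERT's outputs
labels = torch.tensor([1, 0, 0, 1])

print(distillation_loss(student_logits, teacher_logits, labels))
print(from_scratch_loss(student_logits, labels))
```

The difference is the training signal: the distillation variant also sees the teacher's soft relevance scores (the approach the blog post describes), while the from-scratch variant sees only the hard MS-MARCO labels.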
