TinyBERT training: knowledge distillation vs training from scratch on MS-MARCO
Hi, thank you for the amazing work on NBoost.

My question is about TinyBERT. According to the accompanying blog post, TinyBERT is obtained via knowledge distillation from a larger BERT model that was trained on MS-MARCO.

How does this approach compare with training a TinyBERT-sized architecture from scratch on the MS-MARCO dataset?
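For concreteness, here is a minimal sketch of the standard soft-label distillation objective (Hinton et al., 2015) that setups like this typically use. This is only an illustration of the technique being asked about, not NBoost's actual training code; the function name and the `T`/`alpha` hyperparameters are hypothetical:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-label knowledge distillation loss (Hinton et al., 2015).

    Mixes a KL term against the teacher's softened output distribution
    with the usual cross-entropy against the hard relevance labels.
    """
    # Soft targets: push the student's temperature-softened distribution
    # toward the teacher's. The T**2 factor keeps gradient magnitudes
    # comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary supervised loss on the labeled data.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

Training from scratch would instead optimize only the `hard` term, without the teacher's soft targets.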