nlp-paper-summary

This repo contains summaries of the following NLP papers:

  • BERT-vs-LSTM
  • TinyBERT
  • ALBERT
  • Poor Man's BERT
  • SpanBERT

BERT-vs-LSTM

This paper talks about:

Given a small dataset, can we use a large pre-trained model like BERT and get better results than with simple models?

Check out the summary here

TinyBERT: Distilling BERT for Natural Language Understanding

This paper talks about:

BERT-based models are usually computationally expensive and memory intensive, so it is difficult to run them efficiently on resource-restricted devices. How can we reduce model size while keeping the performance drop to a minimum?

TinyBERT is empirically effective: it achieves more than 96% of the performance of its teacher, BERT-base, on the GLUE benchmark, while being 7.5x smaller and 9.4x faster at inference.
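TinyBERT distills knowledge from several parts of the teacher (embeddings, attention matrices, hidden states and final predictions). The sketch below illustrates only the general idea behind prediction-layer distillation: a soft cross-entropy between temperature-scaled teacher and student logits. It is a generic knowledge-distillation loss in plain Python, not TinyBERT's full objective; the function names and example logits are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits: List[float], teacher_logits: List[float],
                       temperature: float = 1.0) -> float:
    """Cross-entropy of the student's softened distribution against the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


# Hypothetical logits for a 3-class task.
teacher = [2.0, 0.5, -1.0]
close_student = [1.8, 0.6, -0.9]
far_student = [-1.0, 0.5, 2.0]
print(soft_cross_entropy(close_student, teacher))
print(soft_cross_entropy(far_student, teacher))
```

The student whose logits mimic the teacher's gets the smaller loss; during distillation the student is trained to minimize this quantity so that it reproduces the teacher's output distribution, not just its top label.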

Check out the summary here

ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

This paper talks about:

Increasing model size when pre-training natural language representations often improves performance on downstream tasks. However, at some point further increases become harder due to GPU/TPU memory limitations and longer training times. How can these issues be addressed?

ALBERT proposes two parameter-reduction techniques, factorized embedding parameterization and cross-layer parameter sharing, to lower memory consumption and increase the training speed of BERT. It also proposes a self-supervised loss that focuses on modeling inter-sentence coherence (sentence-order prediction), and shows that it consistently helps downstream tasks with multi-sentence inputs.
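The effect of the two parameter-reduction techniques can be seen with simple arithmetic. The sketch below counts embedding parameters with and without factorization (a V x H table versus a V x E table plus an E x H projection), and encoder parameters with and without cross-layer sharing. The sizes V = 30,000, H = 768 and E = 128 are illustrative assumptions roughly in line with base-sized BERT/ALBERT settings, and the per-layer count is a hypothetical round number; biases are ignored.

```python
def embedding_params(vocab_size: int, hidden_size: int, embed_size: int = 0) -> int:
    """Embedding parameters, ignoring biases.

    embed_size == 0 means no factorization: one V x H table (BERT-style).
    Otherwise: a V x E table followed by an E x H projection (ALBERT-style).
    """
    if embed_size == 0:
        return vocab_size * hidden_size
    return vocab_size * embed_size + embed_size * hidden_size


def encoder_params(params_per_layer: int, num_layers: int, share_across_layers: bool) -> int:
    """With cross-layer sharing, a single set of layer weights is reused by every layer."""
    return params_per_layer if share_across_layers else params_per_layer * num_layers


V, H, E = 30_000, 768, 128  # illustrative sizes (assumption)
print(embedding_params(V, H))     # 23040000
print(embedding_params(V, H, E))  # 3938304

PER_LAYER = 7_000_000  # hypothetical parameter count of one Transformer layer
print(encoder_params(PER_LAYER, 12, share_across_layers=False))  # 84000000
print(encoder_params(PER_LAYER, 12, share_across_layers=True))   # 7000000
```

Factorization decouples the vocabulary embedding size from the hidden size, so the large vocabulary table grows with E instead of H; sharing makes the encoder's parameter count independent of depth, although compute per forward pass is unchanged.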

Check out the summary here

Poor Man's BERT: Smaller and Faster Transformer Models

This paper talks about:

NLP has recently been dominated by large-scale pre-trained Transformer models, where size does matter. Models such as BERT, XLNet and RoBERTa are now out of reach for researchers and practitioners without access to large GPUs/TPUs. How can model size be reduced without pre-training a model from scratch?

There are many ways to reduce the size of pre-trained models. Some notable approaches are:

  • Pruning parts of the network after training
  • Reduction through weight factorization and sharing (ALBERT)
  • Compression through knowledge distillation (DistilBERT, TinyBERT)
  • Quantization (Q-BERT)

This work falls under the class of pruning methods. It questions whether it is necessary to use all layers of a pre-trained model in downstream tasks, and proposes straightforward strategies for dropping some layers from the network.
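Layer dropping amounts to choosing which encoder layers to keep before fine-tuning. The sketch below shows a few simple selection schemes; the names "top", "bottom", "alternate" and "symmetric" are our own shorthand for the kind of strategies the paper studies, and the exact definitions in the paper may differ. Layers are represented by placeholder strings; in a real model you would rebuild the encoder's layer list from the kept indices. All function names are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def layers_to_keep(num_layers: int, k: int, strategy: str) -> List[int]:
    """Indices of encoder layers kept after dropping k of them.

    Index 0 is the layer closest to the embeddings; num_layers - 1 is the top layer.
    """
    if not 0 <= k < num_layers:
        raise ValueError("k must satisfy 0 <= k < num_layers")
    layers = list(range(num_layers))
    if strategy == "top":
        dropped = layers[num_layers - k:]  # the k layers nearest the output
    elif strategy == "bottom":
        dropped = layers[:k]  # the k layers nearest the embeddings
    elif strategy == "alternate":
        # every second layer, starting from the top and moving down
        candidates = layers[::-1][::2]
        if k > len(candidates):
            raise ValueError("alternate dropping cannot remove more than every second layer")
        dropped = candidates[:k]
    elif strategy == "symmetric":
        # k consecutive middle layers, so both the bottom and top layers survive
        start = (num_layers - k) // 2
        dropped = layers[start:start + k]
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    dropped_set = set(dropped)
    return [i for i in layers if i not in dropped_set]


def drop_layers(layers: Sequence[T], keep: List[int]) -> List[T]:
    """Build a shallower stack that keeps the selected layers in their original order."""
    return [layers[i] for i in keep]


encoder = [f"layer_{i}" for i in range(12)]  # stand-ins for real Transformer layers
for strategy in ("top", "bottom", "alternate", "symmetric"):
    print(strategy, drop_layers(encoder, layers_to_keep(12, 4, strategy)))
```

Because the kept layers retain their pre-trained weights, the shallower model can be fine-tuned directly, which is what makes this family of methods attractive when pre-training from scratch is not an option.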

Check out the summary here

SpanBERT: Improving Pre-training by Representing and Predicting Spans

This paper talks about:

The pre-training objective plays an important role in learning language representations. BERT's pre-training objective has two parts: Masked Language Modelling (MLM) and Next Sentence Prediction (NSP). This paper proposes a new pre-training objective that masks contiguous spans of tokens instead of individual tokens, in order to better represent and predict spans of text.
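As we understand the paper, SpanBERT samples span lengths from a geometric distribution (p = 0.2, clipped at 10) and keeps masking spans until about 15% of the tokens are covered. The sketch below only selects which token positions to mask under those assumed settings; it does not implement the [MASK]/random/original replacement scheme, the paper's use of whole-word span boundaries, or its span boundary objective. Function names are hypothetical.

```python
import random
from typing import Set


def sample_span_length(rng: random.Random, p: float = 0.2, max_len: int = 10) -> int:
    """Span length from a geometric distribution on 1, 2, ..., clipped at max_len."""
    length = 1
    while rng.random() > p and length < max_len:
        length += 1
    return length


def span_mask(num_tokens: int, mask_ratio: float = 0.15, seed: int = 0) -> Set[int]:
    """Pick token positions to mask as contiguous, non-overlapping spans."""
    if num_tokens < 1 or not 0 < mask_ratio <= 1:
        raise ValueError("need num_tokens >= 1 and 0 < mask_ratio <= 1")
    rng = random.Random(seed)
    budget = max(1, round(num_tokens * mask_ratio))
    masked: Set[int] = set()
    while len(masked) < budget:
        # never overshoot the masking budget
        length = min(sample_span_length(rng), budget - len(masked))
        start = rng.randrange(0, num_tokens - length + 1)
        span = range(start, start + length)
        if any(i in masked for i in span):
            continue  # overlaps an earlier span: resample
        masked.update(span)
    return masked


print(sorted(span_mask(40, seed=3)))
```

Unlike BERT's token-level masking, the masked positions come out in contiguous runs, so the model has to predict whole stretches of text from the surrounding context rather than isolated tokens.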

Check out the summary here