- Deep Contextualized Word Representations (NAACL 2018) [paper] - ELMo
- Universal Language Model Fine-tuning for Text Classification (ACL 2018) [paper] - ULMFiT
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019) [paper][code][official PyTorch code] - BERT
- Improving Language Understanding by Generative Pre-Training (CoRR 2018) [paper] - GPT
- Language Models are Unsupervised Multitask Learners (CoRR 2019) [paper][code] - GPT-2
- MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML 2019) [paper][code] - MASS
- Unified Language Model Pre-training for Natural Language Understanding and Generation (CoRR 2019) [paper][code] - UNILM
- Multi-Task Deep Neural Networks for Natural Language Understanding (ACL 2019) [paper][code] - MT-DNN
- ERNIE: Enhanced Language Representation with Informative Entities (ACL 2019) [paper][code] - ERNIE (THU)
- ERNIE: Enhanced Representation through Knowledge Integration (CoRR 2019) [paper] - ERNIE (Baidu)
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (CoRR 2019) [paper] - ERNIE 2.0 (Baidu)
- Pre-Training with Whole Word Masking for Chinese BERT (CoRR 2019) [paper] - Chinese-BERT-wwm
- SpanBERT: Improving Pre-training by Representing and Predicting Spans (CoRR 2019) [paper] - SpanBERT
- XLNet: Generalized Autoregressive Pretraining for Language Understanding (CoRR 2019) [paper][code] - XLNet
- RoBERTa: A Robustly Optimized BERT Pretraining Approach (CoRR 2019) [paper] - RoBERTa
- NEZHA: Neural Contextualized Representation for Chinese Language Understanding (CoRR 2019) [paper][code] - NEZHA
- K-BERT: Enabling Language Representation with Knowledge Graph (AAAI 2020) [paper][code] - K-BERT
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (CoRR 2019) [paper][code] - T5
- ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations (CoRR 2019) [paper][code] - ZEN
- The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service (CoRR 2019) [paper][code] - BAAI-JDAI-BERT
- Knowledge Enhanced Contextual Word Representations (EMNLP 2019) [paper] - KnowBert
- UER: An Open-Source Toolkit for Pre-training Models (EMNLP 2019) [paper][code] - UER
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR 2020) [paper] - ELECTRA
- StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR 2020) [paper] - StructBERT
- FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR 2020) [paper][code] - FreeLB
- HUBERT Untangles BERT to Improve Transfer across NLP Tasks (CoRR 2019) [paper] - HUBERT
- CodeBERT: A Pre-Trained Model for Programming and Natural Languages (CoRR 2020) [paper] - CodeBERT
- ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training (CoRR 2020) [paper] - ProphetNet
- ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation (CoRR 2020) [paper][code] - ERNIE-GEN
- Efficient Training of BERT by Progressively Stacking (ICML 2019) [paper][code] - StackingBERT
- PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination (CoRR 2020) [paper][code] - PoWER-BERT
- UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training (CoRR 2020) [paper][code] - UNILMv2
- MPNet: Masked and Permuted Pre-training for Language Understanding (CoRR 2020) [paper][code] - MPNet
- Language Models are Few-Shot Learners (CoRR 2020) [paper][code] - GPT-3
- SPECTER: Document-level Representation Learning using Citation-informed Transformers (ACL 2020) [paper] - SPECTER
- PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning (CoRR 2020) [paper][code] - PLATO-2
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention (CoRR 2020) [paper][code] - DeBERTa
- VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV 2019) [paper]
- Learning Video Representations using Contrastive Bidirectional Transformer (CoRR 2019) [paper] - CBT
- ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS 2019) [paper][code]
- VisualBERT: A Simple and Performant Baseline for Vision and Language (CoRR 2019) [paper][code]
- Fusion of Detected Objects in Text for Visual Question Answering (EMNLP 2019) [paper][[code]](https://github.com/google-research/language/tree/master/language/question_answering/b2t2) - B2T2
- Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training (AAAI 2020) [paper]
- LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP 2019) [paper][code]
- VL-BERT: Pre-training of Generic Visual-Linguistic Representations (CoRR 2019) [paper][code]
- UNITER: Learning UNiversal Image-TExt Representations (CoRR 2019) [paper]
- FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval (SIGIR 2020) [paper] - FashionBERT
- VD-BERT: A Unified Vision and Dialog Transformer with BERT (CoRR 2020) [paper] - VD-BERT
- Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin. (CoRR 2019) [paper]
- Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System. Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang. (CoRR 2019) [paper] - MKDM
- Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding. Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. (CoRR 2019) [paper]
- Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. (CoRR 2019) [paper]
- Small and Practical BERT Models for Sequence Labeling. Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer. (EMNLP 2019) [paper]
- Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. (AAAI 2020) [paper]
- Patient Knowledge Distillation for BERT Model Compression. Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu. (EMNLP 2019) [paper] - BERT-PKD
- Extreme Language Model Compression with Optimal Subwords and Shared Projections. Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou. (CoRR 2019) [paper]
- DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. (NeurIPS 2019 Workshop) [paper][code]
- TinyBERT: Distilling BERT for Natural Language Understanding. Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. (CoRR 2019) [paper][code]
- Q8BERT: Quantized 8Bit BERT. Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat. (NeurIPS 2019 Workshop) [paper]
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. (ICLR 2020) [paper][code]
- Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning. Mitchell A. Gordon, Kevin Duh, Nicholas Andrews. (ICLR 2020) [paper][PyTorch code]
- Reducing Transformer Depth on Demand with Structured Dropout. Angela Fan, Edouard Grave, Armand Joulin. (ICLR 2020) [paper] - LayerDrop
- Multilingual Alignment of Contextual Word Representations (ICLR 2020) [paper]
- AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search. Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou. (IJCAI 2020) [paper] - AdaBERT
- BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou. (CoRR 2020) [paper][pt code][tf code][keras code]
- MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou. (CoRR 2020) [paper][code]
- FastBERT: a Self-distilling BERT with Adaptive Inference Time. Weijie Liu, Peng Zhou, Zhiruo Wang, Zhe Zhao, Haotang Deng, Qi Ju. (ACL 2020) [paper][code]
- MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou. (ACL 2020) [paper][code]
- Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation. Bowen Wu, Huan Zhang, Mengyuan Li, Zongsheng Wang, Qihang Feng, Junhong Huang, Baoxun Wang. (CoRR 2020) [paper] - BiLSTM-SRA & LTD-BERT
- Poor Man's BERT: Smaller and Faster Transformer Models. Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov. (CoRR 2020) [paper]
- DynaBERT: Dynamic BERT with Adaptive Width and Depth. Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu. (CoRR 2020) [paper]
- SqueezeBERT: What can computer vision teach NLP about efficient neural networks?. Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer. (CoRR 2020) [paper]
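
Most of the checkpoints listed above can be loaded through the Hugging Face `transformers` library. Below is a minimal sketch of doing so for feature extraction; the choice of library and the `distilbert-base-uncased` checkpoint are illustrative assumptions, not endorsements from the papers themselves.

```python
# Minimal sketch: load a compact pre-trained model (here DistilBERT via
# Hugging Face `transformers`, assumed available) and extract contextual
# embeddings for a sentence. Swap the model name for any other checkpoint
# hosted on the Hugging Face Hub.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")
model.eval()

inputs = tokenizer("Pre-trained language models keep getting smaller.",
                   return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Last-layer hidden states: (batch_size, sequence_length, hidden_size)
print(outputs.last_hidden_state.shape)
```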