BERT Variants: Modifications, Architecture Search, Pruning, and Distillation

Design Optimizations

Pre-trained Models

  • Deep Contextualized Word Representations (NAACL 2018) [paper] - ELMo
  • Universal Language Model Fine-tuning for Text Classification (ACL 2018) [paper] - ULMFit
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (NAACL 2019) [paper][code][official PyTorch code] - BERT
  • Improving Language Understanding by Generative Pre-Training (CoRR 2018) [paper] - GPT
  • Language Models are Unsupervised Multitask Learners (CoRR 2019) [paper][code] - GPT-2
  • MASS: Masked Sequence to Sequence Pre-training for Language Generation (ICML 2019) [paper][code] - MASS
  • Unified Language Model Pre-training for Natural Language Understanding and Generation (CoRR 2019) [paper][code] - UNILM
  • Multi-Task Deep Neural Networks for Natural Language Understanding (ACL 2019) [paper][code] - MT-DNN
  • ERNIE: Enhanced Language Representation with Informative Entities (ACL 2019) [paper][code] - ERNIE (THU)
  • ERNIE: Enhanced Representation through Knowledge Integration (CoRR 2019) [paper] - ERNIE (Baidu)
  • ERNIE 2.0: A Continual Pre-training Framework for Language Understanding (CoRR 2019) [paper] - ERNIE 2.0 (Baidu)
  • Pre-Training with Whole Word Masking for Chinese BERT (CoRR 2019) [paper] - Chinese-BERT-wwm
  • SpanBERT: Improving Pre-training by Representing and Predicting Spans (CoRR 2019) [paper] - SpanBERT
  • XLNet: Generalized Autoregressive Pretraining for Language Understanding (CoRR 2019) [paper][code] - XLNet
  • RoBERTa: A Robustly Optimized BERT Pretraining Approach (CoRR 2019) [paper] - RoBERTa
  • NEZHA: Neural Contextualized Representation for Chinese Language Understanding (CoRR 2019) [paper][code] - NEZHA
  • K-BERT: Enabling Language Representation with Knowledge Graph (AAAI 2020) [paper][code] - K-BERT
  • Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (CoRR 2019) [paper][code] - T5
  • ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations (CoRR 2019) [paper][code] - ZEN
  • The JDDC Corpus: A Large-Scale Multi-Turn Chinese Dialogue Dataset for E-commerce Customer Service (CoRR 2019) [paper][code] - BAAI-JDAI-BERT
  • Knowledge Enhanced Contextual Word Representations (EMNLP 2019) [paper] - KnowBert
  • UER: An Open-Source Toolkit for Pre-training Models (EMNLP 2019) [paper][code] - UER
  • ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators (ICLR 2020) [paper] - ELECTRA
  • StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding (ICLR 2020) [paper] - StructBERT
  • FreeLB: Enhanced Adversarial Training for Language Understanding (ICLR 2020) [paper][code] - FreeLB
  • HUBERT Untangles BERT to Improve Transfer across NLP Tasks (CoRR 2019) [paper] - HUBERT
  • CodeBERT: A Pre-Trained Model for Programming and Natural Languages (CoRR 2020) [paper] - CodeBERT
  • ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training (CoRR 2020) [paper] - ProphetNet
  • ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation (CoRR 2020) [paper][code] - ERNIE-GEN
  • Efficient Training of BERT by Progressively Stacking (ICML 2019) [paper][code] - StackingBERT
  • PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination (CoRR 2020) [paper][code]
  • UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training (CoRR 2020) [paper][code] - UNILMv2
  • MPNet: Masked and Permuted Pre-training for Language Understanding (CoRR 2020) [paper][code] - MPNet
  • Language Models are Few-Shot Learners (CoRR 2020) [paper][code] - GPT-3
  • SPECTER: Document-level Representation Learning using Citation-informed Transformers (ACL 2020) [paper] - SPECTER
  • PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning (CoRR 2020) [paper][code] - PLATO-2
  • DeBERTa: Decoding-enhanced BERT with Disentangled Attention (CoRR 2020) [paper][code] - DeBERTa

Multimodal

  • VideoBERT: A Joint Model for Video and Language Representation Learning (ICCV 2019) [paper]
  • Learning Video Representations using Contrastive Bidirectional Transformer (CoRR 2019) [paper] - CBT
  • ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks (NeurIPS 2019) [paper][code]
  • VisualBERT: A Simple and Performant Baseline for Vision and Language (CoRR 2019) [paper][code]
  • Fusion of Detected Objects in Text for Visual Question Answering (EMNLP 2019) [paper][code](https://github.com/google-research/language/tree/master/language/question_answering/b2t2) - B2T2
  • Unicoder-VL: A Universal Encoder for Vision and Language by Cross-modal Pre-training (AAAI 2020) [paper]
  • LXMERT: Learning Cross-Modality Encoder Representations from Transformers (EMNLP 2019) [paper][code]
  • VL-BERT: Pre-training of Generic Visual-Linguistic Representations (CoRR 2019) [paper][code]
  • UNITER: Learning UNiversal Image-TExt Representations (CoRR 2019) [paper]
  • FashionBERT: Text and Image Matching with Adaptive Loss for Cross-modal Retrieval (SIGIR 2020) [paper] - FashionBERT
  • VD-BERT: A Unified Vision and Dialog Transformer with BERT (CoRR 2020) [paper] - VD-BERT

Model Compression

  • Distilling Task-Specific Knowledge from BERT into Simple Neural Networks. Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy Lin. (CoRR 2019) [paper]
  • Model Compression with Multi-Task Knowledge Distillation for Web-scale Question Answering System. Ze Yang, Linjun Shou, Ming Gong, Wutao Lin, Daxin Jiang. (CoRR 2019) [paper] - MKDM
  • Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding. Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao. (CoRR 2019) [paper]
  • Well-Read Students Learn Better: On the Importance of Pre-training Compact Models. Iulia Turc, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. (CoRR 2019) [paper]
  • Small and Practical BERT Models for Sequence Labeling. Henry Tsai, Jason Riesa, Melvin Johnson, Naveen Arivazhagan, Xin Li, Amelia Archer. (EMNLP 2019) [paper]
  • Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT. Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. (AAAI 2020) [paper]
  • Patient Knowledge Distillation for BERT Model Compression. Siqi Sun, Yu Cheng, Zhe Gan, Jingjing Liu. (EMNLP 2019) [paper] - BERT-PKD
  • Extreme Language Model Compression with Optimal Subwords and Shared Projections. Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou. (CoRR 2019) [paper]
  • DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. [paper][code]
  • TinyBERT: Distilling BERT for Natural Language Understanding. Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu. (CoRR 2019) [paper][code]
  • Q8BERT: Quantized 8Bit BERT. Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat. (NeurIPS 2019 Workshop) [paper]
  • ALBERT: A Lite BERT for Self-supervised Learning of Language Representations. Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. (ICLR 2020) [paper][code]
  • Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning. Mitchell A. Gordon, Kevin Duh, Nicholas Andrews. (ICLR 2020) [paper][PyTorch code]
  • Reducing Transformer Depth on Demand with Structured Dropout. Angela Fan, Edouard Grave, Armand Joulin. (ICLR 2020) [paper] - LayerDrop
  • Multilingual Alignment of Contextual Word Representations (ICLR 2020) [paper]
  • AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search. Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou. (IJCAI 2020) [paper] - AdaBERT
  • BERT-of-Theseus: Compressing BERT by Progressive Module Replacing. Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou. (CoRR 2020) [paper][pt code][tf code][keras code]
  • MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. (CoRR 2020) [paper][code]
  • FastBERT: a Self-distilling BERT with Adaptive Inference Time. Weijie Liu, Peng Zhou, Zhiruo Wang, Zhe Zhao, Haotang Deng, Qi Ju. (ACL 2020) [paper][code]
  • MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices. Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou. (ACL 2020) [paper][code]
  • Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation. Bowen Wu, Huan Zhang, Mengyuan Li, Zongsheng Wang, Qihang Feng, Junhong Huang, Baoxun Wang. (CoRR 2020) [paper] - BiLSTM-SRA & LTD-BERT
  • Poor Man's BERT: Smaller and Faster Transformer Models. Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov. (CoRR 2020) [paper]
  • DynaBERT: Dynamic BERT with Adaptive Width and Depth. Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu. (CoRR 2020) [paper]
  • SqueezeBERT: What can computer vision teach NLP about efficient neural networks? Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer. (CoRR 2020) [paper]
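Many of the compression entries above build on knowledge distillation. As a rough, framework-free sketch (assuming the classic soft-target formulation of Hinton et al., 2015, not the exact loss of any paper listed here), a student can be trained to match the teacher's temperature-softened output distribution while still fitting the ground-truth label. The function names and the default `temperature`/`alpha` values are illustrative assumptions.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, computed with the log-sum-exp trick for stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    # The sum is >= 1 because the max term contributes exp(0), so the log is safe.
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    hard_label: int,
    temperature: float = 2.0,  # illustrative default, not taken from any listed paper
    alpha: float = 0.5,        # weight of the soft-target term (illustrative)
) -> float:
    """Generic soft-target distillation loss for a single example.

    alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(student, label).
    The T^2 factor keeps the soft-target gradients on a scale comparable to
    the hard-label term when the temperature changes.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(
        math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_teacher, log_p_student)
    )
    # Ordinary cross-entropy against the ground-truth label at temperature 1.
    ce = -log_softmax(student_logits)[hard_label]
    return alpha * (temperature ** 2) * kl + (1.0 - alpha) * ce


if __name__ == "__main__":
    teacher = [4.0, 1.0, 0.5]  # made-up logits for illustration only
    student = [2.5, 1.2, 0.3]
    print(distillation_loss(student, teacher, hard_label=0))
```

Papers such as DistilBERT, BERT-PKD, TinyBERT and MiniLM add further objectives on top of (or in place of) this output-level term, for example matching intermediate hidden states or self-attention distributions.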

Model Search