Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding
Sangmin Bae
* equal contribution
- Early-Exiting dynamically allocates computation paths based on the complexity of generation for each token.
- Conventional frameworks have failed to show actual speedup because of their large number of exit points and their state copying mechanism.
- We propose FREE, which consists of (1) a shallow-deep module, (2) synchronized parallel decoding, and (3) an adaptive threshold estimator.
- In contrast to conventional approaches, FREE achieves a larger inference speedup on extensive generation tasks.
- Implement CALM and FREE on decoder-only models
- (24.02.08) Release finetuned checkpoints
- (24.01.26) Won 🥈Silver award from Samsung Humantech Paper Awards
Install the necessary packages with:
$ pip install -r requirements.txt
We experimented with 4 summarization tasks, 1 question answering task, and 1 machine translation task.
Please see the scripts and run shell files to train or evaluate on each dataset.
$ bash run_[TASK_NAME]_[DATASET_NAME].sh
You can run three early-exiting methods, including Static-Exiting, CALM, and our FREE method.
Here are some important arguments to be considered.
Please refer to additional_args for more details.
- --ouput_hidden_states_decoder True: return hidden_states from intermediate layers
- --intermediate_loss_fn shallowdeep_kd_dyna: use a dynamic distillation loss between shallow and deep models
- --shallow_exit_layer [int]: set the number of layers for the shallow model
- --distill_layer_alpha [float]: distillation interpolation hyperparameter between CE and KL divergence losses
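To make the role of distill_layer_alpha concrete, here is a minimal sketch of interpolating between a cross-entropy loss and a KL divergence loss. It is not the repository's implementation: the function names are hypothetical, the direction of the interpolation (alpha weighting the KL term) is an assumption, and the dynamic layer selection implied by shallowdeep_kd_dyna is not modeled.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability of the target token.
    return -math.log(softmax(logits)[target])

def kl_divergence(teacher_logits, student_logits):
    # KL(teacher || student) between the two softmax distributions.
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def interpolated_loss(shallow_logits, deep_logits, target, alpha):
    # ASSUMPTION: alpha weights the KL (distillation) term and
    # (1 - alpha) weights the CE term; the repository may do the opposite.
    ce = cross_entropy(shallow_logits, target)
    kl = kl_divergence(deep_logits, shallow_logits)
    return (1.0 - alpha) * ce + alpha * kl

if __name__ == "__main__":
    shallow = [2.0, 0.5, -1.0]  # hypothetical shallow-model logits
    deep = [3.0, 0.0, -2.0]     # hypothetical deep-model logits
    print(interpolated_loss(shallow, deep, target=0, alpha=0.5))
```

Under this assumed convention, alpha = 0 recovers plain cross-entropy training of the shallow model, and alpha = 1 trains it purely to match the deep model's output distribution.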
- --ouput_hidden_states_decoder True: return hidden_states from intermediate layers
- --intermediate_loss_fn weighted_ce: use a weighted average loss across all layers
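A weighted average loss across layers can be sketched as follows. The weighting scheme here, where each layer's weight is proportional to its 1-based index, is an assumption chosen for illustration; the weights actually used by weighted_ce may differ, and the function name is hypothetical.

```python
def weighted_layer_loss(layer_losses):
    """Combine per-layer losses into one scalar.

    ASSUMPTION: layer i (1-based) gets weight i / (1 + 2 + ... + n),
    so deeper layers contribute more. This is illustrative only.
    """
    n = len(layer_losses)
    if n == 0:
        raise ValueError("layer_losses must not be empty")
    normalizer = n * (n + 1) / 2
    return sum((i / normalizer) * loss
               for i, loss in enumerate(layer_losses, start=1))

if __name__ == "__main__":
    # Hypothetical cross-entropy values from three intermediate layers.
    print(weighted_layer_loss([2.4, 1.7, 1.1]))
```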
- --deploy_scenario True: this should always be True to use deploying_[MODEL_NAME].py for FREE or CALM
- --use_shallow_deep True: use the shallow-deep module
- --shallow_exit_layer [int]: set the number of layers for the shallow model
- --shallow2deep_conf_type softmax: set the confidence measure to softmax values
- --shallow2deep_conf_threshold [float]: threshold value to decide whether to exit or not in the shallow model
- --use_adapt_threshold True: use the adaptive threshold estimator, where the initial threshold is set to shallow2deep_conf_threshold
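The shallow-deep exit decision described by these flags can be illustrated with a small sketch. It assumes the softmax confidence is the maximum softmax probability of the shallow model's prediction and that a token exits when this value reaches the threshold; the function names are hypothetical, and neither the adaptive threshold estimator nor synchronized parallel decoding is modeled.

```python
import math

def softmax_confidence(logits):
    """Return (argmax index, max softmax probability) for one token."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]

def route_token(shallow_logits, threshold):
    """Accept the shallow prediction or defer to the deep model.

    ASSUMPTION: exit when confidence >= threshold (a fixed threshold,
    playing the role of shallow2deep_conf_threshold).
    Returns ("shallow", token_id) or ("deep", None).
    """
    token_id, confidence = softmax_confidence(shallow_logits)
    if confidence >= threshold:
        return "shallow", token_id
    return "deep", None

if __name__ == "__main__":
    print(route_token([10.0, 0.0, 0.0], threshold=0.9))  # confident token
    print(route_token([1.0, 1.0, 1.0], threshold=0.9))   # uncertain token
```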
- --deploy_scenario True: this should always be True to use deploying_[MODEL_NAME].py for FREE or CALM
- --use_early_exit True: use the conventional early-exiting framework
- --exit_conf_type softmax: set the confidence measure to softmax values
- --exit_conf_threshold [float]: threshold value to decide whether to exit or not
- --exit_min_layer [int]: the minimum number of layers to run before an exit decision can be made
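As a rough illustration of how a confidence threshold and a minimum layer interact in a conventional early-exiting setup, the sketch below scans layers in order and exits at the first eligible layer that is confident enough. The 1-based layer indexing, the fallback to the last layer, and the function names are assumptions for this example, not the repository's code.

```python
import math

def max_softmax(logits):
    # Maximum softmax probability, computed stably.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return max(exps) / sum(exps)

def find_exit_layer(per_layer_logits, threshold, min_layer):
    """Return the 1-based layer at which a token would exit.

    ASSUMPTIONS: layers before min_layer are never checked, a layer
    exits when its confidence >= threshold, and the final layer is
    used when no earlier layer qualifies.
    """
    num_layers = len(per_layer_logits)
    for layer in range(max(min_layer, 1), num_layers + 1):
        if max_softmax(per_layer_logits[layer - 1]) >= threshold:
            return layer
    return num_layers

if __name__ == "__main__":
    # Hypothetical logits for one token at three decoder layers.
    logits = [[5.0, 0.0], [0.1, 0.0], [5.0, 0.0]]
    print(find_exit_layer(logits, threshold=0.9, min_layer=1))
    print(find_exit_layer(logits, threshold=0.9, min_layer=2))
```

Raising the minimum layer trades some speedup for robustness, since very shallow layers can be confidently wrong.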
- --static_exit_layer [int]: set how many layers to use for prediction
FREE demonstrated robust performance and a larger AUC across various datasets and models, specifically with T5-large and T5-3B.
We used two human-like evaluation methods: Likert scale scoring and pairwise comparison (refer to this paper).
After creating the input files with the ipynb file, run bash gpt_eval.sh with your own OpenAI API_KEY.
You can then get the results by running the last cell in the ipynb file.
We share finetuned checkpoints on Google Drive.
Note that you must download tokenizer.json for each model individually from HuggingFace to run it without errors (refer to Issue #3).
If you find this repo useful for your research, please consider citing our paper:
@inproceedings{DBLP:conf/emnlp/BaeKSY23,
author = {Sangmin Bae and
Jongwoo Ko and
Hwanjun Song and
Se{-}Young Yun},
editor = {Houda Bouamor and
Juan Pino and
Kalika Bali},
title = {Fast and Robust Early-Exiting Framework for Autoregressive Language
Models with Synchronized Parallel Decoding},
booktitle = {Proceedings of the 2023 Conference on Empirical Methods in Natural
Language Processing, {EMNLP} 2023, Singapore, December 6-10, 2023},
pages = {5910--5924},
publisher = {Association for Computational Linguistics},
year = {2023},
url = {https://doi.org/10.18653/v1/2023.emnlp-main.362},
doi = {10.18653/V1/2023.EMNLP-MAIN.362},
timestamp = {Fri, 12 Apr 2024 13:11:38 +0200},
biburl = {https://dblp.org/rec/conf/emnlp/BaeKSY23.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
- Sangmin Bae: [email protected]
- Jongwoo Ko: [email protected]