Authors: Yige Xu, Xu Guo, Zhiwei Zeng, Chunyan Miao
The expansion of Large Language Models (LLMs) has driven breakthroughs in Natural Language Processing (NLP) but has raised concerns about inference efficiency, particularly latency, memory usage, and throughput.
Our work addresses the need for high throughput through data multiplexing, handling batches of concurrent queries while maintaining satisfactory downstream performance.
We freeze the backbone language model and tune only the adapters. We design a reversible adapter that mixes instances together, with a corresponding reverse operation that reconstructs the individual outputs.
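For intuition, the sketch below shows the kind of invertible additive-coupling transformation that reversible adapters build on: two instances are mixed into a pair of states, and the exact algebraic inverse recovers each one. The names (ReversibleMixer, mix, unmix) and the wiring are illustrative assumptions, not the repository's actual multiplexer/demultiplexer code.

```python
import torch
import torch.nn as nn

class ReversibleMixer(nn.Module):
    """Invertible additive coupling over two instances (compose_size = 2).

    Hypothetical sketch: F, G, and this wiring are illustrative only.
    """
    def __init__(self, hidden_size: int):
        super().__init__()
        self.F = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.GELU(),
                               nn.Linear(hidden_size, hidden_size))
        self.G = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.GELU(),
                               nn.Linear(hidden_size, hidden_size))

    def mix(self, x1, x2):
        # Forward coupling: the pair (y1, y2) losslessly encodes (x1, x2).
        y1 = x1 + self.F(x2)
        y2 = x2 + self.G(y1)
        return y1, y2

    def unmix(self, y1, y2):
        # Exact inverse: recover the individual instances from the mixed pair.
        x2 = y2 - self.G(y1)
        x1 = y1 - self.F(x2)
        return x1, x2

mixer = ReversibleMixer(hidden_size=768)
x1, x2 = torch.randn(4, 768), torch.randn(4, 768)
y1, y2 = mixer.mix(x1, x2)
r1, r2 = mixer.unmix(y1, y2)
assert torch.allclose(x1, r1, atol=1e-5) and torch.allclose(x2, r2, atol=1e-5)
```

Because the coupling is additive, reconstruction is exact rather than approximate, which is what lets the demultiplexed outputs stay faithful to the individual instances.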
Requirements:
- fastNLP==0.7.0
- torch==2.3.1+cu118
- transformers==4.42.3
The datasets should be downloaded into a single directory with the following layout:
/path/to/your/data/dir
|-- MRPC/
|   |-- dev.tsv
|   |-- test.tsv
|   |-- train.tsv
|-- QNLI/
|   |-- dev.tsv
|   |-- test.tsv
|   |-- train.tsv
|-- RTE/
|   |-- dev.tsv
|   |-- test.tsv
|   |-- train.tsv
|-- SST-2/
|   |-- dev.tsv
|   |-- test.tsv
|   |-- train.tsv
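A quick way to sanity-check the layout before training (a minimal sketch; the path and the task/split lists mirror the tree above):

```python
from pathlib import Path

data_dir = Path("/path/to/your/data/dir")  # same value passed as --data_dir
for task in ["MRPC", "QNLI", "RTE", "SST-2"]:
    for split in ["dev", "test", "train"]:
        tsv = data_dir / task / f"{split}.tsv"
        assert tsv.is_file(), f"missing {tsv}"
print("dataset layout looks complete")
```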
T5 backbone:
bash run_batch_inference_t5.sh \
--task_name sst-2 \
--model_name t5-small \
--model_type revmux \
--batch_size 32 \
--n_epochs 50 \
--combine_first 3 \
--compose_size 2 \
--data_dir /path/to/your/data/dir \
--adapter_lr 2e-5 \
--save_dir /path/to/your/save/dir
BERT backbone:
bash run_batch_inference_bert.sh \
--task_name sst-2 \
--model_name bert-base-uncased \
--model_type revmux \
--batch_size 32 \
--n_epochs 50 \
--combine_first 6 \
--compose_size 2 \
--data_dir /path/to/your/data/dir \
--adapter_lr 2e-5 \
--save_dir /path/to/your/save/dir
LLaMA-3 backbone:
bash run_batch_inference_llama.sh \
--task_name sst-2 \
--model_name /path/to/your/llama3 \
--model_type revmux \
--batch_size 2 \
--n_epochs 10 \
--combine_first 16 \
--compose_size 2 \
--data_dir /path/to/your/data/dir \
--adapter_lr 2e-5 \
--save_dir /path/to/your/save/dir
Arguments:
- task_name: selected from [sst-2, rte, qnli, mrpc].
- model_name: the name of the backbone language model, selected from [t5-small, t5-base, t5-large, bert-base-uncased]; for run_batch_inference_llama.sh, pass a local path to a LLaMA-3 checkpoint instead.
- model_type: revmux is our RevMUX; ora is the Only Multiplexer Reversible baseline; adapter is the Vanilla Adapters baseline.
- combine_first: the number of prefilling layers.
- compose_size: the number of instances mixed together.
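To make the batching arithmetic concrete (assumed semantics: each group of compose_size instances shares one forward pass once multiplexed, which is where the throughput gain comes from):

```python
# Back-of-envelope arithmetic for the T5/BERT commands above.
batch_size, compose_size = 32, 2
multiplexed_batch = batch_size // compose_size  # 16 mixed sequences
print(f"{batch_size} queries -> {multiplexed_batch} sequences after multiplexing")
```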
Citation:
@inproceedings{xu-etal-2024-revmux,
title = "{R}ev{MUX}: Data Multiplexing with Reversible Adapters for Efficient {LLM} Batch Inference",
author = "Xu, Yige and
Guo, Xu and
Zeng, Zhiwei and
Miao, Chunyan",
booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.emnlp-main.1232",
doi = "10.18653/v1/2024.emnlp-main.1232",
pages = "22072--22087",
}