CodeGeeX FasterTransformer

This repository provides the FasterTransformer implementation of the CodeGeeX model.

Get Started

First, download and set up the following Docker environment, replacing <WORK_DIR> with the directory of this repo:

docker pull nvcr.io/nvidia/pytorch:21.11-py3
docker run -p 9114:5000 --cpus 12 --gpus '"device=0"' -it -v <WORK_DIR>:/workspace/codegeex-fastertransformer --ipc=host  --name=test nvcr.io/nvidia/pytorch:21.11-py3

Second, install the following packages inside the container:

pip3 install transformers
pip3 install sentencepiece
cd codegeex-fastertransformer
sh make_all.sh  # Remember to set the SM version (-DSM) to match your GPU's compute capability.
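
The build needs an SM value that matches the GPU. As a minimal sketch, assuming the build comment above refers to the CMake `-DSM` option used by FasterTransformer builds and that PyTorch is importable in the container, the value can be derived from the device's compute capability. The helper name `sm_from_capability` is hypothetical and not part of this repo.

```python
def sm_from_capability(major: int, minor: int) -> str:
    """Format a CUDA compute capability such as (8, 0) as an SM string ("80")."""
    return f"{major}{minor}"


if __name__ == "__main__":
    # torch is optional here so the helper can be used without a GPU stack.
    try:
        import torch
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        # Returns a (major, minor) tuple for the given device index.
        major, minor = torch.cuda.get_device_capability(0)
        print(f"Detected SM {sm_from_capability(major, minor)}")
    else:
        print("No CUDA device visible; look up the compute capability manually.")
```

For example, an A100 reports (8, 0), which gives 80, and a T4 reports (7, 5), which gives 75.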

Then, convert the initial checkpoint (download here) to the FT format using get_ckpt_ft.py.

Finally, run api.py to start the server, then run post.py to send a request:

nohup python3 api.py > test.log 2>&1 &
python3 post.py
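
With the `-p 9114:5000` mapping from the `docker run` command, a server listening on port 5000 inside the container would be reachable on port 9114 from the host. The sketch below only builds a request and does not send it. It assumes api.py listens on port 5000; the path `/generate` and the `prompt` field are placeholders, since the actual route and payload schema are defined by api.py and post.py.

```python
import json
import urllib.request

HOST_PORT = 9114  # host side of the "-p 9114:5000" mapping in the docker run command


def build_request(prompt: str, host: str = "localhost", port: int = HOST_PORT,
                  path: str = "/generate") -> urllib.request.Request:
    """Build (but do not send) a JSON POST request.

    The path and payload key are hypothetical; check api.py for the real ones.
    """
    payload = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_request("def quicksort(arr):")
    print(req.get_method(), req.full_url)
    # To send it once the real route and schema are confirmed:
    # with urllib.request.urlopen(req, timeout=30) as resp:
    #     print(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the URL and body before pointing it at a running server.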

Inference performance

The following figure compares the performance of pure PyTorch, Megatron, and FasterTransformer under INT8 and FP16. The fastest implementation is INT8 + FasterTransformer, with an average generation time under 15 ms per token.
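
To relate a per-token latency to throughput, or to measure it on your own hardware, a small timing helper is enough. This is a sketch under stated assumptions: `generate_fn` stands in for whatever call produces a sequence of tokens and is not a function from this repo, and the timing is plain wall-clock time.

```python
import time
from typing import Callable, Sequence


def ms_per_token(generate_fn: Callable[[], Sequence], runs: int = 3) -> float:
    """Average wall-clock milliseconds per generated token over several runs."""
    total_seconds = 0.0
    total_tokens = 0
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate_fn()
        total_seconds += time.perf_counter() - start
        total_tokens += len(tokens)
    if total_tokens == 0:
        raise ValueError("generate_fn produced no tokens")
    return 1000.0 * total_seconds / total_tokens


def tokens_per_second(ms_per_tok: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / ms_per_tok


if __name__ == "__main__":
    def fake_generate():
        # Stand-in workload: pretend 10 tokens took about 10 ms in total.
        time.sleep(0.01)
        return list(range(10))

    print(f"{ms_per_token(fake_generate):.2f} ms/token")
    print(f"{tokens_per_second(15.0):.1f} tokens/s at 15 ms/token")
```

At 15 ms per token, the conversion gives roughly 66.7 tokens per second for a single sequence generated one token at a time.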

License

Our code is licensed under the Apache-2.0 license.
