Hi guys,
We have developed an open-source solution to speed up LLM inference on CPUs (especially on Intel Xeon).
This is our repo: https://github.com/intel/xFasterTransformer .
We already support many models, and more will be supported soon.
Since FastChat is widely used, we would like to enable our solution in it.
We plan to submit a new adapter that leverages our solution, so that end users can get better inference performance when they run on CPUs.
Can we submit a pull request directly, or is there a process we need to follow first?
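To make the proposal concrete, here is a rough sketch of the kind of adapter we have in mind. It assumes FastChat's BaseModelAdapter / register_model_adapter interface and uses xFasterTransformer's AutoModel loader; the class name, path-matching rule, and dtype handling below are placeholders that we would finalize in the actual pull request.

```python
# Sketch only: names and matching logic are illustrative, not the final design.
from transformers import AutoTokenizer

from fastchat.model.model_adapter import BaseModelAdapter, register_model_adapter


class XFasterTransformerAdapter(BaseModelAdapter):
    """Load models converted for xFasterTransformer and run them on Xeon CPUs."""

    def match(self, model_path: str):
        # Only handle model paths that explicitly opt in to xFasterTransformer.
        return "xft" in model_path.lower()

    def load_model(self, model_path: str, from_pretrained_kwargs: dict):
        import xfastertransformer

        tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
        # dtype selects the low-precision CPU kernels (e.g. bf16 or int8).
        model = xfastertransformer.AutoModel.from_pretrained(
            model_path, dtype=from_pretrained_kwargs.get("dtype", "bf16")
        )
        return model, tokenizer


register_model_adapter(XFasterTransformerAdapter)
```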