Hi guys,
We have developed an open-source solution to speed up LLM inference on CPUs (especially on Intel Xeon).
This is our repo: https://github.com/intel/xFasterTransformer .
We already support many models, and more will be supported soon.
Since FastChat is widely used, we would like to enable our solution in it.
We plan to submit a new adapter that leverages our solution, so that end users can get better inference performance when they run on CPUs.
Can we submit a pull request directly, or is there a process we need to follow first?
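To make the proposal concrete, here is a rough sketch of the kind of adapter we have in mind. It assumes FastChat's BaseModelAdapter / register_model_adapter interface and uses xFasterTransformer's AutoModel loader; the class name, path-matching rule, and dtype handling below are placeholders that we would finalize in the actual pull request.

```python
# Sketch only: names and matching logic are illustrative, not the final design.
from transformers import AutoTokenizer

from fastchat.model.model_adapter import BaseModelAdapter, register_model_adapter


class XFasterTransformerAdapter(BaseModelAdapter):
    """Load models converted for xFasterTransformer and run them on Xeon CPUs."""

    def match(self, model_path: str):
        # Only handle model paths that explicitly opt in to xFasterTransformer.
        return "xft" in model_path.lower()

    def load_model(self, model_path: str, from_pretrained_kwargs: dict):
        import xfastertransformer

        tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
        # dtype selects the low-precision CPU kernels (e.g. bf16 or int8).
        model = xfastertransformer.AutoModel.from_pretrained(
            model_path, dtype=from_pretrained_kwargs.get("dtype", "bf16")
        )
        return model, tokenizer


register_model_adapter(XFasterTransformerAdapter)
```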