v0.1.0
Easy-to-use high-throughput LLM inference on Slurm clusters using vLLM
Supported models and variants:
- Command R+
- DBRX: Instruct
- Llama 2: 7b, 7b-chat, 13b, 13b-chat, 70b, 70b-chat
- Llama 3: 8B, 8B-Instruct, 70B, 70B-Instruct
- Mixtral: 8x7B-Instruct-v0.1, 8x22B-v0.1, 8x22B-Instruct-v0.1
Supported functionalities:
- Completions and chat completions
- Logits generation
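Once a model server is launched, the functionalities above are exposed through vLLM's OpenAI-compatible HTTP API (`/v1/completions` and `/v1/chat/completions`; per-token log-probabilities via the `logprobs` field cover the logits-generation use case). The sketch below shows how a request body could be built and sent; the base URL and model name are placeholder assumptions, so substitute the host, port, and model reported for your own Slurm job.

```python
import json
import urllib.request

# Assumed server address -- replace with the host:port of your launched job.
BASE_URL = "http://localhost:8080/v1"


def build_completion_request(model: str, prompt: str, logprobs=None) -> dict:
    """Build the JSON body for a POST to {BASE_URL}/completions."""
    body = {"model": model, "prompt": prompt, "max_tokens": 64}
    if logprobs is not None:
        # vLLM returns per-token log-probabilities when `logprobs` is set.
        body["logprobs"] = logprobs
    return body


def send(endpoint: str, body: dict) -> dict:
    """POST a JSON body to the OpenAI-compatible endpoint and decode the reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires a running server; model name is illustrative):
# result = send(
#     "completions",
#     build_completion_request("Meta-Llama-3-8B-Instruct", "Hello,", logprobs=5),
# )
```

Chat completions work the same way: POST a `{"model": ..., "messages": [...]}` body to the `chat/completions` endpoint instead.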