v0.1.0
Easy-to-use high-throughput LLM inference on Slurm clusters using vLLM
Supported models and variants:
- Command R+
- DBRX: Instruct
- Llama 2: 7b, 7b-chat, 13b, 13b-chat, 70b, 70b-chat
- Llama 3: 8B, 8B-Instruct, 70B, 70B-Instruct
- Mixtral: 8x7B-Instruct-v0.1, 8x22B-v0.1, 8x22B-Instruct-v0.1
Supported functionalities:
- Completions and chat completions
- Logits generation
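Once a model server is launched, the functionalities above are exposed through vLLM's OpenAI-compatible HTTP API (`/v1/completions` and `/v1/chat/completions`; per-token log-probabilities via the `logprobs` field cover the logits-generation use case). The sketch below shows how a request body could be built and sent; the base URL and model name are placeholder assumptions, so substitute the host, port, and model reported for your own Slurm job.

```python
import json
import urllib.request

# Assumed server address -- replace with the host:port of your launched job.
BASE_URL = "http://localhost:8080/v1"


def build_completion_request(model: str, prompt: str, logprobs=None) -> dict:
    """Build the JSON body for a POST to {BASE_URL}/completions."""
    body = {"model": model, "prompt": prompt, "max_tokens": 64}
    if logprobs is not None:
        # vLLM returns per-token log-probabilities when `logprobs` is set.
        body["logprobs"] = logprobs
    return body


def send(endpoint: str, body: dict) -> dict:
    """POST a JSON body to the OpenAI-compatible endpoint and decode the reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/{endpoint}",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires a running server; model name is illustrative):
# result = send(
#     "completions",
#     build_completion_request("Meta-Llama-3-8B-Instruct", "Hello,", logprobs=5),
# )
```

Chat completions work the same way: POST a `{"model": ..., "messages": [...]}` body to the `chat/completions` endpoint instead.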