Project for Efficient LLM

Tools

  • Star vllm: A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list). [link][paper]
  • Star bitsandbytes: 8-bit CUDA functions for PyTorch. [link]
  • Star GPTQ-for-LLaMa: 4-bit quantization of LLaMA using GPTQ. [link]
  • Star TinyChatEngine: On-device LLM inference library. [link]
  • Star LMOps: General technology for enabling AI capabilities w/ LLMs and MLLMs. [link]
  • Star lit-gpt: Hackable implementation of state-of-the-art open-source LLMs based on nanoGPT. Supports flash attention, 4-bit and 8-bit quantization, LoRA and LLaMA-Adapter fine-tuning, and pre-training. Apache 2.0-licensed. [link]
  • Star fastllm: A pure C++ LLM acceleration library for all platforms with Python bindings; ChatGLM-6B-scale models reach 10,000+ tokens/s on a single GPU; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile devices. [link]
  • Star llmtools: 4-Bit Finetuning of Large Language Models on One Consumer GPU. [link]
  • Star torchdistill: A coding-free framework built on PyTorch for reproducible deep learning studies. 🏆 20 knowledge distillation methods presented at CVPR, ICLR, ECCV, NeurIPS, ICCV, etc. are implemented so far. 🎁 Trained models, training logs, and configurations are available to ensure reproducibility and benchmarking. [link][paper]
  • Star gpt4all: Open-source LLM chatbots that you can run anywhere. [link][paper]
  • Star low_bit_llama: Advanced Ultra-Low Bitrate Compression Techniques for the LLaMA Family of LLMs. [link]
  • Star exllama: A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights. [link]
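
To give a concrete feel for the serving-engine entries above, here is a minimal offline-generation sketch with vLLM. It only illustrates the basic `LLM`/`SamplingParams` API; the model name and sampling settings are placeholders, not recommendations from this list.

```python
# Minimal vLLM sketch: offline batched generation.
# Assumes `pip install vllm` and a CUDA-capable GPU; the model id is only a placeholder.
from vllm import LLM, SamplingParams

prompts = [
    "The key idea behind efficient LLM inference is",
    "Quantization reduces memory usage by",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

llm = LLM(model="facebook/opt-125m")            # any HF causal LM the engine supports
outputs = llm.generate(prompts, sampling_params)  # batched generation in one call

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```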

Open-source Lightweight LLM

  • Star TinyLlama: The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens (see the loading sketch below). [link]
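
A minimal sketch of running a lightweight model such as TinyLlama with Hugging Face `transformers` (plus `accelerate` for `device_map="auto"`). The checkpoint id below is an assumption; check the TinyLlama repo for the currently released checkpoints.

```python
# Minimal sketch: run a ~1.1B-parameter model with Hugging Face transformers.
# The checkpoint id is an assumption; see the TinyLlama repo for released checkpoints.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # fp16 keeps a 1.1B model at roughly 2-3 GB of GPU memory
    device_map="auto",           # requires `accelerate`
)

inputs = tokenizer("Efficient LLM inference means", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```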