LLM pretraining
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
A large-scale 7B pretrained language model developed by BaiChuan-Inc.
A series of large language models developed by Baichuan Intelligent Technology
A series of large language models trained from scratch by the developers at @01-ai
Modeling, training, eval, and inference code for OLMo
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Reaching LLaMA2 Performance with 0.1M Dollars
A family of open-source Mixture-of-Experts (MoE) large language models
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
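
Most of the pretrained checkpoints above are published on the Hugging Face Hub and can be loaded through the standard transformers API. The sketch below is a minimal, assumed example of doing so; the model id `Qwen/Qwen2.5-7B-Instruct` is an assumption used for illustration and can be swapped for any of the repositories' published checkpoints.

```python
# Minimal sketch: loading one of the open pretrained models listed above for inference.
# The model id is an assumption; substitute the checkpoint you actually want to use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed Hub id; e.g. a TinyLlama or OLMo checkpoint also works

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision to fit on typical GPUs
    device_map="auto",            # let accelerate place the weights across available devices
    trust_remote_code=True,       # some of these repos ship custom modeling code
)

prompt = "Explain what pretraining a language model on 3 trillion tokens involves."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```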