Organizations

@dmlc @slofast

Stars

LLM pretraining

18 repositories

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

Python 7,678 606 Updated Jul 25, 2023

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

Python 5,680 508 Updated Jul 18, 2024

A series of large language models developed by Baichuan Intelligent Technology

Python 4,118 298 Updated Nov 8, 2024

A series of large language models trained from scratch by developers @01-ai

Jupyter Notebook 7,786 491 Updated Nov 27, 2024

Modeling, training, eval, and inference code for OLMo

Python 5,059 528 Updated Jan 27, 2025

LLM training in simple, raw C/CUDA

Cuda 25,146 2,876 Updated Oct 2, 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

4,114 223 Updated Sep 25, 2024

DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Python 1,127 69 Updated Jan 16, 2024

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Python 1,126 102 Updated Apr 15, 2024

DeepSeek-VL: Towards Real-World Vision-Language Understanding

Python 2,454 249 Updated Apr 24, 2024

seqax = sequence modeling + JAX

Python 136 10 Updated Jul 17, 2024

Reaching LLaMA2 Performance with 0.1M Dollars

Python 967 80 Updated Jul 23, 2024

A family of open-sourced Mixture-of-Experts (MoE) Large Language Models

Python 1,427 75 Updated Mar 8, 2024

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 8,126 491 Updated May 3, 2024

Python 412 15 Updated Nov 2, 2023

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell 12,026 716 Updated Jan 23, 2025

MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.

Jupyter Notebook 7,144 469 Updated Nov 6, 2024

MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone

Python 17,821 1,281 Updated Jan 26, 2025