LLM pretraining
GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)
A large-scale 7B pretrained language model developed by BaiChuan-Inc.
A series of large language models developed by Baichuan Intelligent Technology
A series of large language models trained from scratch by the developers at @01-ai
Modeling, training, eval, and inference code for OLMo
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Reaching LLaMA2 Performance with 0.1M Dollars
A family of open-source Mixture-of-Experts (MoE) large language models
The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
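
Most of the pretrained checkpoints above are published on the Hugging Face Hub and can be loaded through the standard transformers API. The sketch below is a minimal, assumed example of doing so; the model id `Qwen/Qwen2.5-7B-Instruct` is an assumption used for illustration and can be swapped for any of the repositories' published checkpoints.

```python
# Minimal sketch: loading one of the open pretrained models listed above for inference.
# The model id is an assumption; substitute the checkpoint you actually want to use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct"  # assumed Hub id; e.g. a TinyLlama or OLMo checkpoint also works

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # half precision to fit on typical GPUs
    device_map="auto",            # let accelerate place the weights across available devices
    trust_remote_code=True,       # some of these repos ship custom modeling code
)

prompt = "Explain what pretraining a language model on 3 trillion tokens involves."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```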