LLM-eval
SuperCLUE: A comprehensive benchmark for general-purpose foundation models in Chinese
Supercharge Your LLM Application Evaluations 🚀
Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]
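A minimal sketch of pulling one C-Eval subject through the Hugging Face datasets library and formatting a multiple-choice prompt. The dataset ID `ceval/ceval-exam` and the `computer_network` subject follow the C-Eval repo's own quick-start; the prompt template below is an illustrative assumption, not the suite's official one.

```python
from datasets import load_dataset

# Load one of C-Eval's 52 subjects; splits are "dev", "val", and "test"
# (the test split ships without gold answers).
dataset = load_dataset("ceval/ceval-exam", name="computer_network")

example = dataset["val"][0]
prompt = (
    f"{example['question']}\n"
    f"A. {example['A']}\n"
    f"B. {example['B']}\n"
    f"C. {example['C']}\n"
    f"D. {example['D']}\n"
    "Answer:"
)
print(prompt)             # feed this to the model under evaluation
print(example["answer"])  # gold label on dev/val, e.g. "A"
```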
The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey.
A unified evaluation framework for large language models
The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models".
Chinese LLM capability leaderboard: currently covers 128 large models, including commercial models such as ChatGPT, GPT-4o, Google Gemini, Baidu ERNIE Bot, Alibaba Tongyi Qianwen, Baichuan, iFlytek Spark, SenseTime SenseChat, and MiniMax, as well as open-source models such as Qwen2.5, Llama 3.1, GLM-4, InternLM2.5, OpenBuddy, and AquilaChat. It provides not only a capability-score leaderboard but also the raw outputs of every model!
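Since the leaderboard publishes raw model outputs, a score can in principle be recomputed offline. The sketch below is hypothetical: the JSONL file name and the `prediction`/`answer` field names are assumptions for illustration, not the leaderboard's actual format.

```python
import json

def accuracy(path: str) -> float:
    """Score multiple-choice outputs against gold labels in a JSONL file.
    Record layout ({"prediction": ..., "answer": ...}) is assumed."""
    correct = total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            total += 1
            # Take the first A-D letter in the raw output as the model's choice.
            pred = next((c for c in record["prediction"] if c in "ABCD"), None)
            correct += pred == record["answer"]
    return correct / total if total else 0.0

# Hypothetical file name for one model's raw outputs.
print(f"accuracy = {accuracy('qwen2.5_outputs.jsonl'):.2%}")
```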