Category | Model | Classification | Information Extraction | Reading Comprehension | Data Analysis | Instruction Following | Arithmetic | Overall |
---|---|---|---|---|---|---|---|---|
Open-source | deepseek-chat-v3(new) | 93 | 97.0 | 94.7 | 100.0 | 84 | 99.0 | 94.6 |
Commercial | gpt-4o | 93 | 96.3 | 98.0 | 100.0 | 83 | 95.7 | 94.3 |
Commercial | yi-lightning | 94 | 90.4 | 95.3 | 100.0 | 82 | 96.0 | 93.0 |
Commercial | Baidu ERNIE-4.0-Turbo | 90 | 94.8 | 96.0 | 98.7 | 78 | 97.7 | 92.5 |
Commercial | abab7-chat-preview(new) | 89 | 96.3 | 94.7 | 97.3 | 83 | 94.2 | 92.4 |
Commercial | Baidu ERNIE-3.5-8K | 94 | 89.6 | 98.0 | 100.0 | 72 | 100.0 | 92.3 |
Open-source | qwen2.5-32b-instruct | 91 | 94.1 | 96.0 | 91.3 | 83 | 94.0 | 91.6 |
Commercial | gpt-4o-mini | 90 | 93.3 | 89.3 | 100.0 | 83 | 92.7 | 91.4 |
Commercial | hunyuan-turbo(new) | 93 | 85.2 | 93.3 | 97.3 | 78 | 99.5 | 91.0 |
Commercial | Baidu ERNIE-4.0 | 88 | 89.0 | 94.7 | 94.0 | 79 | 100.0 | 90.8 |
Open-source | qwen2.5-14b-instruct | 89 | 90.4 | 94.0 | 98.0 | 81 | 91.5 | 90.7 |
Open-source | deepseek-chat-v2 | 93 | 88.0 | 94.0 | 96.0 | 76 | 96.7 | 90.6 |
Commercial | GLM-4-Plus | 87 | 91.9 | 95.3 | 99.3 | 81 | 88.7 | 90.5 |
Open-source | Qwen2-72B-Instruct | 87 | 91.1 | 94.7 | 90.0 | 86 | 94.2 | 90.5 |
Open-source | qwen2.5-72b-instruct | 92 | 87.4 | 92.0 | 92.7 | 83 | 95.5 | 90.4 |
Commercial | hunyuan-large(new) | 91 | 88.9 | 92.7 | 96.7 | 79 | 93.0 | 90.2 |
Commercial | Doubao-pro-32k | 86 | 88.1 | 96.7 | 86.7 | 85 | 98.2 | 90.1 |
Commercial | gemini-1.5-pro | 87 | 90.4 | 93.3 | 99.3 | 75 | 92.2 | 89.5 |
Commercial | gemini-1.5-flash | 91 | 87.4 | 92.7 | 97.3 | 77 | 91.8 | 89.5 |
Commercial | SenseChat-5(new) | 93 | 90.4 | 89.3 | 97.3 | 82 | 85.0 | 89.5 |
Commercial | iFlytek 4.0Ultra | 88 | 84.4 | 96.0 | 92.7 | 80 | 94.3 | 89.2 |
Open-source | Llama-3.1-70B-Instruct | 87 | 88.9 | 92.0 | 90.7 | 79 | 94.8 | 88.7 |
Commercial | Alibaba qwen-max | 92 | 88.9 | 94.7 | 99.3 | 77 | 79.8 | 88.6 |
Commercial | minimax-abab6.5-chat | 89 | 87.0 | 89.3 | 95.3 | 76 | 90.3 | 87.8 |
Open-source | Llama-3-70B-Instruct | 88 | 87.0 | 96.0 | 95.0 | 70 | 90.8 | 87.8 |
Commercial | GLM-4-Long | 85 | 93.3 | 89.3 | 96.7 | 80 | 81.2 | 87.6 |
Open-source | qwen2.5-7b-instruct | 85 | 88.1 | 93.3 | 91.3 | 77 | 89.8 | 87.4 |
Open-source | internlm2_5-20b-chat | 86 | 90.4 | 86.0 | 97.3 | 75 | 89.7 | 87.4 |
Commercial | Baichuan3-Turbo | 88 | 86.7 | 94.7 | 90.7 | 75 | 89.2 | 87.4 |
Commercial | minimax-abab6.5s-chat | 87 | 88.0 | 88.7 | 88.0 | 80 | 91.7 | 87.2 |
Commercial | iFlytek Spark v3.5(spark-max) | 87 | 92.0 | 89.3 | 87.3 | 74 | 93.5 | 87.2 |
Commercial | Baichuan4 | 86 | 94.1 | 93.3 | 95.3 | 75 | 78.2 | 87.0 |
Commercial | Zhipu GLM4 | 92 | 86.7 | 90.0 | 98.0 | 77 | 78.0 | 87.0 |
Commercial | Zhipu GLM-4-Air | 89 | 91.9 | 92.7 | 88.0 | 83 | 74.5 | 86.5 |
Commercial | Zhipu GLM-4-AirX | 89 | 91.9 | 92.7 | 88.0 | 83 | 74.2 | 86.5 |
Commercial | Alibaba qwen-plus | 88 | 89.6 | 90.0 | 84.0 | 73 | 93.0 | 86.3 |
Open-source | qwen2-57b-a14b-instruct | 85 | 88.1 | 89.3 | 87.3 | 77 | 89.2 | 86.0 |
Open-source | gemma-2-9b-it | 85 | 82.2 | 88.7 | 87.3 | 81 | 89.3 | 85.6 |
Commercial | hunyuan-standard(new) | 87 | 89.6 | 93.3 | 85.3 | 74 | 83.0 | 85.4 |
Commercial | iFlytek Spark v3(spark-pro) | 87 | 82.0 | 88.0 | 86.0 | 74 | 94.0 | 85.2 |
Commercial | Alibaba qwen-long | 89 | 85.9 | 90.0 | 86.7 | 75 | 83.3 | 85.0 |
Commercial | Moonshot AI moonshot-v1-8k | 92 | 85.0 | 84.0 | 89.3 | 72 | 79.3 | 83.6 |
Open-source | glm-4-9b-chat | 90 | 82.2 | 90.0 | 82.0 | 79 | 76.5 | 83.3 |
Open-source | Qwen2-7B-Instruct | 89 | 83.7 | 86.7 | 75.3 | 77 | 81.3 | 82.2 |
Commercial | gemini-1.0-pro | 84 | 89.6 | 92.7 | 99.3 | 76 | 50.8 | 82.1 |
Open-source | Yi-1.5-34B-Chat | 90 | 83.0 | 82.7 | 83.3 | 74 | 79.0 | 82.0 |
Commercial | Zhipu GLM-4-Flash | 89 | 80.0 | 86.0 | 82.0 | 79 | 75.5 | 81.9 |
Commercial | Baidu ERNIE-Speed-8K | 88 | 88.1 | 88.0 | 89.3 | 68 | 68.7 | 81.7 |
Commercial | SenseTime SenseChat-v4 | 89 | 78.5 | 88.0 | 86.7 | 71 | 72.2 | 80.9 |
Open-source | Llama-3-8B-Instruct | 86 | 74.0 | 80.0 | 90.0 | 63 | 89.5 | 80.4 |
Open-source | internlm2_5-7b-chat | 86 | 84.4 | 90.0 | 83.3 | 79 | 59.8 | 80.4 |
Open-source | qwen2.5-3b-instruct | 81 | 75.6 | 78.7 | 83.3 | 77 | 85.7 | 80.2 |
Commercial | Alibaba qwen-turbo | 83 | 85.2 | 88.0 | 76.0 | 66 | 81.3 | 79.9 |
Open-source | Yi-1.5-9B-Chat | 82 | 83.0 | 84.7 | 80.0 | 72 | 73.8 | 79.2 |
Open-source | Llama-3.1-8B-Instruct | 63 | 85.2 | 82.0 | 84.0 | 69 | 90.5 | 79.0 |
Commercial | SenseTime SenseChat-Turbo | 81 | 77.8 | 76.7 | 86.0 | 72 | 78.5 | 78.7 |
Commercial | Doubao-lite-32k | 77 | 86.7 | 88.7 | 64.7 | 62 | 87.2 | 77.7 |
Open-source | DeepSeek-V2-Lite-Chat | 81 | 76.3 | 81.3 | 73.3 | 69 | 61.2 | 73.7 |
Commercial | minimax-abab5.5-chat | 83 | 79.0 | 86.7 | 72.7 | 76 | 39.7 | 72.8 |
Open-source | qwen2.5-1.5b-instruct | 70 | 71.9 | 72.7 | 63.3 | 62 | 83.3 | 70.5 |
Open-source | MiniCPM-2B-dpo | 79 | 77.0 | 74.0 | 66.0 | 55 | 52.7 | 67.3 |
Open-source | qwen2-1.5b-instruct | 73 | 74.1 | 68.0 | 50.7 | 54 | 55.7 | 62.6 |
Commercial | minimax-abab5.5s-chat | 58 | 57.0 | 70.7 | 56.0 | 49 | 57.0 | 58.0 |
Open-source | qwen2.5-0.5b-instruct | 52 | 53.3 | 63.3 | 46.0 | 58 | 51.8 | 54.1 |
Open-source | internlm2-chat-1_8b | 69 | 60.7 | 63.3 | 46.0 | 45 | 39.7 | 54.0 |
Open-source | qwen2-0.5b-instruct | 49 | 53.3 | 62.0 | 36.7 | 48 | 35.5 | 47.4 |
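
The table does not say how the final (overall) column is derived. On the rows spot-checked here it is consistent with an unweighted mean of the six capability scores, rounded to one decimal place. That reading is an assumption inferred from the numbers, not a documented formula. A minimal Python sketch of the check follows; the names `ROWS`, `overall_gap`, and `check_rows` are hypothetical, and the four rows are copied from the table above.

```python
from statistics import mean

# Rows copied from the table above:
# model -> (six capability scores in column order, reported overall score).
ROWS = {
    "deepseek-chat-v3(new)": ((93, 97.0, 94.7, 100.0, 84, 99.0), 94.6),
    "gpt-4o": ((93, 96.3, 98.0, 100.0, 83, 95.7), 94.3),
    "gemini-1.0-pro": ((84, 89.6, 92.7, 99.3, 76, 50.8), 82.1),
    "qwen2-0.5b-instruct": ((49, 53.3, 62.0, 36.7, 48, 35.5), 47.4),
}


def overall_gap(scores, reported):
    """Absolute difference between the plain mean of `scores` and `reported`."""
    return abs(mean(scores) - reported)


def check_rows(rows, tolerance=0.06):
    """Map each model to True if its reported overall score is within
    `tolerance` of the unweighted mean (ASSUMED aggregation rule).

    The tolerance is slightly above 0.05 to allow for one-decimal rounding
    of both the inputs and the reported score.
    """
    return {
        name: overall_gap(scores, reported) <= tolerance
        for name, (scores, reported) in rows.items()
    }


if __name__ == "__main__":
    for name, ok in check_rows(ROWS).items():
        print(name, "matches" if ok else "differs")
```

If a row falls outside the tolerance, that suggests either a transcription error in the row or a different aggregation rule; the sketch cannot tell these two cases apart.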