
# Leaderboard

Evaluation sets:

- eval_2int: addition and subtraction of 2 integers, e.g. "918 + 474 ="
- eval_3int: addition and subtraction of 3 integers, e.g. "166 + 215 + 53 ="
- eval_4int: addition and subtraction of 4 integers, e.g. "945 + 820 + 810 + 159 ="
- eval_5int: addition and subtraction of 5 integers, e.g. "901 + 306 + 69 + 830 + 816 ="
- eval_2float: addition, subtraction, multiplication, and division of 2 floating-point numbers, e.g. "34.1 + 10.3 ="
- eval_3float: addition, subtraction, multiplication, and division of 3 floating-point numbers, e.g. "0.97 + 0.4 / 4.51 ="
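The eval-set prompts above follow a simple pattern: N random operands joined by random operators, terminated with "=". The repository does not show its generator, so the following is a minimal sketch of how such prompts could be produced; the function names (`make_int_prompt`, `make_float_prompt`) and operand ranges are assumptions, not the project's actual code.

```python
import random

def make_int_prompt(n_terms, rng=None, lo=0, hi=999):
    """Build an n-term integer +/- prompt, e.g. "918 + 474 ="
    (hypothetical helper; operand range lo..hi is an assumption)."""
    rng = rng or random.Random()
    parts = [str(rng.randint(lo, hi))]
    for _ in range(n_terms - 1):
        parts.append(rng.choice(["+", "-"]))
        parts.append(str(rng.randint(lo, hi)))
    return " ".join(parts) + " ="

def make_float_prompt(n_terms, rng=None):
    """Build an n-term float prompt over + - * /, e.g. "34.1 + 10.3 ="
    (hypothetical helper; 2-decimal operands are an assumption)."""
    rng = rng or random.Random()
    parts = [str(round(rng.uniform(0, 100), 2))]
    for _ in range(n_terms - 1):
        parts.append(rng.choice(["+", "-", "*", "/"]))
        parts.append(str(round(rng.uniform(0.1, 100), 2)))
    return " ".join(parts) + " ="
```

A seeded `random.Random` instance makes the generated prompt set reproducible across runs, which matters when comparing models on the same items.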
| Model | Overall | eval_2int | eval_3int | eval_4int | eval_5int | eval_2float | eval_3float |
| --- | --- | --- | --- | --- | --- | --- | --- |
| gpt-4o | 96 | 100 | 99 | 99 | 92 | 98 | 86 |
| gpt-3.5-turbo | 80 | 100 | 98 | 88 | 67 | 86 | 41 |
| yi-large | 88 | 98 | 93 | 93 | 91 | 85 | 70 |
| abab6.5-chat | 90 | 96 | 98 | 97 | 94 | 86 | 71 |
| qwen-max | 80 | 99 | 85 | 73 | 66 | 91 | 65 |
| qwen2-72b-instruct | 94 | 100 | 99 | 99 | 97 | 92 | 78 |
| DeepSeek-V2 | 96.7 | 100 | 100 | 97 | 94 | 98 | 91 |
| glm-4 | 78 | 99 | 78 | 73 | 82 | 76 | 60 |
| moonshot-v1-8k | 79.3 | 56 | 94 | 92 | 90 | 72 | 72 |
| ERNIE-4.0 (with calculator) | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| yi-spark | 83.3 | 98 | 86 | 85 | 79 | 87 | 65 |
| GLM-4-Flash | 75.5 | 97 | 86 | 74 | 75 | 70 | 51 |
| qwen-long | 83.3 | 98 | 89 | 81 | 84 | 86 | 62 |
| ERNIE-4.0-Turbo-8K | 97.7 | 100 | 100 | 100 | 99 | 93 | 94 |
| Doubao-pro-32k | 98.2 | 100 | 100 | 100 | 100 | 99 | 90 |
| Doubao-lite-32k | 87.2 | 99 | 82 | 96 | 83 | 99 | 64 |
| internlm2-chat-1_8b | 39.7 | 83 | 37 | 27 | 24 | 52 | 15 |
| internlm2_5-7b-chat | 59.8 | 100 | 62 | 33 | 30 | 85 | 49 |
| gemma-2-9b-it | 89.3 | 100 | 94 | 95 | 92 | 85 | 70 |
| DeepSeek-V2-Lite-Chat | 61.2 | 99 | 76 | 33 | 19 | 80 | 60 |
| ERNIE-Speed-8K | 68.7 | 100 | 85 | 68 | 48 | 79 | 32 |
| xunfei-4.0Ultra | 94.3 | 100 | 100 | 100 | 96 | 91 | 79 |
| SenseChat-Turbo | 78.5 | 99 | 90 | 82 | 71 | 71 | 58 |
| SenseChat-v4 | 72.2 | 97 | 76 | 70 | 65 | 79 | 46 |
| Baichuan3-Turbo | 89.2 | 97 | 93 | 98 | 89 | 90 | 68 |
| GLM-4-Air | 74.5 | 93 | 65 | 76 | 82 | 76 | 55 |
| GLM-4-AirX | 74.2 | 93 | 62 | 76 | 81 | 76 | 57 |
| qwen-plus | 93 | 100 | 100 | 95 | 97 | 90 | 76 |
| yi-medium | 89.2 | 100 | 94 | 92 | 88 | 90 | 71 |
| yi-large-turbo | 87.8 | 99 | 94 | 88 | 84 | 90 | 72 |
| abab6.5s-chat | 91.7 | 100 | 92 | 97 | 95 | 90 | 76 |
| abab5.5s-chat | 57 | 93 | 81 | 49 | 16 | 73 | 30 |
| abab5.5-chat | 39.7 | 97 | 36 | 12 | 4 | 69 | 20 |
| qwen-turbo | 81.3 | 97 | 81 | 90 | 79 | 83 | 58 |
| gpt-4-turbo | 96.5 | 100 | 100 | 100 | 100 | 95 | 84 |
| ERNIE-3.5-8K (with calculator) | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| xunfei-v3-pro | 94 | 100 | 99 | 98 | 96 | 91 | 80 |
| xunfei-v3.5-max | 93.5 | 100 | 99 | 99 | 95 | 89 | 79 |
| gpt-4 | 86.5 | 100 | 99 | 99 | 86 | 89 | 46 |
| qwen2-1.5b-instruct | 55.7 | 98 | 76 | 44 | 23 | 63 | 30 |
| qwen2-0.5b-instruct | 35.5 | 76 | 37 | 21 | 5 | 49 | 25 |
| qwen2-57b-a14b-instruct | 89.2 | 100 | 95 | 96 | 84 | 89 | 71 |
| qwen2-7b-instruct | 81.3 | 97 | 83 | 89 | 79 | 83 | 57 |
| llama3-70b-instruct | 90.8 | 99 | 99 | 99 | 100 | 80 | 68 |
| llama3-8b-instruct | 89.5 | 100 | 99 | 99 | 99 | 80 | 60 |
| gpt-4o-mini | 92.7 | 100 | 99 | 99 | 98 | 87 | 73 |
| glm-4-9b-chat | 76.5 | 97 | 85 | 79 | 76 | 70 | 52 |
| internlm2-chat-7b | 42.8 | 99 | 25 | 10 | 16 | 79 | 28 |
| internlm2-chat-20b | 63.3 | 100 | 37 | 49 | 70 | 81 | 43 |
| Phi-3-mini-128k-instruct | 71.3 | 88 | 72 | 74 | 68 | 78 | 48 |
| Baichuan2-7B-Chat | 34.8 | 93 | 23 | 11 | 6 | 60 | 16 |
| Baichuan2-13B-Chat | 54.8 | 97 | 42 | 47 | 32 | 75 | 36 |
| Yi-1.5-9B-Chat | 79 | 100 | 87 | 72 | 70 | 86 | 59 |
| Yi-1.5-34B-Chat | 73.8 | 100 | 90 | 53 | 49 | 89 | 62 |
| MiniCPM-2B-dpo-bf16 | 52.7 | 73 | 59 | 69 | 54 | 35 | 26 |
| gemma-7b-it | 38.5 | 96 | 28 | 8 | 3 | 69 | 27 |
| gemma-2b-it | 26.3 | 76 | 7 | 5 | 1 | 56 | 13 |
| qwen1.5-0.5b-chat | 17.2 | 70 | 0 | 2 | 0 | 29 | 2 |
| qwen1.5-1.8b-chat | 26.7 | 84 | 2 | 2 | 3 | 56 | 13 |
| qwen1.5-4b-chat | 53 | 93 | 53 | 40 | 29 | 67 | 36 |
| qwen1.5-7b-chat | 71.2 | 99 | 68 | 76 | 63 | 73 | 48 |
| qwen1.5-14b-chat | 77.5 | 98 | 82 | 81 | 67 | 82 | 55 |
| qwen1.5-32b-chat | 86.8 | 100 | 99 | 89 | 79 | 86 | 68 |
| qwen1.5-72b-chat | 84.8 | 99 | 89 | 80 | 91 | 88 | 62 |
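The Overall column appears to be the plain mean of the six per-set scores: for example, DeepSeek-V2's sub-scores (100, 100, 97, 94, 98, 91) average to 96.7, and moonshot-v1-8k's to 79.3 (a few rows, such as gpt-4o, look rounded to the nearest integer instead). This is an inference from the numbers, not a documented formula; a minimal sketch under that assumption:

```python
def overall_score(sub_scores, ndigits=1):
    """Mean of the six per-set accuracies, rounded to `ndigits` decimals.
    Assumes the Overall column is an unweighted average (inferred from
    the table, e.g. DeepSeek-V2: mean(100,100,97,94,98,91) -> 96.7)."""
    assert len(sub_scores) == 6, "expected one score per eval set"
    return round(sum(sub_scores) / len(sub_scores), ndigits)
```

Since the six eval sets vary sharply in difficulty (nearly every model scores high on eval_2int, while eval_3float separates them most), an unweighted average gives each difficulty tier equal influence on the ranking.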