Leaderboard

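Evaluation sets:

eval_2int: addition and subtraction of 2 integers, e.g. "918 + 474 ="
eval_3int: addition and subtraction of 3 integers, e.g. "166 + 215 + 53 ="
eval_4int: addition and subtraction of 4 integers, e.g. "945 + 820 + 810 + 159 = "
eval_5int: addition and subtraction of 5 integers, e.g. "901 + 306 + 69 + 830 + 816 = "
eval_2float: addition, subtraction, multiplication, and division of 2 floating-point numbers, e.g. "34.1 + 10.3 ="
eval_3float: addition, subtraction, multiplication, and division of 3 floating-point numbers, e.g. "0.97 + 0.4 / 4.51 ="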
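
As an illustration of what a reference answer for these prompts looks like, here is a minimal sketch that evaluates a prompt string with the usual operator precedence. The function name `reference_answer` is hypothetical, and the source does not say how the benchmark computes or compares answers, so treat this as one possible approach rather than the benchmark's own scoring code.

```python
import ast
import operator

# Map AST operator nodes to the four arithmetic operations used in the prompts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval(node):
    """Recursively evaluate a parsed arithmetic expression."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand)
    raise ValueError("unsupported expression")


def reference_answer(prompt: str):
    """Hypothetical helper: strip the trailing '=' and evaluate the expression."""
    expr = prompt.strip().rstrip("=").strip()
    return _eval(ast.parse(expr, mode="eval").body)


print(reference_answer("918 + 474 ="))          # integer addition
print(reference_answer("0.97 + 0.4 / 4.51 ="))  # division binds tighter than addition
```

Parsing with `ast` rather than calling `eval` keeps the sketch restricted to numbers and the four operators. How a model's floating-point answer would be compared with the reference (exact match, rounding, or a tolerance) is not specified in the source.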

Model	Overall score	eval_2int	eval_3int	eval_4int	eval_5int	eval_2float	eval_3float
gpt-4o	96	100	99	99	92	98	86
gpt-3.5-turbo	80	100	98	88	67	86	41
yi-large	88	98	93	93	91	85	70
abab6.5-chat	90	96	98	97	94	86	71
qwen-max	80	99	85	73	66	91	65
qwen2-72b-instruct	94	100	99	99	97	92	78
DeepSeek-V2	96.7	100	100	97	94	98	91
glm-4	78	99	78	73	82	76	60
moonshot-v1-8k	79.3	56	94	92	90	72	72
ERNIE-4.0 (calculator)	100	100	100	100	100	100	100
yi-spark	83.3	98	86	85	79	87	65
GLM-4-Flash	75.5	97	86	74	75	70	51
qwen-long	83.3	98	89	81	84	86	62
ERNIE-4.0-Turbo-8K	97.7	100	100	100	99	93	94
Doubao-pro-32k	98.2	100	100	100	100	99	90
Doubao-lite-32k	87.2	99	82	96	83	99	64
internlm2-chat-1_8b	39.7	83	37	27	24	52	15
internlm2_5-7b-chat	59.8	100	62	33	30	85	49
gemma-2-9b-it	89.3	100	94	95	92	85	70
DeepSeek-V2-Lite-Chat	61.2	99	76	33	19	80	60
ERNIE-Speed-8K	68.7	100	85	68	48	79	32
xunfei-4.0Ultra	94.3	100	100	100	96	91	79
SenseChat-Turbo	78.5	99	90	82	71	71	58
SenseChat-v4	72.2	97	76	70	65	79	46
Baichuan3-Turbo	89.2	97	93	98	89	90	68
GLM-4-Air	74.5	93	65	76	82	76	55
GLM-4-AirX	74.2	93	62	76	81	76	57
qwen-plus	93	100	100	95	97	90	76
yi-medium	89.2	100	94	92	88	90	71
yi-large-turbo	87.8	99	94	88	84	90	72
abab6.5s-chat	91.7	100	92	97	95	90	76
abab5.5s-chat	57	93	81	49	16	73	30
abab5.5-chat	39.7	97	36	12	4	69	20
qwen-turbo	81.3	97	81	90	79	83	58
gpt-4-turbo	96.5	100	100	100	100	95	84
ERNIE-3.5-8K (calculator)	100	100	100	100	100	100	100
xunfei-v3-pro	94	100	99	98	96	91	80
xunfei-v3.5-max	93.5	100	99	99	95	89	79
gpt-4	86.5	100	99	99	86	89	46
qwen2-1.5b-instruct	55.7	98	76	44	23	63	30
qwen2-0.5b-instruct	35.5	76	37	21	5	49	25
qwen2-57b-a14b-instruct	89.2	100	95	96	84	89	71
qwen2-7b-instruct	81.3	97	83	89	79	83	57
llama3-70b-instruct	90.8	99	99	99	100	80	68
llama3-8b-instruct	89.5	100	99	99	99	80	60
gpt-4o-mini	92.7	100	99	99	98	87	73
glm-4-9b-chat	76.5	97	85	79	76	70	52
internlm2-chat-7b	42.8	99	25	10	16	79	28
internlm2-chat-20b	63.3	100	37	49	70	81	43
Phi-3-mini-128k-instruct	71.3	88	72	74	68	78	48
Baichuan2-7B-Chat	34.8	93	23	11	6	60	16
Baichuan2-13B-Chat	54.8	97	42	47	32	75	36
Yi-1.5-9B-Chat	79	100	87	72	70	86	59
Yi-1.5-34B-Chat	73.8	100	90	53	49	89	62
MiniCPM-2B-dpo-bf16	52.7	73	59	69	54	35	26
gemma-7b-it	38.5	96	28	8	3	69	27
gemma-2b-it	26.3	76	7	5	1	56	13
qwen1.5-0.5b-chat	17.2	70	0	2	0	29	2
qwen1.5-1.8b-chat	26.7	84	2	2	3	56	13
qwen1.5-4b-chat	53	93	53	40	29	67	36
qwen1.5-7b-chat	71.2	99	68	76	63	73	48
qwen1.5-14b-chat	77.5	98	82	81	67	82	55
qwen1.5-32b-chat	86.8	100	99	89	79	86	68
qwen1.5-72b-chat	84.8	99	89	80	91	88	62
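
The overall score in the second column appears to be the arithmetic mean of the six per-set scores; this is inferred from the numbers in the table, not stated in the source. A minimal sketch that reproduces the reported value for three rows, assuming rounding to one decimal place (the helper name `overall_score` is hypothetical):

```python
from statistics import mean


def overall_score(subscores):
    """Mean of the six per-set scores, rounded to one decimal (assumed rounding)."""
    return round(mean(subscores), 1)


# Per-set scores copied from the table, in column order:
# eval_2int, eval_3int, eval_4int, eval_5int, eval_2float, eval_3float
rows = {
    "DeepSeek-V2": [100, 100, 97, 94, 98, 91],
    "moonshot-v1-8k": [56, 94, 92, 90, 72, 72],
    "qwen-turbo": [97, 81, 90, 79, 83, 58],
}

for model, subscores in rows.items():
    print(model, overall_score(subscores))
# DeepSeek-V2 96.7
# moonshot-v1-8k 79.3
# qwen-turbo 81.3
```

Some rows near the top of the table show an integer overall score (for example 96 for gpt-4o, whose six scores average to about 95.7), so those entries seem to have been rounded to the nearest whole number instead.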
