Deepseek-r1's testing results weren't great #1871
Replies: 2 comments 3 replies
-
Same situation here: I evaluated the DeepSeek 32B version on the aime2024 dataset and it scored only 3.33.
-
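In my case, the problem is that the prediction field does contain output, but the output is incomplete: the model never reasons its way to a final answer.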
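One way to check whether incomplete predictions like these explain a low score is to count how many of them never reach a final answer. The snippet below is only a rough sketch. It assumes predictions are plain strings, that the model wraps its reasoning in `<think>...</think>`, and that the final answer is expected inside `\boxed{...}`. The helper names `is_truncated` and `truncation_rate` are hypothetical and not part of any evaluation framework.

```python
import re

# Assumption: a finished answer contains a \boxed{...} marker.
BOXED = re.compile(r"\\boxed\s*\{")


def is_truncated(prediction: str) -> bool:
    """Heuristic: True if reasoning was never closed or no boxed answer appears."""
    unclosed_think = "<think>" in prediction and "</think>" not in prediction
    return unclosed_think or BOXED.search(prediction) is None


def truncation_rate(predictions: list[str]) -> float:
    """Fraction of predictions that look cut off before a final answer."""
    if not predictions:
        return 0.0
    return sum(is_truncated(p) for p in predictions) / len(predictions)


if __name__ == "__main__":
    # Made-up sample strings, for illustration only.
    samples = [
        "<think>Let x = 3, then we need to check",
        "<think>Short reasoning.</think> The answer is \\boxed{204}.",
    ]
    print(f"truncated: {truncation_rate(samples):.0%}")
```

If most predictions are flagged, the output length limit used during evaluation is a likely suspect, since long reasoning chains can be cut off before the answer is written.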
-
I ran a quick test on some Chinese-language questions and found the results were very poor. Has anyone else tried this?