FrontierMath Evaluation Reveals Fatal Errors, Updated Scores to Follow
A FrontierMath evaluation found fatal errors in roughly 33% of problems; Epoch AI will release a corrected dataset with updated scores.
Why it was selected: about 33% of the problems in FrontierMath Tiers 1-4 were flagged as having fatal errors.
