
What is Lean-IMO-Bench

A benchmark dataset for evaluating mathematical proof ability. LEAP raised the one-shot solve rate on it from under 10% to 70%.

📰 Latest on Lean-IMO-Bench

1 AI news and analysis item related to "Lean-IMO-Bench" has been indexed.

New research from Google.

Just shows the impressive results you can get from custom agent harnesses...

Google's LEAP framework wraps a general-purpose LLM in an agentic scaffold that grounds every step in the Lean compiler and iterates against verifier feedback. With a single model, it solves all 12 Putnam 2025 problems and lifts the one-shot solve rate on Lean-IMO-Bench from under 10% to 70%, outperforming a specialized gold-medal system that scores 48. Paper: arXiv:2606.03303. Learn to build effective AI agents at academy.dair.ai.
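The "iterate against verifier feedback" idea can be illustrated with a minimal sketch. This is not LEAP's implementation, which the post does not describe in detail. The names `refine_until_verified`, `propose`, `check`, and `CheckResult` are hypothetical, and the toy functions merely stand in for an LLM call and a Lean compiler run.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    ok: bool        # True when the checker accepts the candidate
    feedback: str   # error text from the checker, empty on success


def refine_until_verified(
    statement: str,
    propose: Callable[[str, str], str],
    check: Callable[[str], CheckResult],
    max_rounds: int = 8,
) -> Optional[str]:
    """Generic propose/check/retry loop (an assumed shape, not LEAP's actual design)."""
    feedback = ""
    for _ in range(max_rounds):
        # Ask the proposer for a candidate, passing along the last error text.
        candidate = propose(statement, feedback)
        result = check(candidate)
        if result.ok:
            return candidate
        feedback = result.feedback
    return None  # budget exhausted without an accepted candidate


# Toy stand-ins: a real setup would call an LLM and the Lean compiler here.
def toy_propose(statement: str, feedback: str) -> str:
    return "by rfl" if feedback else "by sorry"


def toy_check(candidate: str) -> CheckResult:
    if "sorry" in candidate:
        return CheckResult(False, "declaration uses 'sorry'")
    return CheckResult(True, "")


proof = refine_until_verified("theorem t : 1 + 1 = 2", toy_propose, toy_check)
```

In this toy run the first candidate is rejected and the second is accepted. The point of the pattern is that only output accepted by the checker is ever returned.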

Why it was selected: LEAP solves all 12 Putnam 2025 problems with a single general-purpose LLM.

Featured · Tweet · #LEAP #Lean compiler #Putnam 2025 #agentic framework #general-purpose LLM · English

AI terms that frequently appear together with "Lean-IMO-Bench".

