
Lean-IMO-Bench

Q: What's new with Lean-IMO-Bench?

traeai has indexed 1 item related to Lean-IMO-Bench. The most recent is "New research from Google. Just shows the impressive results you can get from custom agent harnesses...", posted by elvis (@omarsar0).

A benchmark dataset for evaluating mathematical proof ability. LEAP raised the one-shot solve rate on it from under 10% to 70%.

1 highly relevant item tracked

TraeAI Observations

If you read only 3

New research from Google. Just shows the impressive results you can get from custom agent harnesses...

elvis (@omarsar0) · score 8.8

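Google's LEAP framework puts a general-purpose LLM at its core and combines it with feedback from the Lean compiler and verifier. It raises the one-shot solve rate on Lean-IMO-Bench from under 10% to 70% and, with a single model, solves all Putnam 2025 problems, outperforming a specialized system that scores 48.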

Google New Research: LEAP Framework Enables Efficient Solving of Mathematical Proofs with General LLMs

elvis (@omarsar0) · June 4 · 144 words (about 1 min read)

Google's LEAP framework wraps a general-purpose LLM in an agentic scaffold that grounds every step in the Lean compiler and iterates against verifier feedback. With a single model it solves all 12 Putnam 2025 problems, and it lifts the one-shot solve rate on Lean-IMO-Bench from under 10% to 70%, outperforming a specialized gold-medal system that scores 48. Paper: arXiv:2606.03303. Learn to build effective AI agents at academy.dair.ai.
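The loop described in this post (a general-purpose LLM proposes a proof, the Lean compiler checks it, and the verifier's errors drive the next attempt) can be sketched as below. This is a minimal illustration under stated assumptions, not LEAP's implementation: `prove_with_feedback`, `solve_rate`, `toy_prover`, and `toy_checker` are hypothetical names, the LLM and the Lean compiler are replaced by toy functions, and `solve_rate` assumes "one-shot" means one run of the whole loop per problem.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical interfaces (not LEAP's actual API):
#   prover(goal, feedback) -> candidate proof text (stand-in for the LLM)
#   checker(goal, candidate) -> list of error messages; empty means verified
#                               (stand-in for compiling the candidate with Lean)
Prover = Callable[[str, List[str]], str]
Checker = Callable[[str, str], List[str]]


@dataclass
class Result:
    proof: Optional[str]  # the accepted proof, or None if the budget ran out
    attempts: int         # how many candidates were sent to the checker
    feedback: List[str] = field(default_factory=list)  # every error seen, in order


def prove_with_feedback(goal: str, prover: Prover, checker: Checker,
                        max_attempts: int = 8) -> Result:
    """Propose, check, and retry, feeding checker errors into the next proposal."""
    feedback: List[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = prover(goal, feedback)
        errors = checker(goal, candidate)
        if not errors:  # the checker accepted the proof
            return Result(candidate, attempt, feedback)
        feedback.extend(errors)  # ground the next attempt in the verifier's output
    return Result(None, max_attempts, feedback)


def solve_rate(results: List[Result]) -> float:
    """Fraction of problems with a verified proof after one run of the loop each."""
    if not results:
        return 0.0
    return sum(r.proof is not None for r in results) / len(results)


# Toy stand-ins so the sketch runs without an LLM or a Lean installation.
def toy_prover(goal: str, feedback: List[str]) -> str:
    return "by norm_num" if feedback else "sorry"


def toy_checker(goal: str, candidate: str) -> List[str]:
    return ["declaration uses 'sorry'"] if "sorry" in candidate else []


if __name__ == "__main__":
    result = prove_with_feedback("2 + 2 = 4", toy_prover, toy_checker)
    print(result.proof, result.attempts)  # by norm_num 2
```

In a real setup `checker` would compile the candidate with Lean and return its messages. The design choice worth noting is that errors accumulate in `feedback`, so each new proposal is conditioned on everything the verifier has rejected so far.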

Why it was selected: LEAP, built on a general-purpose LLM, solves all 12 Putnam 2025 problems with a single model.

Featured · Tweet · #LEAP #Lean compiler #Putnam 2025 #agentic framework #general-purpose LLM · English
