What's new with SWE-Marathon?

traeai has indexed 1 item related to SWE-Marathon. The most recent is "[AINews] not much happened today", published by Latent Space.

Paper

What is SWE-Marathon?

It evaluates how coherent coding agents remain under a 1B-token budget.

Why is it worth watching now?

If you only read 3 articles

[AINews] not much happened today

Latent Space · Score: 6.3

📰 Latest on SWE-Marathon

1 piece of AI news and analysis related to "SWE-Marathon" has been indexed.

[AINews] not much happened today

Latent Space · Today · 1,494 words (about a 6-minute read)

The article summarizes recent AI industry highlights, covering Anthropic’s Mythos/Opus discussion, the formalization of RSI research, and new long‑horizon evaluation benchmarks, underscoring the reliability gaps in frontier models.

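Why it was selected: Anthropic's Opus 4.7 already matches or exceeds specialized NMR software on some chemistry tasks, which shows the model's potential in specialized domains.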

Featured · Article · #AI Research #Self‑Improvement #Evaluation Benchmarks #Anthropic #Sakana AI · Chinese

AI terms that frequently appear alongside "SWE-Marathon".

Anthropic · Sakana AI · Agents' Last Exam · Opus

