Very interesting results from this NanoGPT-Bench eval.
elvis (@omarsar0) · 152 words (about 1 minute)
Coding agents recover only 9.3% of human progress in AI research tasks, primarily tuning hyperparameters and ignoring algorithmic innovation, revealing their current inability to conduct real AI R&D.
Why it was selected: Codex, Claude Code, and Autoresearch recover only 9.3% of human research progress in the NanoGPT-Bench evaluation.
Featured · Tweet · #NanoGPT-Bench #Codex #Claude Code #Autoresearch #AI agents · English
