Company

Pruna

Q: What is new with Pruna?

traeai has indexed 1 item related to Pruna. The latest is "20 days of compute vs 7 hours: rethinking what state-of-the-art means — Bertrand Charpentier, Pruna", published by AI Engineer.

An organization or project that took part in this discussion alongside Bertrand Charpentier.

1 highly relevant item tracked

TraeAI Observations

If you only read 3 items

20 days of compute vs 7 hours: rethinking what state-of-the-art means — Bertrand Charpentier, Pruna

AI Engineer · Score: 7.5

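The current criteria for judging "state-of-the-art" AI models are misleading. Relying only on public leaderboards or internal evaluations tends to produce the lazy choice of simply picking a large model. Selection should instead weigh the differences across multiple leaderboards, Elo score volatility, and real-world requirements.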


AI Engineer · June 1 · 4,459 words (about 18 minutes)


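Why it was selected: different leaderboards (such as Arena and Design Arena) rank the same image-editing model very differently. For example, the Human model's position differs by more than 5 places from one leaderboard to another.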

Featured · Video · #AI Model Evaluation · #Leaderboards · #Elo Score · #Model Selection · English

Cross-material Q&A · Pruna

Answer based on: 1 item related to Pruna