People

Thomas Wolf

Q: What is new with Thomas Wolf?

traeai has indexed 7 items related to Thomas Wolf. The most recent is "watching a team of agents tackling a hard theoretical physics problem is quite mesmerizing - self-co...", posted by Thomas Wolf (@Thom_Wolf).

Alias: Thom_Wolf

An expert in the AI field who has posted recent updates on AI-generated engineering artifacts.

7 highly relevant items tracked

TraeAI Observations

If you only read 3

watching a team of agents tackling a hard theoretical physics problem is quite mesmerizing - self-co...

Thomas Wolf (@Thom_Wolf) · score 7.8

The Physics-Intern framework uses multi-agent collaboration to raise Gemini 3.1 Pro's score on the CritPt benchmark from 17.7% to 31.4%, setting a new SOTA in theoretical physics reasoning.

I'm very excited about this extension to the celebrated Terminal-Bench to science. If you're a scie...

Thomas Wolf (@Thom_Wolf) · score 7.5

Thomas Wolf is excited about the extension of Terminal-Bench to scientific fields, known as Terminal-Bench Science. This benchmark evaluate...

4/ Why three metrics? The metrics are designed to capture different classes of errors. Shape simil...

Thomas Wolf (@Thom_Wolf) · score 7

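The post proposes three evaluation metrics that measure the correctness of geometric shape, interface matching, and topological structure respectively, and stresses that none of them can substitute for the others.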

Watching a team of agents solve a hard theoretical physics problem is mesmerizing: Physics-Intern achieves a new breakthrough

Thomas Wolf (@Thom_Wolf) · May 14 · 177 words (about 1 minute)

The Physics-Intern framework uses multi-agent collaboration to raise Gemini 3.1 Pro's score on the CritPt benchmark from 17.7% to 31.4%, setting a new SOTA in theoretical physics reasoning.

Why it was selected: Physics-Intern uses a multi-agent collaboration framework to solve complex theoretical physics problems.

Featured tweet · #AI Agent · #Theoretical physics · #LLM reasoning · #Gemini · #CritPt · Mixed Chinese and English

I'm very excited about this extension to the celebrated Terminal-Bench to science. If you're a scie...

Thomas Wolf (@Thom_Wolf) · May 22 · 227 words (about 1 minute)

Thomas Wolf is excited about the extension of Terminal-Bench to scientific fields, known as Terminal-Bench Science. This benchmark evaluates AI models' ability to control tools via the command line to achieve scientific goals. It's open for contributions of real scientific workflows until August 2026, aiming to improve AI models' assistance in research work.

Why it was selected: Terminal-Bench Science evaluates AI models' performance in handling scientific workflows through command-line tools.