traeai

What is Terminal-Bench

A benchmark for evaluating AI models' ability to control tools via the command line.

📰 Latest Terminal-Bench updates

2 AI news items and analyses related to "Terminal-Bench" have been collected.

1/ Today at #GoogleIO, we’re releasing Gemini 3.5, our latest family of models combining frontier in...

Jeff Dean Announces Gemini 3.5

Jeff Dean (@JeffDean) · 268 words (about 2 minutes)
85

Google releases the Gemini 3.5 family, starting with 3.5 Flash for complex agentic workflows. It outperforms 3.1 Pro on coding and agent benchmarks and runs 4x faster, reaching 12x in Antigravity.

Why it was selected: Gemini 3.5 Flash is designed for running complex, long-horizon agentic workflows.

Featured Tweet · #Google #Gemini #AI Agents #LLM #Google I/O · English

I'm very excited about this extension to the celebrated Terminal-Bench to science.

If you're a scie...

Thomas Wolf is excited about the extension of Terminal-Bench to scientific fields, known as Terminal-Bench Science. This benchmark evaluates AI models' ability to control tools via the command line to achieve scientific goals. It's open for contributions of real scientific workflows until August 2026, aiming to improve AI models' assistance in research work.

Why it was selected: Terminal-Bench Science evaluates AI models' performance in handling scientific workflows through command-line tools.

Featured Tweet · #AI #Science #Terminal-Bench #Benchmarking #Command Line · English

AI terms that frequently appear alongside "Terminal-Bench".


AI may generate inaccurate information. Please verify important content.