
Paper

Terminal Bench 2.1

Alias: Terminal Agent Benchmark

A benchmark that measures a model's ability to interact with a terminal and carry out automated tasks.

Related materials

1 item related to Terminal Bench 2.1 has been indexed, sorted by score.

Read more from @MiniMax_AI:

MiniMax introduces M3, the first open-weight model combining coding, agentic, and long-context capabilities, achieving 59%+ on benchmarks like SWE-Bench Pro with 1M context support, advancing open-source LLMs toward multi-capability frontiers.

Reason for selection: MiniMax M3 achieved 59.0% accuracy on the SWE-Bench Pro benchmark, ahead of most open-source models.

Featured · Tweet · #Open-source model #Large language model #Coding capability #Long context #MiniMax · English
