What is SWE-Bench Pro

Also known as: Software Writing Evaluation Benchmark

An authoritative benchmark suite for evaluating a model's ability to write production-grade code.

📰 Latest SWE-Bench Pro updates

4 AI news items and analyses related to SWE-Bench Pro have been indexed.

Read more from @MiniMax_AI:

MiniMax introduces M3, which it describes as the first open-weight model to combine coding, agentic, and long-context capabilities. M3 scores 59%+ on benchmarks like SWE-Bench Pro and supports a 1M context, moving open-source LLMs toward multi-capability frontiers.

Why it was selected: MiniMax M3 scores 59.0% on the SWE-Bench Pro benchmark, ahead of most open-source models.

Featured · Tweet · #Open-source model #Large language model #Coding capability #Long context #MiniMax · English
.@MiniMax_AI M3 model is available on Ollama's Cloud! 

In partnership with MiniMax, the M3 model on...

MiniMax M3 Model Now Available on Ollama Cloud!

ollama (@ollama) · 153 words (about 1 min read)
75

The M3 model by MiniMax is now available on Ollama Cloud, deployed in the US with zero data retention, optimized for coding and agentic tasks. It achieves 59.0%+ on SWE-Bench Pro and supports up to 1M context length via sparse attention.

Why it was selected: M3 scores 59.0% on the SWE-Bench Pro benchmark, outperforming most open-source models.

Featured · Tweet · #M3 #Ollama #MiniMax #Coding AI #Agentic AI · English
https://t.co/gEIxt9RMBF

Opus 4.7 for 33% less: How Auggie beats Claude Code on cost and quality

Augment Code (@augmentcode) · 890 words (about 4 min read)
75

Augment Code's benchmark shows that its AI coding assistant Auggie achieves a slightly higher pass rate (67.4% vs 66.3%) than Claude Code using Opus 4.7 while costing approximately 33% less, primarily due to token efficiency from precise retrieval through its Context Engine semantic indexing technology.

Why it was selected: On Terminal Bench 2.0, Auggie narrowly beats Claude Code on pass rate (67.4% vs 66.3%) while using 32% fewer tokens and costing 33% less.

Featured · Tweet · #AI Coding Assistant #Benchmark #Cost Optimization #Token Efficiency #Augment Code · English
Model page: 

https://t.co/3OxGYEF2U6

Ollama Launches GLM-5.1

ollama (@ollama) · 66 words (about 1 min read)
65

Ollama launches GLM-5.1, described as a next-generation flagship model with significantly improved code generation capabilities.

Why it was selected: GLM-5.1 is a next-generation flagship model launched on Ollama.

Featured · Tweet · #AI Model #Code Generation #Ollama · English

AI terms that frequently appear alongside SWE-Bench Pro.

