What's new with Claude Sonnet 4.5?

traeai has indexed 3 items related to Claude Sonnet 4.5. The most recent is "3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1", published by Databricks.

Model

Claude Sonnet 4.5

Alias: Claude

A model that completed 5 of 25 tasks on Cua-Bench.

3 highly relevant items tracked

TraeAI Observations

If you only read 3

3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1

Databricks · Score 9.2

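Databricks has released the Instructed-Retriever-1 model, which uses parallel test-time compute to cut search latency by 3x and time to first token to 2 seconds without sacrificing retrieval quality. The model unifies query generation and reranking, and uses multi-pivot groupwise reranking and parallel query expansion to reach a Pareto-optimal balance of recall and precision, giving enterprise RAG systems a low...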

Cua and Snorkel AI jointly release "Cua-Bench": evaluating agents' computer-use ability in professional software @trycua @SnorkelAI Cua-Bench, the first...

meng shao (@shao__meng) · Score 8.5

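Cua-Bench is the first benchmark to evaluate agents' ability to carry out tasks in professional software. Testing shows that current models have clear weaknesses in complex GUI operation and task planning.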

The last six months in LLMs in five minutes

Simon Willison's Weblog · Score 8.5

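November 2025 was a key turning point for LLM development: model performance rankings changed hands five times among the three major vendors within six months, coding agents made a qualitative leap to being usable day to day, and emerging tools such as Warelay began to appear.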

3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1

Databricks · June 5 · 1484 words (about 6 minutes)

Databricks' Instructed-Retriever-1 cuts search latency by 3x and TTFT to ~2s via parallel test-time scaling without quality loss. The unified model handles query generation and reranking in parallel using multi-pivot groupwise reranking, achieving Pareto-optimal recall-precision tradeoffs for enterprise RAG systems.

Why it was selected: Instructed-Retriever-1 cuts search latency by more than 3x and brings TTFT down to about 2 seconds, with no reconfiguration required.

Featured · Article · #RAG · #Test-Time Scaling · #Instructed-Retriever-1 · #Databricks · #Retrieval · English

Cua and Snorkel AI jointly release "Cua-Bench": evaluating agents' computer-use ability in professional software @trycua @SnorkelAI Cua-Bench, the first...

meng shao (@shao__meng) · June 16 · 1018 words (about 5 minutes)

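Cua-Bench is the first benchmark to evaluate agents' ability to carry out tasks in professional software. Testing shows that current models have clear weaknesses in complex GUI operation and task planning.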

Why it was selected: GPT-5.5, currently the strongest model, completed only 6 of 25 tasks on Cua-Bench, a full pass rate of just 24%.

Featured · Tweet · #Cua-Bench · #Agent · #KiCad · #Evaluation Benchmark · #AI · Mixed Chinese and English

The last six months in LLMs in five minutes

Simon Willison's Weblog · May 19 · 1128 words (about 5 minutes)

November 2025 was a critical inflection point for LLM development: model performance rankings changed hands five times among the three major vendors within six months, coding agents made a qualitative leap to daily usability, and emerging tools like Warelay began to appear.

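Why it was selected: In November 2025, the model performance rankings of the three major vendors changed 5 times, with Claude Opus 4.5 ultimately coming out on top.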

Featured · Article · #LLM · #AI Programming · #Model Evaluation · #Anthropic · #OpenAI · English
