At @augmentcode, we took a counter-intuitive bet on our AI architecture.

Augment Code (@augmentcode) · May 12, 2026

Score: 8.5

TL;DR · AI Summary

Augment Code used Mercury 2 as a dedicated subagent, achieving 82% faster context compaction and 90% lower summarization costs.

Key Takeaways

With Mercury 2 as a dedicated subagent, context compaction became 82% faster.
Summarization costs fell by 90%, with tool-search summaries under 1 second.
Reduced LLM spend by 30% via Prism routing.

Outline

§Introduction
Augment Code made a counter-intuitive decision on AI architecture.
·Architecture Decision
Offloaded summarization to Mercury 2 as a dedicated subagent, rather than running it on the primary coding model to preserve KV cache (the industry standard).
·User Benefits
Users gained 82% faster context compaction and 90% lower summarization costs.
·Other Benefits
Tool search summaries under 1 second, LLM spend reduced by 30%.

Mindmap

AI architecture
- Mercury 2
  - 82% faster context compaction
  - 90% lower summarization costs
- Prism routing
  - 30% lower LLM spend

#AI Architecture #Mercury 2 #Inception Labs

At @augmentcode, we took a counter-intuitive bet on our AI architecture. Instead of using the primary coding model to preserve KV cache (the industry standard), we used Mercury 2 by @_inception_ai as a dedicated subagent.

The payoff for our users: 82% faster context compaction, 90% lower summarization costs, <1s tool-search summaries, and 30% lower LLM spend via Prism routing.

Read the full story here: inceptionlabs.ai/blog/rise-of-r
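To make the idea of a dedicated summarization subagent concrete, here is a minimal Python sketch of context compaction in which the summarizer is a separate callable rather than the primary coding model. It is not Augment Code's implementation and does not use any real Mercury 2 or Prism API: `Message`, `Summarizer`, `estimate_tokens`, `compact_context`, and `stand_in_subagent` are all hypothetical names, and the four-characters-per-token estimate is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    content: str


# Hypothetical interface: any function that maps text to a shorter summary.
# In a real system this would wrap a call to a fast, separate model.
Summarizer = Callable[[str], str]


def estimate_tokens(text: str) -> int:
    # Assumption for illustration: roughly 4 characters per token.
    return max(1, len(text) // 4)


def compact_context(
    history: List[Message],
    summarize: Summarizer,
    budget_tokens: int,
    keep_recent: int = 2,
) -> List[Message]:
    """Replace older turns with one summary message when over budget."""
    total = sum(estimate_tokens(m.content) for m in history)
    if total <= budget_tokens or len(history) <= keep_recent:
        return list(history)

    # Split by index so that keep_recent == 0 is handled correctly.
    split = len(history) - keep_recent
    old, recent = history[:split], history[split:]

    # The summarization work goes to the subagent, not the primary model.
    transcript = "\n".join(f"{m.role}: {m.content}" for m in old)
    summary = summarize(transcript)

    return [Message("system", f"Summary of earlier turns: {summary}")] + recent


def stand_in_subagent(text: str) -> str:
    # Placeholder for a real model call: simply truncates the transcript.
    return text[:80]


if __name__ == "__main__":
    history = [Message("user", "x" * 400) for _ in range(5)]
    compacted = compact_context(history, stand_in_subagent, budget_tokens=200)
    print(len(history), "messages compacted to", len(compacted))
```

Passing the summarizer in as a parameter keeps the compaction step independent of whichever model serves the main coding conversation, which is the decoupling the post describes.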

Quote: Inception (@_inception_ai), 10h

@augmentcode rebuilt their context compaction layer around Mercury 2. 82% latency cut. 90% cost cut. Comparable quality to Opus 4.7. Running in production today. "We took a counter-intuitive bet. We decoupled summarization entirely, offloading it to Mercury 2 as a dedicated