Quantum Bit

Tokens Are Expensive Because You Feed Them Too Much Junk | Amazon Web Services' Wang Xiaoye @ AIGC2026

TL;DR · AI Summary

87% of enterprises deploy AI, but only 10% derive production value. Token costs stem from messy, excessive inputs, and enterprise-grade agent deployment requires a five-layer architecture.

Key Takeaways

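  • 87% of enterprises deploy AI, yet only 10% achieve real business value, indicating that the biggest bottleneck is engineering and value realization rather than technology.
  • Token cost is high not because of the unit price but because of excessive, unstructured input; reducing it means structuring prompts and compressing context.
  • AWS proposes a five-layer framework: Compute → Model → Data → Platform → Application.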

Outline

  1. Personal AI agents are easy to run, but enterprise-scale deployment requires overcoming complexity in compute, models, data, and talent — with only 10% achieving value.

  2. Enterprises need flexible model selection, system reliability, low user barriers, and skilled personnel — all major hurdles preventing Demo-to-Production conversion.

  3. AWS outlines five capabilities: optimized compute, multi-model support, enterprise knowledge integration, agentic platform, and reusable agent applications.

  4. AWS uses custom Graviton CPUs and Trainium AI chips to optimize inference performance across different workloads, avoiding generic chip inefficiency.

  5. Amazon Bedrock integrates Chinese models like GLM and MiniMax, enabling enterprises to choose the best-performing model without lock-in.

  6. Without proprietary data and knowledge, agents can only perform generic tasks like 'tomato scrambled eggs' — not integrated into core business workflows.

Mindmap

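  • Enterprise-grade AI Agent deployment path
    • Core pain points
      • Low value conversion rate (only 10%)
      • High engineering complexity (keeping thousands of Agents running stably)
    • Five-layer solution
      • Compute layer: custom chips optimize inference performance
      • Model layer: free switching among multiple models
      • Data layer: injecting enterprise-specific knowledge
      • Platform layer: Harness drives Agent collaboration
      • Application layer: reusing mature Agent services
    • Supporting industry trends
      • Gartner predicts that by 2030, 15% of decisions will be made by Agents
      • McKinsey estimates the market could reach $4.4 trillion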

Highlights

  • 87% of enterprises claim AI deployment, yet only 10% realize production value — revealing that the biggest bottleneck is not technology, but engineering and value realization.

  • Token cost isn’t about price per token, but the volume and noise of input data; reduce costs by structuring prompts and compressing context.

  • For 30 years, personal productivity remained unchanged until Working Agents emerged — now reshaping workflows from assistance to autonomous decision-making.

2026-05-31 18:03:40 Source: Quantum Bit

Edited by Editorial Team, Compiled from AIGC2026

Quantum Bit | Official Account QbitAI

As everyone rushes to “raise lobsters” (Chinese tech slang for setting up and running your own autonomous AI agent), the real problems are only beginning to surface.

At the recently concluded 2026 China AIGC Industry Summit, Wang Xiaoye, Technical Director of the Product Technology Department at Amazon Web Services, presented a set of striking figures:

87% of enterprises claim to have deployed AI at scale, yet only 10% are truly deriving value from it.

Clearly, demos are never hard to build — the real challenge lies in making them run reliably in enterprise production environments.

Image 1

In his view, running a fun Agent on a personal Mac mini — where you can unplug and restart anytime — is fundamentally different from ensuring thousands of Agents operate safely, reliably, and continuously within an enterprise’s distributed environment. These represent two entirely different levels of engineering complexity.

The talk took a hardcore engineering perspective and went straight at the core pain points enterprises face:

Stop expecting one model to solve everything.

Is compute cost efficient? Is your data secure? Will your Agent suffer amnesia or memory leakage? From foundational infrastructure to upper-level applications, every layer presents genuine, hard problems that must be tackled head-on.

To fully convey Wang Xiaoye’s thinking, Quantum Bit edited and organized the speech content without altering its original intent, hoping to offer you more inspiration.

The 2026 China AIGC Industry Summit was hosted by Quantum Bit, with nearly 20 industry representatives participating in discussions. Over a thousand attendees joined offline, while nearly four million watched live online, attracting widespread attention and coverage from mainstream media.

Core Ideas Summary

  • 87% of enterprises have implemented large-scale AI deployment, but only 10% are actually deriving production value.
  • Raising lobsters as an individual and raising them inside an enterprise are completely different matters.
  • AI is not just about large models — what remains after stripping away the model (the Harness) is what truly matters.
  • Past data platforms served humans; today’s data platforms must serve AI Agents.
  • For the past 30 years, individual productivity has never been truly disrupted — until Working Agents emerged.
  • Tokens are expensive, often because you’ve fed the model too much irrelevant or messy information — not because token pricing itself is high.

Below is the original transcript of Wang Xiaoye’s speech:

The Four Gaps in Enterprise-Level Agent Deployment & Real-world Experience

Good morning, everyone! Thank you very much for Quantum Bit’s invitation.

My talk today is titled “Bridging the Gap to Agent Deployment: From Models to Enterprise-Level AI Agents.”

Unlike previous speakers, I’ll be straightforward and illustrate my thoughts through product examples and case studies.

As a technology company serving global clients for years, AWS has supported millions of enterprise customers via cloud services. Today, I’d like to share with you our thinking behind recent product updates — specifically, what questions enterprises must answer when truly deploying Agents into production environments.

Image 2

Over the past few years, both Agent products and frameworks have proliferated.

Yet, truly scalable and stable Agents operating in production environments remain rare. What’s the gap?

I’ll now share insights drawn from AWS’s joint practice with clients and summarize key learnings through recent product enhancements.

Let’s first quickly review current AI trends.

Previously, we often started conversations with clients by discussing AI application use cases — i.e., what scenarios it could be applied to.

But many specific use cases are already well-known. Summarizing broadly, we can identify several directions.

First, AI-generated audio, video, and music — these areas are already highly advanced, with daily innovations visible to everyone.

Second, foundation models — starting from language models, users have genuinely felt AI’s power, while also recognizing there’s still a long way to go.

Third, embodied intelligence.

Looking at how embodied intelligence companies have changed: last year most focused on motion control; this year, the emphasis has clearly shifted toward how to collect data, perceive the physical world, and generate feedback actions.

Another hot direction, perhaps closer to maturity, is Agents.

Today, I’ll focus on Agents, as they’re closest to real enterprise deployment. From a practical standpoint, Agents are currently the most worthwhile topic for discussion.

Many guests have mentioned that “raising lobsters” has become a wildly popular trend in China: many individuals are experimenting with it, and news reports even describe engineers in Shenzhen setting up these tools for private users.

But raising lobsters personally and raising them in enterprises are two entirely different things.

In personal environments, once configured, adjustments are rarely needed. In enterprise settings, however, can you ensure thousands of Agents run securely, reliably, and continuously within your corporate infrastructure? Behind this lie numerous gaps requiring bridging.

First: Model selection and response speed.

Enterprises consider cost-effectiveness and constantly evolving model capabilities, so they need to respond swiftly to new models and flexibly switch between multiple options.

Second: Engineering complexity.

In the cloud era, we know maintaining a distributed system reliably over time is extremely difficult.

An Agent that runs smoothly on a Mac mini can be unplugged and restarted anytime by an individual user — but in enterprise-grade production environments, can it run reliably, auto-restart without interruption, and handle data trustworthily? That’s another level of engineering complexity altogether.

Third: Usability barrier.

Whether it’s Hermes, Lobster, or other autonomous Agents, the barrier to use for engineers has dropped significantly compared with traditional IT systems.

However, for business users — such as marketing or HR personnel — whether they can effectively utilize these tools remains a hurdle.

Fourth: Talent gap.

Most of the above issues ultimately require human resolution — enterprises need their platform departments to empower the entire organization with AI capabilities.

Yet, talent capable of end-to-end Agent deployment remains severely lacking.

Three weeks ago, during AWS’s global launch event, CEO Matt Garman shared several key perspectives.

First, he believes AI and Agents have triggered a massive paradigm shift in market construction — many applications should be reimagined from scratch.

This sounds familiar, but today’s context feels different.

Previously, we were mostly envisioning the future; now this view is grounded in observable facts, in what enterprises have already deployed.

Matt also said something I strongly agree with: for the past 30 years, individual productivity has never been truly disrupted at scale.

Think back — whether using Office or various communication tools, software keeps evolving, but our working methods haven’t changed dramatically.

Today, however, the emergence of Agents — especially after experiencing tools like Lobster — makes us feel we’ve reached a turning point: personal work methods are indeed changing.

Before diving into trends, let’s look at some data to get a sense of the landscape.

Gartner’s analysis suggests that from 2028 to 2030, over 15% of enterprises’ daily decision-making processes will be autonomously handled by Agents or AI.

This means not human-assisted decisions but fully autonomous decisions by Agents, going beyond assistance with task execution.

Some labor force research reports indicate that between 2026 and 2028, 82% of enterprise leaders plan to increase hiring of “digital employees” to serve businesses — not just human hires.

McKinsey’s analysis shows that the incremental commercial market size brought by Agents and generative AI could grow from $2.6 trillion to $4.4 trillion, an increase of roughly 70%.

Is this hype?

Let’s examine actual data returned from enterprise surveys.

Again, McKinsey’s report notes that 87% of enterprises today claim large-scale AI deployment in production environments, up from 78% a year ago — progress has been rapid.

Meanwhile, Deloitte’s reports also reflect that among enterprises actually integrating AI into production systems, some have reported measurable productivity gains.

These signals may seem contradictory, but fundamentally, they all point to the same conclusion: AI Agent penetration in enterprises is already high, yet the proportion achieving true value and reaching production remains limited. McKinsey’s own report also states this proportion may be around 10%.

Image 3

Tools like Lobster show us what lies across the river. But for enterprises to reach actual production deployment, they need a bridge.

Today, I’ll share with you what this bridge should look like, through recent product capability updates.

The Bridge from Demo to Production

Simply put, AWS believes that when enterprise IT platforms and technical decision-makers drive Agents from demo to production, they should focus on five core capabilities.

First: Computational power required by AI Agents.

In the Agent context, compute is more about inference than training. Discussions used to center on training compute, but Agent scenarios put the emphasis on inference capacity.

Second: Models.

Enterprises need quick access to cutting-edge industry models or those best suited to their specific scenarios — while maintaining high cost-efficiency.

Third: Data and knowledge.

Agents turn question-answering into the ability to take action. But think of cooking: without proprietary “recipes” (enterprise-specific knowledge), they can only produce generic dishes like tomato scrambled eggs. To integrate into real enterprise workflows, Agents need proprietary enterprise data and knowledge.

Fourth: Agentic platform.

Today, everyone increasingly understands that AI isn’t just about model capability — the Harness is equally critical.

Fifth: Agent applications.

Not every function needs to be built internally — many general-purpose capabilities can be directly purchased and used via vertical, specialized, or general-purpose Agent applications.

Next, I’ll elaborate on each of these five layers, sharing how we approach these issues in our product design philosophy.

Let’s start with the AI infrastructure layer. We still frequently discuss Token consumption and latency. What does that imply?

It’s that enterprises haven’t yet reached a state where they can use resources freely without cost concerns — much like early days when people calculated SMS costs per message, unlike today’s seamless WeChat experience.

Thus, the most fundamental ability to help enterprises reduce costs and improve efficiency should come from massive optimization at the compute layer.

AWS’s strength stems from over 20 years of serving cloud customers. We’ve worked with millions of clients and understand the types of workloads they run on the cloud.

Even for inference, we consider whether the Agent primarily focuses on planning, executing simple tasks, or handling workflows.

Based on these different workloads, there’s a consensus:

General-purpose chips cannot deliver optimal cost-efficiency across all scenarios.

AWS began developing custom chips over a decade ago, implementing virtualization technologies through hardware capabilities. Later, we launched Arm-based CPU instances — Graviton — now reaching its fifth generation. We’ve also developed dedicated AI chips — Trainium — now in its third generation.

In short, at the computing layer, we believe customers should enjoy optimal cost-efficiency tailored to specific use cases.

On the model layer, our philosophy has remained consistent for years:

Customers and enterprises need choice — not being locked into a single model.

Amazon Bedrock supports this by continually expanding its model lineup, including top Chinese models such as Zhipu AI’s GLM and MiniMax’s models, which we have actively brought onto the Bedrock platform.
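To make “choice without lock-in” concrete, here is a minimal Python sketch of a model-agnostic routing table. Application code states a quality requirement, and the router picks the cheapest registered model that meets it, so adopting a new model means updating the registry rather than rewriting the Agent. The model names, quality scores, and prices are hypothetical placeholders, not Bedrock model IDs or real pricing, and this is not how Bedrock itself selects models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str                 # hypothetical model identifier, not a real Bedrock model ID
    quality: int              # hypothetical capability score from your own evaluations
    usd_per_1k_tokens: float  # hypothetical blended price


class ModelRouter:
    """Chooses a model by requirement instead of hard-coding a single vendor."""

    def __init__(self) -> None:
        self._models: dict[str, ModelSpec] = {}

    def register(self, spec: ModelSpec) -> None:
        # Adding or replacing a model is a registry update, not an application rewrite.
        self._models[spec.name] = spec

    def unregister(self, name: str) -> None:
        self._models.pop(name, None)

    def choose(self, min_quality: int) -> ModelSpec:
        # Cheapest registered model that clears the task's quality bar.
        candidates = [m for m in self._models.values() if m.quality >= min_quality]
        if not candidates:
            raise LookupError(f"no registered model reaches quality {min_quality}")
        return min(candidates, key=lambda m: m.usd_per_1k_tokens)


router = ModelRouter()
router.register(ModelSpec("model-small", quality=60, usd_per_1k_tokens=0.2))
router.register(ModelSpec("model-mid", quality=80, usd_per_1k_tokens=0.8))
router.register(ModelSpec("model-large", quality=90, usd_per_1k_tokens=3.0))

print(router.choose(min_quality=50).name)  # model-small
print(router.choose(min_quality=75).name)  # model-mid
```

Callers state requirements rather than model names. A newly released model can therefore be evaluated, scored, and registered without any change to the Agent code that calls `choose`.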

Image 4

Of course, within the enterprise context, Bedrock offers more than platform capabilities — it provides robust data protection and privacy safeguards.

Built upon over two decades of cloud computing trust, enterprises can leverage cloud technologies like VPC to ensure their data isn’t intercepted or accessed by intermediate routing tools.

On the data and knowledge layer, we believe a crucial shift is underway: Traditional data platforms, data foundations, or data cores were historically designed to serve humans. Today, enterprise platform and tech teams must ask: Can the data platform serve AI Agents effectively?

Agents interact with data differently than humans do. Facing potentially billions of Agents, each task might trigger countless data calls.

Therefore, enterprises truly need an AI-ready data platform — presenting challenges vastly different from traditional data platforms.

Drawing from our co-creation experiences with clients, here are a few examples.

First: Shared, isolated, and concurrent memory management.

When thousands of agents from different users operate simultaneously within an enterprise, they need shared memory while avoiding cross-contamination.

How to manage shared memory, isolation, and short/long-term retention using existing enterprise authorization and permission systems becomes critically important.
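One way to picture memory that is shared but still isolated is to scope every record to a namespace and use the enterprise’s existing group memberships as the read filter. The Python sketch below is a minimal in-process illustration that assumes this design. The class, the scope naming, and the membership input are all hypothetical and do not represent the API of AgentCore Memory or any other product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryRecord:
    scope: str  # "user:<id>" for private memory, "group:<name>" for shared memory
    text: str


@dataclass
class ScopedMemoryStore:
    # user id -> groups the user already belongs to in the enterprise directory (assumed input)
    memberships: dict[str, set[str]]
    records: list[MemoryRecord] = field(default_factory=list)

    def _readable_scopes(self, user_id: str) -> set[str]:
        groups = self.memberships.get(user_id, set())
        return {f"user:{user_id}"} | {f"group:{g}" for g in groups}

    def write(self, user_id: str, text: str, group: Optional[str] = None) -> None:
        # Private by default; shared only into groups the user is actually a member of.
        if group is None:
            scope = f"user:{user_id}"
        elif group in self.memberships.get(user_id, set()):
            scope = f"group:{group}"
        else:
            raise PermissionError(f"{user_id} is not a member of {group}")
        self.records.append(MemoryRecord(scope, text))

    def read(self, user_id: str) -> list[str]:
        # An agent acting for user_id sees its private memory plus permitted shared memory,
        # never another user's private records.
        allowed = self._readable_scopes(user_id)
        return [r.text for r in self.records if r.scope in allowed]


store = ScopedMemoryStore(memberships={"alice": {"sales"}, "bob": {"sales"}, "carol": {"hr"}})
store.write("alice", "Prefers morning meetings")
store.write("alice", "Q3 pricing sheet v2 is final", group="sales")
store.write("carol", "Offer letter template updated", group="hr")
print(store.read("bob"))  # ['Q3 pricing sheet v2 is final']
```

Visibility is derived from memberships the enterprise already maintains. Removing someone from a group therefore also removes their agents’ view of that group’s shared memory.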

Second: Memory lifecycle management.

Many assume more memory equals better performance — but that’s not true. Like humans, if long-term memory contains incorrect knowledge, outdated information, or contradictory data, it will impair the Agent’s judgment. Hence, long-term memory management is essential.

Managing memory lifecycle poses new challenges to underlying data engines.
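The lifecycle point can be illustrated with two simple rules. First, drop memories whose retention window has passed. Second, when several memories describe the same subject, keep only the most recent one, so stale or contradictory facts stop reaching the model. This minimal Python sketch assumes that each memory carries a subject key, a timestamp, and an optional time-to-live. Real systems may resolve conflicts very differently, for example with model-based consolidation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Memory:
    key: str                     # subject the memory is about (assumed to be extracted upstream)
    value: str
    created_at: float            # seconds since epoch
    ttl: Optional[float] = None  # retention window in seconds; None = keep until superseded


def consolidate(memories: list[Memory], now: float) -> list[Memory]:
    """Expire old memories, then keep only the newest memory per key."""
    live = [m for m in memories if m.ttl is None or m.created_at + m.ttl > now]
    latest: dict[str, Memory] = {}
    for m in live:
        current = latest.get(m.key)
        if current is None or m.created_at > current.created_at:
            latest[m.key] = m
    # Return the surviving memories in chronological order.
    return sorted(latest.values(), key=lambda m: m.created_at)


DAY = 86_400.0
memories = [
    Memory("office", "Works from the Shanghai office", created_at=0.0),
    Memory("office", "Moved to the Shenzhen office", created_at=10 * DAY),
    Memory("task", "Prepare summit slides", created_at=5 * DAY, ttl=7 * DAY),
    Memory("language", "Prefers answers in English", created_at=2 * DAY),
]
for m in consolidate(memories, now=20 * DAY):
    print(m.key, "->", m.value)
# language -> Prefers answers in English
# office -> Moved to the Shenzhen office
```

At day 20, the short-lived task note has expired. The outdated office location has been superseded, so only two facts remain to be fed to the model.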

Third: Token usage efficiency.

Everyone worries about high Token consumption and expensive Tokens — but the real culprit behind high costs is often overlooked: It’s not just the high unit price of Tokens, but the fact that you feed the model excessive irrelevant information during invocation.

For example, dumping thousands of skills into the model and letting it choose blindly; or failing to optimize the information fed into the model during memory extraction. All these lead to explosive Token usage.
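The skills example can be made concrete by comparing two prompts: one that lists every available skill, and one that includes only the few skills most relevant to the request. This Python sketch makes two simplifying assumptions. It scores relevance by naive word overlap, and it estimates size at roughly four characters per token. Real systems typically use embeddings for retrieval and the model’s own tokenizer for counting. The skill catalog is hypothetical.

```python
import re


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rough_tokens(text: str) -> int:
    # Crude heuristic (assumption): roughly 4 characters per token for English text.
    return max(1, len(text) // 4)


def select_skills(request: str, skills: dict[str, str], k: int = 3) -> dict[str, str]:
    """Keep at most k skills whose descriptions share the most words with the request."""
    req = words(request)
    ranked = sorted(skills.items(), key=lambda item: len(req & words(item[1])), reverse=True)
    return {name: desc for name, desc in ranked[:k] if req & words(desc)}


def build_prompt(request: str, skills: dict[str, str]) -> str:
    catalog = "\n".join(f"- {name}: {desc}" for name, desc in skills.items())
    return f"Available skills:\n{catalog}\n\nRequest: {request}"


# Hypothetical catalog: 1,000 irrelevant skills plus three relevant ones.
skills = {f"internal_tool_{i}": f"Performs internal maintenance routine number {i}" for i in range(1000)}
skills["crm_lookup"] = "Look up a customer record in the CRM"
skills["schedule_meeting"] = "Schedule a meeting with a colleague on the calendar"
skills["send_email"] = "Draft and send an email to a colleague"

request = "Schedule a meeting with the colleague who owns this customer record"
selected = select_skills(request, skills)
full_prompt = build_prompt(request, skills)
lean_prompt = build_prompt(request, selected)

print(list(selected))  # ['schedule_meeting', 'crm_lookup', 'send_email']
print(rough_tokens(full_prompt), "vs", rough_tokens(lean_prompt))
```

Even with this crude scoring, the trimmed prompt is a small fraction of the size of the full catalog. That is the effect described above: the saving comes from sending less, not from a lower price per token.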

Beyond that, can we observe how the model is invoked throughout the entire chain? Can we monitor whether it hallucinates? Full-chain observability is a new capability that an AI-ready data platform must support.
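Full-chain observability starts with recording every model or tool call in an Agent run under a shared trace ID. Token usage and failures can then be attributed to the step that caused them. The Python sketch below is a minimal in-memory illustration. The span fields, the step names, and the token counts are assumptions. A production system would more likely export such spans through a tracing standard such as OpenTelemetry than keep them in a list.

```python
import time
import uuid
from collections import defaultdict
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class Span:
    trace_id: str       # shared by every step of one Agent run
    step: str           # e.g. "plan", "memory_recall" (illustrative names)
    input_tokens: int = 0
    output_tokens: int = 0
    duration_s: float = 0.0
    error: Optional[str] = None


class Tracer:
    def __init__(self) -> None:
        self.spans: list[Span] = []

    @contextmanager
    def span(self, trace_id: str, step: str) -> Iterator[Span]:
        s = Span(trace_id, step)
        start = time.perf_counter()
        try:
            yield s  # the caller records token counts reported by the model call
        except Exception as exc:
            s.error = repr(exc)  # keep the failure attached to the step that caused it
            raise
        finally:
            s.duration_s = time.perf_counter() - start
            self.spans.append(s)

    def tokens_by_step(self, trace_id: str) -> dict[str, int]:
        totals: dict[str, int] = defaultdict(int)
        for s in self.spans:
            if s.trace_id == trace_id:
                totals[s.step] += s.input_tokens + s.output_tokens
        return dict(totals)


tracer = Tracer()
run_id = str(uuid.uuid4())
# Hypothetical token counts for one run.
with tracer.span(run_id, "plan") as s:
    s.input_tokens, s.output_tokens = 1_200, 150
with tracer.span(run_id, "memory_recall") as s:
    s.input_tokens, s.output_tokens = 9_000, 40
with tracer.span(run_id, "answer") as s:
    s.input_tokens, s.output_tokens = 2_500, 400

print(tracer.tokens_by_step(run_id))
# {'plan': 1350, 'memory_recall': 9040, 'answer': 2900}
```

In these made-up numbers, the per-step breakdown immediately points at memory recall as the dominant cost. This is the kind of unoptimized memory extraction mentioned above.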

If we summarize the principles guiding our data product development, they fall into three foundational pillars.

First: Designed for AI-readiness, while upholding trustworthiness.

Beyond static encryption across the cloud stack, encryption in transit, and other security guarantees, true data trustworthiness requires a clear understanding of the business meaning the data represents. That means knowing how it arises (whether through the Agent’s self-evolution or from business requirements) and how it influences decision-making across the entire pipeline.

For demands around data that is interpretable, manageable, and governable, we aim to address them through capabilities such as SageMaker Catalog.

Second, the underlying data engine should not hinder upper-level Agent applications; thus, a robust data infrastructure is essential. In other words, capabilities like trustworthiness, performance, cost-effectiveness, and durability must all remain at a high level.

Over the years, Amazon Web Services has continuously optimized its engine layer based on rich data experience.

As an example, today, virtually all data engines support vector capabilities.

To accommodate large-scale Agent expansion, we also launched S3 Vectors, which brings the 11 nines of durability of object storage natively to large-scale vector storage and retrieval.
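For a sense of scale, “11 nines” refers to a designed annual durability of 99.999999999%. The arithmetic below applies that figure to a hypothetical store of ten million objects or vectors. The result is an expected value under the design target, not a guarantee.

```latex
\begin{aligned}
d &= 1 - 10^{-11} = 99.999999999\% \\
\mathbb{E}[\text{objects lost per year}] &= N\,(1 - d) = 10^{7} \times 10^{-11} = 10^{-4} \\
\text{expected interval between single-object losses} &= \frac{1}{10^{-4}\ \text{per year}} = 10^{4}\ \text{years}
\end{aligned}
```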

Third, we adhere to an open data architecture, avoiding any vendor or technology lock-in.

For data lakes, multimodal data lakes, and data governance, we build on open table formats such as Apache Iceberg, for instance through S3 Tables, so that different data engines can access the same data.

Recently, with the widespread adoption of Agents, everyone knows that managing memory or other files is often done via file systems or Markdown files.

We’ve also adopted an open approach, allowing object storage to directly support corresponding file semantics so that Agents can invoke them directly.

These reflect our product design philosophy, which will continue to evolve with more customer-centric data capabilities in the future.

From Enterprise-Level Harness Platforms to Intelligent Work Applications

The next layer is what I’d like to focus on most.

Beyond models, enterprises building Agents require a complete set of production-ready capabilities.

Two years ago, I shared similar insights on this stage; our core message has remained unchanged:

AI is truly not just about large models.

By 2026, in the context of Agents, people using tools like Lobster, or experiencing AI software-engineering capabilities firsthand, have become increasingly aware of something: once you remove the model, everything else related to production, control, and governance can collectively be called the Harness, that is, how the model is effectively managed and steered.

To illustrate: if you think of the model as the CPU, no one would hand a user just a motherboard with a CPU soldered onto it. You also need software, an operating system, and various usable functions. The Harness integrates these usable, operable, and controllable capabilities, ultimately presenting the Agent as a complete application.

In terms of AWS products, Harness corresponds to Amazon Bedrock AgentCore.

Image 5

Its core value lies in enabling users to focus less on the intricacies of Harness and more on their business value. It remains open, allowing integration with open-source frameworks such as LangChain, CrewAI, etc.

It also manages enterprise-level concerns including large-scale security, stability, auto-restart, and reliability.

If we quickly categorize Bedrock AgentCore’s nine functional modules into three groups:

First, getting the Agent running.

Runtime provides automatic horizontal scaling so Agents can scale rapidly to any size; Memory manages context and recall; Code Interpreter and Browser give Agents code execution and browser interaction capabilities.

Second, integrating enterprise data and systems seamlessly.

Identity and Gateway allow enterprises to integrate existing systems like CRM or ERP while preserving real user permissions when Agents perform tasks—they represent specific employees, not entities with unlimited administrative privileges.
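The principle that an Agent acts as a specific employee, rather than as an all-powerful admin, can be sketched as a check on every tool call. A call is allowed only when the human the Agent acts for already holds every permission the tool requires. Everything below is hypothetical and only illustrates the idea, including the permission strings, the tool table, and the helper names. It is not how AgentCore Identity or Gateway is implemented.

```python
from dataclasses import dataclass

# Hypothetical permissions each tool requires, e.g. mapped from existing CRM/ERP roles.
TOOL_PERMISSIONS: dict[str, set[str]] = {
    "crm.read_account": {"crm:read"},
    "crm.update_discount": {"crm:read", "crm:write_pricing"},
    "erp.create_invoice": {"erp:write_invoice"},
}


@dataclass(frozen=True)
class OnBehalfOf:
    user_id: str
    permissions: frozenset[str]  # what the human user already holds in existing systems


def authorize_tool_call(ctx: OnBehalfOf, tool: str) -> bool:
    required = TOOL_PERMISSIONS.get(tool)
    if required is None:
        return False  # unknown tools are denied by default
    # The Agent never exceeds the user's own rights: every required permission must be held.
    return required <= ctx.permissions


sales_rep = OnBehalfOf("alice", frozenset({"crm:read"}))
print(authorize_tool_call(sales_rep, "crm.read_account"))     # True
print(authorize_tool_call(sales_rep, "crm.update_discount"))  # False
print(authorize_tool_call(sales_rep, "hr.read_salaries"))     # False
```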

Third, ensuring Agents are truly controllable and manageable.

Through Policy, Evaluation, and Observability functions, enterprises can define boundaries during Agent execution, assess whether outcomes meet expectations, and establish observability over the entire process.
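Policy and Evaluation can be pictured as two checks around each action. The first is a policy gate before execution: is this action within the boundaries the enterprise defined? The second is an evaluation afterwards: did the outcome meet expectations? The rules below are invented for illustration, including the refund limit, the blocked action, and the required result fields. They are not AgentCore Policy syntax.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A rule inspects a proposed action and returns a denial reason, or None if it has no objection.
Rule = Callable[[str, dict], Optional[str]]


@dataclass(frozen=True)
class Decision:
    allowed: bool
    reason: str


def max_refund(limit: float) -> Rule:
    def rule(action: str, args: dict) -> Optional[str]:
        if action == "issue_refund" and args.get("amount", 0) > limit:
            return f"refund {args['amount']} exceeds limit {limit}"
        return None
    return rule


def blocked_actions(names: set[str]) -> Rule:
    def rule(action: str, args: dict) -> Optional[str]:
        return f"action {action} is not permitted" if action in names else None
    return rule


def check_policy(rules: list[Rule], action: str, args: dict) -> Decision:
    # Policy gate, applied before the Agent executes anything: the first denial wins.
    for rule in rules:
        reason = rule(action, args)
        if reason is not None:
            return Decision(False, reason)
    return Decision(True, "within policy")


def evaluate_outcome(result: dict, required_fields: set[str]) -> bool:
    # Minimal post-hoc evaluation: did the Agent return every field the task expects?
    return required_fields <= set(result)


rules = [max_refund(500.0), blocked_actions({"delete_customer"})]
print(check_policy(rules, "issue_refund", {"amount": 120.0}).allowed)  # True
print(check_policy(rules, "issue_refund", {"amount": 2000.0}).reason)  # refund 2000.0 exceeds limit 500.0
print(check_policy(rules, "delete_customer", {"id": "C-42"}).allowed)  # False
print(evaluate_outcome({"ticket_id": "T-1", "resolution": "refunded"}, {"ticket_id", "resolution"}))  # True
```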

Recently, we officially partnered with OpenAI to elevate enterprise-ready Agent-building capabilities further by introducing Managed Agent, powered by OpenAI.

Image 6

You can think of it this way: If you already view ChatGPT as more than just a chatbot—a capable Agent that helps execute tasks—then Managed Agent represents OpenAI’s cutting-edge models combined with their best practices in Agent development (via Harness), integrated with AWS’s underlying secure infrastructure, packaged into a unified offering.

Enterprises preferring to leverage OpenAI’s capabilities for task execution can adopt this solution directly.

In contrast, Bedrock AgentCore offers a more open and flexible framework and model selection for Agents.

Yet, one thing remains constant: both types of capabilities reside on the same platform, allowing enterprises to choose either or both, while inheriting AWS’s enterprise-grade security and trust controls.

Here’s a quick customer case: After adopting AgentCore, Zixun solved its core challenge—no longer needing to over-plan computing resources, enabling faster iteration at lower costs, without investing significant effort into optimizing underlying Harness-related production engineering issues.

This allows companies to realize business value more quickly, optimize costs better, and focus their energy on their own business needs.

Deployment and Future Outlook of Agent Applications

The final layer involves questions enterprises face when directly using Agent applications:

Who uses them? How are they used?

Coding Agents are generally considered mature by now.

But in workplace scenarios, Working Agents represent another direction we believe will soon explode in adoption, as we’ve already witnessed the potential of such capabilities.

There’s a paradox here: employees want Agents to handle everything for them, understand their daily routines and information; meanwhile, enterprises desire safe, controlled usage—with clear boundaries preventing Agents from doing anything unchecked.

These two goals can actually coexist.

At AWS, our answer lies in deeply personalized products like Quick.

Simply put, the most frequently heard phrase among colleagues recently is: “It really feels like a daily assistant.”

Let me share a few personal experiences.

First, I constantly switch between CRM, chat tools, email, and various platforms each day—often spending 20 minutes to an hour clearing pending tasks in the morning.

Quick’s proactive reminder feature consolidates these connections. It doesn’t merely notify me of pending items but actively suggests actions—for example, scheduling a meeting with a specific colleague or assigning a task to another.

Second, when executing tasks, Quick breaks down traditional workflow boundaries.

For instance, this PPT contains numerous external data points requiring verification. Previously, I might have spent considerable time consulting colleagues; now, Quick enables swift completion.

Third, as I use Quick, it continuously learns and adapts to my patterns.

Each user builds a personal knowledge graph, and with increased usage, Quick’s decisions become increasingly aligned with mine. These are several key aspects of Quick that I find particularly valuable and worth experiencing.

Finally, let me summarize briefly: Through these five layers of capability, we aim to highlight what enterprises must consider when transitioning Agents from demos to production environments.

We can also revisit our joint announcement with OpenAI.

Besides the previously mentioned Managed Agent, it includes OpenAI’s latest models available on Bedrock, as well as its Coding Agent Codex now accessible via Bedrock.

In other words, the collaboration with OpenAI brings new enhancements to three of the five layers in our architecture.

Moving forward, we will continue iterating on these five capabilities to accelerate enterprise empowerment.

In short, our goal is to deliver better models, backed by trustworthy data, to bring true production-grade platforms to users.

As Matt Garman said: Every application will be reimagined.

We’ve already seen leading enterprises paving this path, and through continuous iteration of our five-layer architecture, we hope to accelerate this transformation together with you, embracing the Agent era. Thank you.

*All rights reserved. Unauthorized reproduction or use in any form is strictly prohibited. Offenders will be prosecuted.*
