Dependable responses with Agentic RAG on the Gemini Enterprise Agent Platform

Google Research Blog, June 5, 2026

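TL;DR · AI summary

Agentic RAG introduces a multi-agent architecture (Orchestrator, Planner Agent, Query Rewriter, and Search Fanout Agent) to address the "data islands" problem, where traditional RAG cannot handle complex multi-source, multi-hop queries. It improves accuracy on factuality datasets by up to 34%.

Key points

Agentic RAG uses multi-agent collaboration to break a complex request into specialized steps such as planning, query rewriting, and multi-source retrieval.
The framework's key differentiator is persistence: the system recognizes when information is missing and keeps searching iteratively until it has sufficient context.
On factuality datasets, this agentic RAG approach improves accuracy over standard RAG by up to 34%.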

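Outline

§ Limitations of traditional RAG
Single-step retrieval RAG cannot handle complex business queries that require multiple hops across several data sources.
§ Division of labor in the multi-agent architecture
The Orchestrator, Planner Agent, Query Rewriter, and Search Fanout Agent collaborate to complete complex tasks.
· Core role definitions
The Planner Agent maps out information pathways, the Query Rewriter turns vague requests into precise search queries, and the Search Fanout Agent collects data from multiple sources.
§ What sets this agentic RAG apart: persistence
Where traditional RAG returns a partial answer or a "not found" response when the first search falls short, this framework keeps iterating until the "sufficient context" condition is met.
§ A worked application scenario
A clinical query example shows how the system breaks one complex request into searches across three areas: Pharmacy, Nutrition, and Clinical Notes.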

Mind map

Gemini Agentic RAG
- Problems addressed
  - Data islands
  - Multi-hop queries
- Multi-agent roles
  - Orchestrator (task delegation)
  - Planner (pathway planning)
  - Query Rewriter (query refinement)
  - Search Fanout (multi-source retrieval)
- Key advantages
  - Persistence
  - Sufficient context
  - Accuracy gain of up to 34%

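Highlights

"Compared to standard RAG, our framework increases accuracy on factuality datasets by up to 34%."
(Paragraph 3)
"The key difference with our new agentic RAG framework is persistence ... it knows when it is missing information and continues searching until the context is complete."
(What makes our agentic RAG different from others)
"It helps to think of multi-agent RAG not as a single search engine but as an organized research department."
(How multi-agent architectures work)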

#Agentic RAG #Gemini #Multi-agent systems #Google Cloud #Large language models

Current single-step retrieval-augmented generation (RAG) systems weren’t designed for the multi-source, multi-hop queries of modern business workflows. If, for example, the query is, "What are the specs of the server used in Project X?", the system might find documents about Project X, but those documents might only mention a server ID. It won't know to take that ID and perform a second search in another database to find the specs. The result is a partial answer or a "not found" response because the information is spread across different "islands" of data, requiring deeper exploration to find the facts.
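
To make this failure mode concrete, here is a minimal Python sketch of the two-hop lookup described above. The two dictionaries, the server ID `SRV-042`, the spec string, and the function names are all hypothetical stand-ins for separate data stores. None of them come from the platform itself.

```python
import re

# Hypothetical "islands" of data: the project docs mention only a server ID,
# while the specs live in a separate inventory store.
PROJECT_DOCS = {"Project X": "Project X runs on server SRV-042 in the main rack."}
INVENTORY = {"SRV-042": "64 cores, 512 GB RAM, 8 TB NVMe"}


def single_step(project: str) -> str:
    """Single-step retrieval: returns the project doc, which lacks the specs."""
    return PROJECT_DOCS.get(project, "not found")


def two_hop(project: str) -> str:
    """Hop 1 finds the server ID; hop 2 uses that ID to query the inventory."""
    doc = PROJECT_DOCS.get(project, "")
    match = re.search(r"SRV-\d+", doc)
    if match is None:
        return "not found"
    return INVENTORY.get(match.group(0), "not found")


print(single_step("Project X"))  # mentions SRV-042 but contains no specs
print(two_hop("Project X"))      # 64 cores, 512 GB RAM, 8 TB NVMe
```

The second hop only becomes possible once something in the system decides to treat the ID from the first result as a new query.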

Enter “agentic RAG”, which plans, reasons, and iteratively interacts with data sources, enabling the handling of complex queries to increase dependability and accuracy.

Today, we’re excited to introduce Google’s Gemini Enterprise Agent Platform-hosted version of Cross-Corpus Retrieval powered by Agentic RAG. Like other multi-agent RAG frameworks, ours employs various agents that work together to reliably answer complex queries. Unlike other multi-agent frameworks, ours incorporates a sufficient-context check that confirms whether there is enough information for an accurate answer. Compared to standard RAG, our framework increases accuracy on factuality datasets by up to 34%. We also evaluated our system with proprietary, internal datasets and found that we achieve better grounding and improved reasoning accuracy on multiple domain-specific tasks.

How multi-agent architectures work: Planning, rewriting, and routing

It helps to think of multi-agent RAG not as a single search engine but as an organized research department. In a "monolithic" or “Vanilla” RAG system, the retrieval component just looks at your question and tries to find matching documents before an LLM generates a response.

In a multi-agent framework, the system breaks the job down into specialized roles:

_The Orchestrator_ evaluates your complex request and decides, "This isn't a one-step job", and delegates the work to agents.
_The Planner Agent_ maps out the information pathways. If you ask about a project’s budget and its timeline, for example, the Planner Agent decides: _"First, we need to check the finance database, then we need to check the project management logs."_
_The Query Rewriter_ translates your request into multiple search queries. It turns _"What's up with Project X?"_ into _"Status report for Project X Q3"_ and _"Key blockers for Project X team."_
_The Search Fanout Agent_ takes those refined queries and sends them to various retrieval sources to collect snippets of information.
Finally, an LLM aggregates all the context to deliver a final response.
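
The division of labor above can be sketched in a few lines of Python. This is an illustrative toy, not the platform's implementation: the routing table `TOPIC_TO_SOURCE`, the source names `finance_db` and `pm_logs`, and every function name are assumptions, and plain string handling stands in for what would be LLM calls in a real system.

```python
from typing import Callable

Source = Callable[[str], list[str]]  # query -> snippets

# Hypothetical routing table used by the planner.
TOPIC_TO_SOURCE = {"budget": "finance_db", "timeline": "pm_logs"}


def plan(request: str) -> list[str]:
    """Planner: list the topics in the request that map to a known source."""
    text = request.lower()
    return [topic for topic in TOPIC_TO_SOURCE if topic in text]


def rewrite(topics: list[str], subject: str) -> dict[str, str]:
    """Query Rewriter: one focused query per topic."""
    return {topic: f"{subject} {topic}" for topic in topics}


def fan_out(queries: dict[str, str], sources: dict[str, Source]) -> list[str]:
    """Search Fanout: send each query to the source the plan selected."""
    snippets: list[str] = []
    for topic, query in queries.items():
        snippets.extend(sources[TOPIC_TO_SOURCE[topic]](query))
    return snippets


def answer(request: str, subject: str, sources: dict[str, Source]) -> str:
    """Orchestrator: run the steps in order, then aggregate (LLM stand-in)."""
    topics = plan(request)
    snippets = fan_out(rewrite(topics, subject), sources)
    return " ".join(snippets) if snippets else "not found"


# Hypothetical sources that simply echo the query they received.
sources = {
    "finance_db": lambda q: [f"[finance] result for '{q}'"],
    "pm_logs": lambda q: [f"[pm] result for '{q}'"],
}
print(answer("What are the budget and timeline?", "Project X", sources))
```

Keeping each role as its own function mirrors the article's point: each step can be improved or swapped out without touching the others.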

What makes our agentic RAG different from others

The key difference with our new agentic RAG framework is _persistence_. Compared to other RAG solutions, our framework is effective because it knows when it is missing information and continues searching until the context is complete. This prevents the AI from "guessing" when the first search comes up empty, or from simply saying, “I don’t have enough information.” While this is an appropriate response in some cases, sometimes the information is there and we just need to find it.

For example, imagine a doctor asking about a patient’s medications, diet, and allergies:

_"What are the discharge medications and dietary restrictions for John Doe after his knee surgery, and did he have any allergic reactions during his stay? Do not include medications only administered during hospital inpatient or emergency department visits except for heparin IV drip or Tenecteplase."_

In response, our framework kicks off many specialized agents. We give an overview of our solution in the figure below and then describe it in more detail afterwards.

Phase 1: Orchestration

The Root Agent parses the doctor's request and delegates the tasks to sub-agents. The Planner Agent identifies that it needs to check three distinct areas: Pharmacy, Nutrition, and Clinical Notes. The Query Rewriter breaks the long request into simple, searchable questions so the retriever can more accurately find relevant content.

Phase 2: Search (standard step)

The RAG Agent searches the patient's records for all the query fanouts at once. It finds the medications and the diet information, but it can’t find any mention of allergies in the most obvious files. In a standard or “Vanilla” RAG system, the process might end here with an incomplete answer.

Phase 3: Sufficient Context Agent (new research innovation)

Think of the Sufficient Context Agent as a quality-control inspector standing at the end of an assembly line. It examines three specific findings before allowing a response to be generated:

#### 1. Retrieved snippets

The Sufficient Context Agent evaluates the actual text chunks pulled from the database by the RAG Agent. In the doctor's example, these could be the specific paragraphs found in the "Discharge Summary" and "Nutrition Notes." It reads these to see if the information needed to answer the query is present in those sentences.

#### 2. Intermediate draft

The system also creates a "rough draft" response. The Sufficient Context Agent then reviews the prompt, draft, and retrieved snippets to evaluate whether the model has everything it needs to provide a comprehensive and grounded answer. If the prompt asks for three things (meds, diet, allergies) but the snippets only contain information about two, the Sufficient Context Agent flags it as “insufficient context.”

#### 3. Missing pieces analysis

This is the most critical part. The Sufficient Context Agent identifies exactly what is not there. It doesn't just output that "this is insufficient"; it generates a specific "Reason" and "Feedback" log. For example:

Finding: "We have the medication list and the low-sodium diet instructions."

Gap: "We are missing information from the source documents about allergic reactions or adverse events during the stay."

The Sufficient Context Agent compares what was found against the original request and asks: _"Did we answer the allergy question?"_ If not, it then issues an "Insufficient Context" signal and provides specific feedback: _"You found meds and diet, but you missed allergies. Go back and search specifically for 'rashes' or 'adverse events'."_ In a multi-source situation, it can also request more information or decide that the source isn’t relevant to the query.
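
As a rough illustration of the finding/gap output described in this phase, the sketch below checks which requested topics are covered by the retrieved snippets. The `Verdict` class, the `TOPIC_KEYWORDS` table, and the sample snippets are hypothetical. Keyword matching is only a stand-in: the article describes the check as a judgement over the prompt, draft, and snippets, which a real system would make with a model rather than string search.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    sufficient: bool
    finding: list[str]  # requested topics covered by the snippets
    gap: list[str]      # requested topics with no supporting snippet


# Hypothetical keyword lists standing in for an LLM judgement.
TOPIC_KEYWORDS = {
    "medications": ["medication"],
    "diet": ["diet"],
    "allergies": ["allerg", "rash", "adverse event"],
}


def check_context(topics: list[str], snippets: list[str]) -> Verdict:
    """Compare what was requested against what the snippets actually cover."""
    text = " ".join(snippets).lower()
    finding = [t for t in topics if any(k in text for k in TOPIC_KEYWORDS[t])]
    gap = [t for t in topics if t not in finding]
    return Verdict(sufficient=not gap, finding=finding, gap=gap)


snippets = [
    "Discharge Summary: medication list attached.",
    "Nutrition Notes: low-sodium diet instructions.",
]
verdict = check_context(["medications", "diet", "allergies"], snippets)
print(verdict.sufficient, verdict.gap)  # False ['allergies']
```

Returning the gap as data, rather than a bare pass/fail flag, is what lets the next step issue a targeted follow-up search.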

Phase 4: Iteration

Because of the Sufficient Context Agent feedback, the Query Rewriter creates a new search for "rashes." Then, the RAG Agent dives deeper into files it ignored the first time and finds the missing information.

Phase 5: Synthesis (final answer)

The Sufficient Context Agent checks the data one last time. Now that it has the meds, diet, and allergy info, it determines we can stop searching. Finally, the Synthesis Agent writes a clean, accurate summary for the doctor.
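
Phases 2 through 5 amount to a retrieve, check, rewrite loop. The sketch below is one way to express that loop, under stated assumptions: the `RECORDS` contents, the `FOLLOW_UP_TERMS` table, and all function names are invented for illustration, and the `max_rounds` cap is our own addition, since the article does not say how the real system bounds its iterations.

```python
# Hypothetical patient records keyed by file name.
RECORDS = {
    "discharge_summary": "Discharge medications: see attached list.",
    "nutrition_notes": "Low-sodium diet instructions.",
    "nursing_log": "Patient developed a rash after the first dose.",
}

# Hypothetical rewrites the Query Rewriter might produce for a missing topic.
FOLLOW_UP_TERMS = {"allergies": ["rash", "adverse event"]}


def search(terms: list[str]) -> list[str]:
    """RAG Agent stand-in: return every record containing any search term."""
    return [text for text in RECORDS.values()
            if any(term in text.lower() for term in terms)]


def missing_topics(required: dict[str, list[str]], snippets: list[str]) -> list[str]:
    """Sufficient Context stand-in: topics with no matching snippet."""
    text = " ".join(snippets).lower()
    return [topic for topic, terms in required.items()
            if not any(term in text for term in terms)]


def retrieve_until_sufficient(
    required: dict[str, list[str]], max_rounds: int = 3
) -> tuple[list[str], list[str]]:
    """Search, check for gaps, rewrite, and repeat. Returns (snippets, gaps)."""
    terms = [topic_terms[0] for topic_terms in required.values()]
    snippets: list[str] = []
    gaps = list(required)
    for _ in range(max_rounds):
        snippets += [s for s in search(terms) if s not in snippets]
        gaps = missing_topics(required, snippets)
        if not gaps:
            break
        # Feedback step: search again using terms for the missing topics only.
        terms = [t for gap in gaps for t in FOLLOW_UP_TERMS.get(gap, [])]
    return snippets, gaps


required = {
    "medications": ["medication"],
    "diet": ["diet"],
    "allergies": ["allerg", "rash", "adverse event"],
}
snippets, gaps = retrieve_until_sufficient(required)
print(len(snippets), gaps)  # 3 []
```

With `max_rounds=1` the loop stops after the first pass and reports `allergies` as an open gap, which corresponds to the incomplete answer a single-pass system would give.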

Experiments and results

We evaluated agentic RAG on FramesQA, which is based on the FRAMES paper. An example multi-hop question is:

_“Of the top two most watched television season finales (as of June 2024), which finale ran the longest in length and by how much?”_

The RAG system needs to perform multiple steps to arrive at the correct answer. First, it has to identify that the two most watched finales are from the shows M*A*S*H and Cheers. Then, it has to find their running times, and calculate the length difference. In many RAG settings (Vanilla RAG or agentic RAG without sufficient context), we could end up in a situation where the model says something like:

_“Despite multiple scans, I found no explicit runtimes for M*A*S*H or Cheers. The documents provide viewership data, but not the duration in minutes or hours.”_

This does not answer the question.

Fortunately, our agentic RAG can solve this by first searching for the TV shows, then using the Query Rewriter and Sufficient Context Agent to run a targeted search for the runtimes of M*A*S*H and Cheers. Gemini can then easily determine which finale ran longer and by how much:

_“The M*A*S*H finale ran for 150 minutes, making it the longest of the top two. It was 52 minutes longer than the Cheers finale, which ran for approximately 98 minutes.”_

We ran an experiment to test this ability at scale (FramesQA has 824 queries along with a corpus containing 2,676 PDF documents). In the “Vanilla” RAG setting, we use Google’s RAG Engine (which has an advanced retrieval engine, LLM parser, and re-ranker). We compared this with our agentic RAG in two settings. In the single-corpus setting, we retrieve from the FramesQA documents. In the cross-corpus setting, we also include three other distracting datasets, where the Planner Agent must determine where to retrieve from. This cross-corpus setting mimics use cases where companies have databases managed by separate teams. We compute accuracy by using an LLM-as-a-judge to compare the system responses to the ground truth answers in the dataset.
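
The accuracy metric itself is simple once a judge is available. The sketch below shows the computation with a placeholder judge. The `Judge` type, `normalized_match`, and the two toy responses are assumptions for illustration. An actual LLM-as-a-judge would replace the string check with a model call, and nothing here reflects the prompts or models used in the experiment.

```python
from typing import Callable

Judge = Callable[[str, str], bool]  # (system_response, ground_truth) -> correct?


def normalized_match(response: str, truth: str) -> bool:
    """Placeholder judge: case-insensitive containment of the ground truth.

    An LLM-as-a-judge would instead ask a model whether the response
    agrees with the reference answer.
    """
    return truth.strip().lower() in response.strip().lower()


def accuracy(responses: list[str], truths: list[str], judge: Judge) -> float:
    """Fraction of responses the judge marks as matching the ground truth."""
    if len(responses) != len(truths):
        raise ValueError("responses and truths must be the same length")
    if not responses:
        return 0.0
    correct = sum(judge(r, t) for r, t in zip(responses, truths))
    return correct / len(responses)


# Toy data only: one answered response and one "not found" response.
responses = ["The M*A*S*H finale was 52 minutes longer.", "I found no runtimes."]
truths = ["52 minutes", "52 minutes"]
print(accuracy(responses, truths, normalized_match))  # 0.5
```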

In the cross-corpus setting, our system nearly matches its single-corpus accuracy. Even when the Planner Agent must select the correct corpus from four candidates, we successfully route the search queries and answer 90.1% of questions correctly. Also, the latency of both single- and cross-corpus versions is about the same (within 3% on average). This demonstrates that our agentic RAG system can reason over multiple, unrelated data sources, which opens up possibilities for more flexible retrieval scenarios.
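
To give a feel for the routing problem, the sketch below picks one corpus out of four by word overlap between the query and a short corpus description. The corpus names, their descriptions, and the overlap scoring are all hypothetical. The article does not describe how the Planner Agent chooses a corpus, and a real planner would presumably reason with an LLM rather than count shared words.

```python
# Hypothetical corpus descriptions; three of the four act as distractors.
CORPORA = {
    "frames_docs": "television film history sports geography encyclopedia articles",
    "legal": "contracts clauses liability compliance",
    "medical": "patients medications diagnoses clinical notes",
    "finance": "budgets invoices quarterly revenue forecasts",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and strip basic punctuation from each word."""
    return {word.strip(".,?()").lower() for word in text.split()}


def route(query: str, corpora: dict[str, str]) -> str:
    """Return the corpus whose description shares the most words with the query.

    Ties go to the corpus listed first, because max() keeps the first maximum.
    """
    query_words = tokenize(query)
    scores = {name: len(query_words & tokenize(desc))
              for name, desc in corpora.items()}
    return max(scores, key=scores.get)


print(route("Which television season finale ran the longest?", CORPORA))  # frames_docs
```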

Conclusion

By combining advanced query planning, routing, and sufficient context, our agentic RAG system ensures that AI-generated responses are auditable, traceable, and grounded. We look forward to seeing how the machine learning community leverages these new agentic capabilities to build the next generation of dependable AI systems. This new feature is now available as a public preview offering in Gemini Enterprise Agent Platform.

Acknowledgments

_This project is joint work with Bo Li, Zhongjie Mao, Tiger Jin, Yuhong Kan, Mohd Abdullah (Obito), Chun-Sung Ferng, Pooneh Mortazavi, Roger (Peng) Yu, Eran Lewis, and Ivan Kuznetsov. We thank Kimberly Schwede for designing the graphics and Mark Simborg for writing assistance. We also thank our key enterprise partners for critical user feedback, data, and insights._