AWS Machine Learning Blog

Build Highly Scalable Serverless LangGraph Multi-Agent Systems in AWS with Amazon Bedrock AgentCore



Source URL: https://aws.amazon.com/blogs/machine-learning/build-highly-scalable-serverless-langgraph-multi-agent-systems-in-aws-with-amazon-bedrock-agentcore/

Published: 2026-05-26T09:41:26-08:00

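Generative AI has rapidly evolved from experimental prototypes into systems that run reliably in production, deploy at scale, and perform consistently under real-world performance constraints. As organizations move beyond demos and proofs of concept, they increasingly run into problems with inference latency, scalability, state management, and operational visibility. Building a high-performance AI agent today requires more than a powerful model: it also requires an implementation that delivers consistent performance, preserves context across interactions, and provides deep insight into how the agent reasons and behaves in production.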

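In this post, we present a solution for building highly scalable serverless multi-agent generative AI systems on AWS, using LangGraph agents as the orchestrator, integrated with Amazon Bedrock AgentCore Memory and Amazon Bedrock AgentCore Observability.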

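Our approach to highly scalable serverless multi-agent orchestration combines serverless technologies such as AWS Lambda and AWS Step Functions. Developers can use these services to build LangGraph agents that scale automatically, respond to events in real time, and need no infrastructure management, which makes them well suited to dynamic, bursty agent workloads. Combined, these services let you orchestrate complex multi-tool agent workflows with durable state management, retries, and fine-grained cost control.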

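LangGraph's explicit graph execution model enables deterministic coordination, parallelism, and conditional routing between agents, which makes complex multi-agent workflows easier to understand and debug. By separating orchestration logic from agent behavior, you can use LangGraph to add, remove, or evolve specialized agents independently while keeping execution paths clear and auditable. This is especially valuable for production systems that need predictable behavior, scalability, and structured control over multi-agent reasoning.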
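To make the idea of an explicit execution graph concrete, here is a minimal sketch that uses only the Python standard library. It does not use LangGraph's API: the `run_graph` helper, the node names, and the state keys are all hypothetical. It only illustrates how named nodes plus a conditional edge give a deterministic, auditable path.

```python
from typing import Any, Callable, Dict, List, Optional

State = Dict[str, Any]
Node = Callable[[State], State]
Router = Callable[[State], Optional[str]]  # next node name, or None to stop


def run_graph(nodes: Dict[str, Node], routers: Dict[str, Router],
              entry: str, state: State, max_steps: int = 20) -> State:
    """Walk the graph from `entry`, recording the path taken in state['path']."""
    current: Optional[str] = entry
    path: List[str] = []
    for _ in range(max_steps):
        if current is None:
            break
        path.append(current)
        state = nodes[current](state)
        current = routers[current](state)
    state["path"] = path
    return state


# Hypothetical nodes: each returns an updated copy of the state.
def review(state: State) -> State:
    return {**state, "score": 0.4 if "guaranteed" in state["text"] else 0.9}


def escalate(state: State) -> State:
    return {**state, "needs_human": True}


def summarize(state: State) -> State:
    return {**state, "summary": f"score={state['score']}"}


NODES = {"review": review, "escalate": escalate, "summarize": summarize}
ROUTERS = {
    # Conditional edge: low scores are routed through `escalate` first.
    "review": lambda s: "escalate" if s["score"] < 0.5 else "summarize",
    "escalate": lambda s: "summarize",
    "summarize": lambda s: None,
}

result = run_graph(NODES, ROUTERS, "review", {"text": "guaranteed results!"})
print(result["path"])
```

Because routing is a pure function of the state, the same input always follows the same path, and the recorded `path` can be audited after the run.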

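AgentCore Observability extends these capabilities with detailed visibility into every invocation across the distributed serverless components, capturing model inputs and outputs, latency, and toolchain metrics. The integrated memory service, AgentCore Memory, lets agents retain short-term conversational context and long-term knowledge across sessions.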

Solution overview

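Our solution, a serverless multi-agent orchestration system built on LangGraph and AgentCore, is a generative AI-powered multi-agent campaign review system. It uses diverse personas to coordinate human review, so that marketing campaigns resonate authentically with their target audiences while staying within legal compliance and brand standards. It consists of three specialized AI agents that analyze marketing campaigns in parallel: the persona review agent reviews content from different sociodemographic perspectives and provides resonance scores, the validation agent checks legal compliance and adherence to brand guidelines, and the final agent synthesizes the feedback into actionable recommendations. Users upload campaign documents through a React frontend, which also polls for results and displays the review when it is available.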
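The poll-until-ready behavior of the frontend can be sketched as follows. The real client is a React application; this loop is written in Python only for illustration, and the `poll` helper, its parameters, and the shape of the result are assumptions rather than the solution's actual code.

```python
import time
from typing import Callable, Optional


def poll(fetch: Callable[[], Optional[dict]], interval: float = 2.0,
         timeout: float = 60.0,
         sleep: Callable[[float], None] = time.sleep) -> dict:
    """Call `fetch` until it returns a result or `timeout` seconds of waiting elapse."""
    waited = 0.0
    while True:
        result = fetch()
        if result is not None:
            return result
        if waited >= timeout:
            raise TimeoutError("review not ready")
        sleep(interval)
        waited += interval
```

Passing `sleep` as a parameter keeps the loop testable: a test can supply a no-op function instead of actually waiting.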

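We use LangGraph to implement the orchestrator and the specialized agents by modeling the system as a stateful execution graph. Each node represents a discrete agent function (persona review, compliance validation, or feedback synthesis), and the edges define the control flow between these steps. The orchestrator is implemented as a supervisor graph that routes execution, triggers parallel branches for the specialized agents, and collects their outputs for final aggregation. The LangGraph orchestrator and the specialized agents are packaged together in a single Docker container.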
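The fan-out and fan-in performed by the supervisor can be sketched with the standard library alone. This is not the solution's LangGraph code. The agent functions below are placeholders for model calls, and the arrangement (persona review and validation in parallel, then the final agent aggregating) is one reading of the description above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict

Agent = Callable[[str], Dict[str, Any]]


def persona_review(campaign: str) -> Dict[str, Any]:
    # Placeholder logic standing in for a model call.
    return {"resonance": min(10, len(campaign.split()))}


def validation(campaign: str) -> Dict[str, Any]:
    # Placeholder compliance rule standing in for a model call.
    return {"compliant": "guaranteed" not in campaign.lower()}


def final_agent(outputs: Dict[str, Dict[str, Any]]) -> str:
    verdict = "approve" if outputs["validation"]["compliant"] else "revise"
    return f"{verdict} (resonance {outputs['persona']['resonance']}/10)"


def orchestrate(campaign: str, branches: Dict[str, Agent]) -> str:
    """Run every branch concurrently, then aggregate once all have finished."""
    with ThreadPoolExecutor(max_workers=len(branches)) as pool:
        futures = {name: pool.submit(agent, campaign)
                   for name, agent in branches.items()}
        outputs = {name: fut.result() for name, fut in futures.items()}
    return final_agent(outputs)


BRANCHES = {"persona": persona_review, "validation": validation}
print(orchestrate("Fresh coffee delivered weekly", BRANCHES))
```

Because the branches are looked up by name, a specialized agent can be added or removed by editing `BRANCHES` without changing `orchestrate`.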

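We use AWS Lambda as the serverless hosting runtime on AWS for our LangGraph agents, so that they scale automatically, respond to events in real time, and need no infrastructure management. Our orchestrator agent exposes its functionality through a REST interface provided by Amazon API Gateway.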
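A Lambda function behind an API Gateway proxy integration receives the request as an event dictionary and returns a dictionary with `statusCode`, `headers`, and a string `body`. The sketch below shows that shape only. The `campaign` field name and the 202 response are assumptions, and the call into the orchestrator graph is left as a comment.

```python
import json
from typing import Any, Dict


def _response(status: int, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build a proxy-integration response; the body must be a string."""
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Handle an API Gateway proxy event carrying a campaign document."""
    try:
        body = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return _response(400, {"error": "body must be valid JSON"})
    campaign = body.get("campaign") if isinstance(body, dict) else None
    if not isinstance(campaign, str) or not campaign.strip():
        return _response(400, {"error": "'campaign' is required"})
    # A real handler would invoke the orchestrator graph here.
    return _response(202, {"status": "accepted", "length": len(campaign)})
```

Validating the request before any model call keeps malformed uploads from consuming inference time.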

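Our agent implementation uses AgentCore Observability to provide detailed visualizations of each step in the agent workflow, so developers can inspect execution paths, audit intermediate outputs, and debug performance bottlenecks. AgentCore Observability also gives us real-time visibility in Amazon CloudWatch, with dashboards for operational performance and telemetry for key metrics such as traces, session counts, latency, duration, token usage, and error rates.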
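The kind of per-step telemetry listed above (duration, token usage, error rate) can be illustrated with a small standard-library recorder. This is not the AgentCore Observability API: the `span` context manager, the `SPANS` list, and the token count are hypothetical stand-ins for what a managed tracing service would collect.

```python
import time
from contextlib import contextmanager
from typing import Any, Dict, Iterator, List

SPANS: List[Dict[str, Any]] = []


@contextmanager
def span(name: str) -> Iterator[Dict[str, Any]]:
    """Record duration and error status for one workflow step."""
    record: Dict[str, Any] = {"name": name, "error": None}
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record["error"] = type(exc).__name__
        raise
    finally:
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        SPANS.append(record)


def error_rate(spans: List[Dict[str, Any]]) -> float:
    """Fraction of spans that ended with an exception."""
    return sum(1 for s in spans if s["error"]) / len(spans) if spans else 0.0


with span("persona_review") as rec:
    rec["tokens"] = 128  # hypothetical token count reported by a model call

try:
    with span("validation"):
        raise ValueError("rule set missing")
except ValueError:
    pass

print([(s["name"], s["error"]) for s in SPANS], error_rate(SPANS))
```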

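We use AgentCore Memory in our agent implementation to address two key use cases: multi-agent shared memory, which provides context and shared memory across independent agent runs, and support for multi-turn conversations. You can extend this implementation to provide a natural-language interface for an AI assistant, because it uses AgentCore Memory's built-in support for storing conversation state and history. The following architecture diagram shows the components of our solution.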

Image 1: architecture diagram describing the multi agent langgraph and agentcore deployment
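The two memory use cases described above (memory shared between agents, and multi-turn conversation history) can be pictured with a toy in-process store. This is only a conceptual stand-in: it does not call AgentCore Memory, and the `SessionMemory` class and its method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class SessionMemory:
    """Toy store: per-session conversation turns plus facts shared by all agents."""

    def __init__(self) -> None:
        self._turns: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
        self._shared: Dict[str, str] = {}

    def add_turn(self, session_id: str, role: str, text: str) -> None:
        self._turns[session_id].append((role, text))

    def recent_turns(self, session_id: str, k: int = 5) -> List[Tuple[str, str]]:
        # Guard k <= 0 explicitly, because list[-0:] would return everything.
        return self._turns[session_id][-k:] if k > 0 else []

    def share(self, key: str, value: str) -> None:
        self._shared[key] = value

    def shared(self, key: str, default: str = "") -> str:
        return self._shared.get(key, default)


memory = SessionMemory()
memory.add_turn("s1", "user", "Review the spring campaign")
memory.share("brand_tone", "friendly")  # written by one agent
memory.add_turn("s1", "assistant", f"Using tone: {memory.shared('brand_tone')}")
print(memory.recent_turns("s1", k=1))
```

A managed memory service adds what this sketch lacks: durability across Lambda invocations and long-term retention.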

Prerequisites

Complete the following prerequisites:

  1. Verify model access in Amazon Bedrock. In this solution, we use Anthropic's Claude 4.5 Sonnet on Amazon Bedrock.
  2. Install the AWS Command Line Interface (AWS CLI).
  3. Install AWS SAM CLI v1.100.0+
  4. Install Docker v20.x+
  5. Install Node.js v18.x+
  6. Install Python v3.11+
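A quick way to check the minimum versions listed above is to parse each tool's `--version` output and compare it as a tuple. The helper names and the executable names in `REQUIRED` (`sam`, `docker`, `node`, `python3`) are assumptions about a typical installation, so adjust them for your system.

```python
import re
import shutil
from typing import Dict, List, Optional, Tuple

# Minimum versions from the prerequisites; executable names are assumptions.
REQUIRED: Dict[str, Tuple[int, ...]] = {
    "sam": (1, 100, 0), "docker": (20,), "node": (18,), "python3": (3, 11),
}


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first dotted version number from CLI output."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    return tuple(int(p) for p in match.group(1).split(".")) if match else None


def meets(found: Optional[Tuple[int, ...]], minimum: Tuple[int, ...]) -> bool:
    """Compare versions numerically, padding a shorter found version with zeros."""
    if found is None:
        return False
    padded = found + (0,) * max(0, len(minimum) - len(found))
    return padded >= minimum


def missing_tools() -> List[str]:
    """Names from REQUIRED that are not on the PATH."""
    return [name for name in REQUIRED if shutil.which(name) is None]
```

For example, `meets(parse_version("Docker version 24.0.7, build afdd53b"), REQUIRED["docker"])` evaluates to `True`.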

Dependencies

Our LangGraph agent implementation has the following dependencies, which are packaged in the Dockerfile:

  1. langchain>=0.2.0
  2. langgraph==0.3.31
  3. langgraph-prebuilt~=0.1.8
  4. langgraph-sdk~=0.1.61
  5. langchain-aws>=0.2.18
  6. langchain_tavily
  7. requests
  8. bedrock-agentcore
  9. boto3
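The list mixes three kinds of version specifier: `>=` sets a floor, `==` pins an exact version, and `~=` is the "compatible release" operator, so `~=0.1.8` accepts 0.1.8 and later 0.1.x releases but not 0.2.0. The function below is a simplified illustration of that rule for plain numeric versions only. It is not pip's implementation and ignores pre-release and post-release tags.

```python
from typing import Tuple


def _parts(version: str) -> Tuple[int, ...]:
    return tuple(int(p) for p in version.split("."))


def compatible_release(installed: str, spec: str) -> bool:
    """Return True if `installed` satisfies `~=spec` for plain numeric versions."""
    have, want = _parts(installed), _parts(spec)
    if len(want) < 2:
        raise ValueError("~= needs at least two version segments")
    prefix = want[:-1]  # every segment except the last must match exactly
    return have >= want and have[:len(prefix)] == prefix
```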

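Deploy the solution

You can download the solution from our GitHub repository. To deploy and access the solution in your AWS environment, follow the step-by-step instructions in the repository's README file:

Step 1: Clone the repository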

bash
git clone <repository url>
cd aws-genai-campaign-review-langgraph

Step 2: Configure AWS credentials

_Configure the AWS CLI:_

bash
aws configure

_Verify your credentials:_

bash
aws sts get-caller-identity

Step 3: Set up the Amazon DynamoDB persona table

_Make the script executable:_

bash
chmod +x scripts/setup_persona_table.sh

_Run the setup script:_

bash
./scripts/setup_persona_table.sh

Step 4: Build the AWS SAM application

bash
sam build

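Step 5: Deploy the infrastructure

_Use guided deployment and follow the prompts to provide the stack name, agent name, and AWS Region; accept the defaults for the other settings._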

bash
sam deploy --guided

Step 6: Get the deployment outputs

_Get the API endpoint:_

bash
aws cloudformation describe-stacks --stack-name <your stack name> --query 'Stacks[0].Outputs' --output table

_Save these values:_

  • ApiEndpoint – API URL
  • CampaignOrchestratorApi – agent API URL
  • CloudFrontURL – frontend URL
  • FrontendBucket – S3 bucket for the frontend
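If you prefer to capture all four values at once, you can run the `describe-stacks` command with `--output json` instead of `--output table` and parse the result. The sketch below assumes the JSON is already in a string. The `outputs_to_dict` helper is hypothetical, and the sample values are placeholders, not real endpoints.

```python
import json
from typing import Dict


def outputs_to_dict(describe_stacks_json: str) -> Dict[str, str]:
    """Map OutputKey -> OutputValue for the first stack in the response."""
    stacks = json.loads(describe_stacks_json).get("Stacks", [])
    if not stacks:
        raise ValueError("no stacks in response")
    return {o["OutputKey"]: o["OutputValue"]
            for o in stacks[0].get("Outputs", [])}


# Placeholder response in the shape returned by `describe-stacks --output json`.
SAMPLE = json.dumps({"Stacks": [{"Outputs": [
    {"OutputKey": "ApiEndpoint", "OutputValue": "https://example.com/api"},
    {"OutputKey": "FrontendBucket", "OutputValue": "example-frontend-bucket"},
]}]})

outputs = outputs_to_dict(SAMPLE)
print(outputs["ApiEndpoint"], outputs["FrontendBucket"])
```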

Step 8: Configure the frontend environment

_Get the values from the CloudFormation outputs:_

bash
API_URL=$(aws cloudformation describe-stacks --stack-name <your stack name> --query 'Stacks[0].Outputs[?OutputKey==`ApiEndpoint`].OutputValue' --output text)
AGENT_API_URL=$(aws cloudformation describe-stacks --stack-name <your stack name> --query 'Stacks[0].Outputs[?OutputKey==`CampaignOrchestratorApi`].OutputValue' --output text)

_Create the .env file:_

bash
cat > .env << EOF
VITE_API_URL=$API_URL
VITE_AGENT_API_URL=$AGENT_API_URL
VITE_AWS_REGION=<your AWS region>
EOF
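An empty or whitespace-padded value in `.env` (for example, when a `describe-stacks` query returns nothing) tends to surface later as a confusing frontend error. The hypothetical `render_env` helper below sketches a stricter way to generate the file content. The variable names match the ones above, and the values in the usage note are placeholders.

```python
from typing import Dict


def render_env(values: Dict[str, str]) -> str:
    """Render KEY=value lines, rejecting empty values and stray whitespace."""
    lines = []
    for key, value in values.items():
        if not value or value != value.strip():
            raise ValueError(f"{key} has an empty or whitespace-padded value")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

For example, `render_env({"VITE_API_URL": "https://example.com/api", "VITE_AWS_REGION": "us-west-2"})` returns two `KEY=value` lines, while a value of `" us-west-2"` raises `ValueError`.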

Step 9: Build and deploy the frontend

_Install dependencies:_

bash
npm install

_Build the frontend:_

bash
npm run build

_Get the frontend bucket name:_

bash
FRONTEND_BUCKET=$(aws cloudformation describe-stacks --stack-name <your stack name> --query 'Stacks[0].Outputs[?OutputKey==`FrontendBucket`].OutputValue' --output text)

_Deploy to S3:_

bash
aws s3 sync dist/ s3://$FRONTEND_BUCKET --delete

_Invalidate the CloudFront cache (optional, for updates):_

bash
DISTRIBUTION_ID=$(aws cloudfront list-distributions --query "DistributionList.Items[?Origins.Items[0].DomainName=='${FRONTEND_BUCKET}.s3.us-west-2.amazonaws.com'].Id" --output text)
aws cloudfront create-invalidation --distribution-id $DISTRIBUTION_ID --paths "/*"
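The query above matches only the first origin of each distribution and hardcodes the `us-west-2` Region in the bucket's domain name, so it returns nothing for a stack deployed elsewhere. As an alternative sketch, the hypothetical helper below filters the JSON from `aws cloudfront list-distributions --output json` by bucket name across all origins, whatever the Region. It assumes the origin uses the standard S3 domain form `<bucket>.s3.<region>.amazonaws.com`.

```python
import json
from typing import List


def distribution_ids_for_bucket(list_distributions_json: str, bucket: str) -> List[str]:
    """Return IDs of distributions with any origin pointing at `bucket` on S3."""
    data = json.loads(list_distributions_json)
    items = data.get("DistributionList", {}).get("Items") or []
    prefix = f"{bucket}.s3."
    ids = []
    for dist in items:
        origins = dist.get("Origins", {}).get("Items") or []
        if any(o.get("DomainName", "").startswith(prefix)
               and o.get("DomainName", "").endswith(".amazonaws.com")
               for o in origins):
            ids.append(dist["Id"])
    return ids
```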

Step 10: Access the application

_Get the CloudFront URL:_

bash
aws cloudformation describe-stacks --stack-name <your stack name> --query 'Stacks[0].Outputs[?OutputKey==`CloudFrontURL`].OutputValue' --output text

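Open the URL in a browser to access the application. Use this campaign_brief.md file as a sample campaign document and upload it in the left panel. The output of the multi-agent orchestration for the campaign then appears in the right panel. Navigate to the Bedrock AgentCore Observability console and select your agent to see detailed visualizations of each step in the agent workflow, as shown below: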

Image 2: agentcore observability dashboard describing the spans, traces and sessions for the agent invocations

Clean up

To avoid recurring charges, clean up your account after trying the solution.

  1. Delete the CloudFormation stack
bash
sam delete --stack-name <your stack name>
  2. Delete the DynamoDB table
bash
aws dynamodb delete-table --table-name PersonaTable --region <your aws region>

Conclusion

In this post, we demonstrated how combining LangGraph, Amazon Bedrock AgentCore, and serverless AWS services enables teams to create highly scalable, production-ready multi-agent generative AI systems. By leveraging LangGraph's explicit graph-based execution model for orchestration and AWS Lambda-based runtimes for execution, developers can coordinate complex, parallel agent workflows with deterministic control flow, automatic scaling, and minimal operational overhead. The integrated AgentCore Memory and Observability features address two of the most common challenges in real-world agent deployments—state management and visibility—by providing shared, durable context across agent runs and deep insights into agent behavior, performance, and cost.

These capabilities together form a repeatable architectural pattern for building enterprise-grade AI agents on AWS. Whether you're developing campaign review systems, digital assistants, or other multi-agent reasoning workflows, this approach allows you to decouple orchestration from execution, scale elastically with demand, and maintain full transparency into how agents reason and interact. By using LangGraph for structured orchestration and Amazon Bedrock AgentCore for managed runtime, memory, and observability, you can confidently transition from experimental prototypes to reliable, scalable generative AI systems in production.

* * *

## About the Authors

[![Image 3](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2025/04/28/bacchus.jpg)](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2025/04/23/photo-phonetool-150x150-1.jpg)**Kanishk Mahajan** is Principal – AI/ML with AWS Professional Services. In this role, he leads GenAI and agentic transformations for some of AWS' largest customers in Telecommunications and Media & Entertainment.

[![Image 4](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/05/04/Akshay-photo-100x122.png)](https://d2908q01vomqb2.cloudfront.net/f1f836cb4ea6efb2a0b1b99f41ad8b103eff4b59/2026/05/04/Akshay-photo.png)**Akshay Parkhi** is a Machine Learning Engineer at Amazon Web Services with over 16 years of experience leading enterprise transformation across SAP, cloud, DevOps, and AI/ML. He designs and scales production-grade AI and agentic systems that drive critical business outcomes in complex, real-world environments.
