InfoQ

Designing a Multi-Agent System for Engineering Support at Scale: A Case Study From Grab


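TL;DR

Grab's Analytics Data Warehouse (ADW) team built a multi-agent AI system that automates repetitive engineering support work by splitting requests into investigation and enhancement workflows. The team reports reclaiming hundreds of engineering hours each month, but exact performance metrics were not disclosed.

Key Takeaways

  • A supervisor classifies requests and routes them to specialized agents, orchestrated with a LangGraph-based workflow engine and FastAPI services.
  • Separating investigation (diagnosis) from enhancement (code changes, SQL fixes, merge requests) reduced complexity in agent reasoning and improved reliability.
  • Consolidating more than 30 internal tools into a curated toolset, validating SQL execution, and requiring human review of code changes kept agent behavior predictable and governed.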



May 20, 2026 · 2 min read · by Leela Kumili, Lead Engineer


Grab’s Analytics Data Warehouse (ADW) team has introduced a multi-agent AI system to automate engineering support workflows across its large-scale data platform, aiming to reduce repetitive operational work and improve resolution efficiency. The system is designed to handle internal engineering requests spanning data warehouse troubleshooting, SQL debugging, and platform support, while shifting engineers toward higher-value development work.

The ADW platform supports more than 1,000 internal users and manages over 15,000 tables, serving as a core analytics infrastructure component within Grab. As usage grew, the engineering team observed that a significant portion of operational effort was being consumed by repetitive support tasks and ad hoc investigations, limiting time available for platform improvement and system design work.

In a LinkedIn post, Sneh Agrawal, Head of Analytics at Grab, highlighted:

Grab’s Central Data Team is leveraging a multi-agent system to automate repetitive operational work, reclaiming hundreds of engineering hours each month. This shift is unlocking critical engineering bandwidth and enabling a transition from reactive firefighting to higher-value system building.

To address this, the team implemented a multi-agent architecture that separates incoming engineering requests into two primary workflows: investigation and enhancement. Investigation workflows are designed for diagnostic tasks such as query analysis, log retrieval, schema lookup, and issue summarization. Enhancement workflows focus on generating actionable outputs, including code changes, SQL fixes, and automated merge requests for review.
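To make the two-path split concrete, here is a minimal Python sketch of a request router. The article does not say how requests are classified, so a keyword heuristic stands in for what would more likely be an LLM classification step; `Workflow`, `ENHANCEMENT_HINTS`, and `classify_request` are hypothetical names, not Grab's code.

```python
from enum import Enum


class Workflow(Enum):
    """The two primary paths for incoming engineering requests."""
    INVESTIGATION = "investigation"  # diagnosis: query analysis, logs, schema lookup, summaries
    ENHANCEMENT = "enhancement"      # actionable output: code changes, SQL fixes, merge requests


# Hypothetical hints that a request asks for a change rather than a diagnosis.
ENHANCEMENT_HINTS = ("fix", "change", "update", "add column", "merge request", "rewrite")


def classify_request(text: str) -> Workflow:
    """Route change requests to enhancement and everything else to investigation."""
    lowered = text.lower()
    if any(hint in lowered for hint in ENHANCEMENT_HINTS):
        return Workflow.ENHANCEMENT
    # Default to the read-only diagnostic path.
    return Workflow.INVESTIGATION


if __name__ == "__main__":
    for request in (
        "Why did the daily_orders job fail last night?",
        "Please fix the join in this SQL so it stops duplicating rows",
    ):
        print(f"{classify_request(request).value:>13} <- {request}")
```

Sending anything that does not clearly ask for a change to the investigation path is a deliberate choice in this sketch: investigation only reads data, so a misrouted request costs less there than on a path that produces code changes. Plain substring matching is crude ("fix" also matches "prefix"), which is one reason a real classifier would do better.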


_Multi-agent architecture tech stack (Source: Grab Tech Blog Post)_

The system is orchestrated using a LangGraph-based workflow engine combined with FastAPI services that coordinate routing, tool execution, and state management across agents. Requests are first classified and then routed to specialized agents responsible for tasks such as context retrieval, code search, or solution generation. Each agent operates with constrained responsibilities to reduce ambiguity and improve the predictability of outputs.
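The supervisor pattern can be sketched in plain Python, assuming a shared state object and one function per specialized agent. The agent names mirror the roles the article lists (context retrieval, code search, solution generation), but `AgentState`, `supervisor`, `run`, and the placeholder outputs are hypothetical. In LangGraph, the same loop would typically be a `StateGraph` whose conditional edges call a routing function; the version below avoids the dependency so the control flow is visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AgentState:
    """Shared state passed between agents (hypothetical fields)."""
    request: str
    workflow: str  # "investigation" or "enhancement"
    context: List[str] = field(default_factory=list)
    code_refs: List[str] = field(default_factory=list)
    answer: Optional[str] = None
    trace: List[str] = field(default_factory=list)


# Each specialized agent has one narrow job and writes only its own fields.
def context_retrieval(state: AgentState) -> None:
    state.context.append(f"metadata and logs for: {state.request}")


def code_search(state: AgentState) -> None:
    state.code_refs.append("models/daily_orders.sql")  # placeholder search result


def solution_generation(state: AgentState) -> None:
    if state.workflow == "enhancement":
        state.answer = f"proposed change touching {state.code_refs}"
    else:
        state.answer = f"summary based on {len(state.context)} context item(s)"


AGENTS: Dict[str, Callable[[AgentState], None]] = {
    "context_retrieval": context_retrieval,
    "code_search": code_search,
    "solution_generation": solution_generation,
}


def supervisor(state: AgentState) -> Optional[str]:
    """Pick the next agent from the current state, or None when the task is done."""
    if not state.context:
        return "context_retrieval"
    if state.workflow == "enhancement" and not state.code_refs:
        return "code_search"
    if state.answer is None:
        return "solution_generation"
    return None


def run(state: AgentState, max_steps: int = 10) -> AgentState:
    for _ in range(max_steps):  # step cap guards against routing loops
        next_agent = supervisor(state)
        if next_agent is None:
            break
        state.trace.append(next_agent)
        AGENTS[next_agent](state)
    return state


if __name__ == "__main__":
    print(run(AgentState("fix duplicated rows in daily_orders", "enhancement")).trace)
    print(run(AgentState("why did daily_orders fail?", "investigation")).trace)
```

Because each agent writes only its own fields and the supervisor decides the next step from state alone, the execution path is predictable: an investigation runs `context_retrieval` then `solution_generation`, while an enhancement inserts `code_search` between them.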


_Agent workflows, using a Supervisor that controls communication flow and task delegation (Source: Grab Tech Blog Post)_

According to Grab engineers,

The separation of investigation and enhancement paths helped us reduce complexity in agent reasoning and improved reliability in production workflows.

A key architectural decision was the consolidation of the tool ecosystem. The system initially exposed more than 30 internal tools across data access, logging, and code systems. This was later reduced to a smaller, curated toolset to improve maintainability and reduce unpredictable tool selection by agents. The tool layer includes controlled SQL execution, metadata access, log retrieval systems, and integration with Git-based workflows for change management.
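One way to keep tool selection predictable is a small registry in which each agent can only call the tools it has been granted. This is a minimal sketch under that assumption; the tool names (`run_sql`, `get_table_metadata`, `fetch_logs`, `open_merge_request`), the agent names, and the stub implementations are hypothetical stand-ins for the SQL, metadata, logging, and Git integrations the article describes.

```python
from typing import Callable, Dict, FrozenSet


class ToolRegistry:
    """Curated tool set; each agent may only call the tools explicitly granted to it."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], str]] = {}
        self._grants: Dict[str, FrozenSet[str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def grant(self, agent: str, *tool_names: str) -> None:
        unknown = set(tool_names) - set(self._tools)
        if unknown:
            raise ValueError(f"unknown tools: {sorted(unknown)}")
        self._grants[agent] = frozenset(tool_names)

    def tools_for(self, agent: str) -> FrozenSet[str]:
        return self._grants.get(agent, frozenset())

    def call(self, agent: str, name: str, arg: str) -> str:
        if name not in self.tools_for(agent):
            raise PermissionError(f"{agent} may not call {name}")
        return self._tools[name](arg)


# Stub implementations standing in for real SQL, metadata, log, and Git integrations.
registry = ToolRegistry()
registry.register("run_sql", lambda query: f"rows for {query!r}")
registry.register("get_table_metadata", lambda table: f"schema of {table}")
registry.register("fetch_logs", lambda job: f"logs for {job}")
registry.register("open_merge_request", lambda diff: f"merge request opened for {diff!r}")

# Investigation reads; only the enhancement agent can propose changes.
registry.grant("investigator", "run_sql", "get_table_metadata", "fetch_logs")
registry.grant("enhancer", "get_table_metadata", "open_merge_request")
```

Scoping grants per agent narrows each agent's choices further than a single shared toolset would, and keeps write-capable tools such as merge-request creation away from the diagnostic path.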

Safety and governance were integrated into the system design. SQL execution is constrained through validation layers, and sensitive data handling includes mechanisms for detecting and mitigating exposure risks. In addition, all enhancement workflows that produce code changes require human-in-the-loop review before deployment, ensuring that automated outputs remain subject to engineering oversight.
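The article does not detail Grab's validation layer, so the sketch below is a deliberately simple stand-in built on assumptions: it accepts a single `SELECT` or `WITH` statement, rejects write and DDL keywords, appends a row limit, and masks e-mail addresses as one example of sensitive-data handling. `validate_sql`, `mask_emails`, and `SqlRejected` are hypothetical names; keyword matching is not a SQL parser, and a production validator would parse the statement instead.

```python
import re

# Write and DDL keywords matched as whole words, so "created_at" or "updated_at" pass.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|merge|grant|revoke)\b",
    re.IGNORECASE,
)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class SqlRejected(ValueError):
    """Raised when a query fails the read-only checks."""


def validate_sql(sql: str, max_rows: int = 1000) -> str:
    """Allow one read-only SELECT/WITH statement and cap the number of rows returned."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:
        raise SqlRejected("multiple statements are not allowed")
    if not re.match(r"(?is)^(select|with)\b", stripped):
        raise SqlRejected("only SELECT or WITH queries are allowed")
    if FORBIDDEN.search(stripped):
        raise SqlRejected("write or DDL keywords are not allowed")
    if not re.search(r"(?i)\blimit\s+\d+\s*$", stripped):
        stripped = f"{stripped} LIMIT {max_rows}"
    return stripped


def mask_emails(value: str) -> str:
    """Redact e-mail addresses before results reach an agent or a user."""
    return EMAIL.sub("[REDACTED_EMAIL]", value)
```

The checks err on the side of rejection: a semicolon inside a string literal, for instance, causes a harmless query to be refused.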

Context management emerged as a significant technical challenge. Multi-step agent reasoning required maintaining relevant state across interactions while operating within token constraints. The system addresses this through structured context compression and selective retrieval strategies, allowing agents to retain necessary information without exceeding operational limits.
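Context compression and selective retrieval can be sketched as a budgeted prompt builder: older turns are collapsed into a summary, only stored facts that overlap with the current query are pulled in, and parts are dropped from the front until everything fits. Token counts are approximated by word counts, and `summarize` truncates messages instead of calling an LLM; both are assumptions for illustration, as are all function names here.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Rough token estimate using whitespace-separated words (not a real tokenizer)."""
    return len(text.split())


def summarize(messages: List[str], max_words: int = 8) -> str:
    """Stand-in for LLM summarization: keep the first few words of each message."""
    return "Earlier: " + " | ".join(" ".join(m.split()[:max_words]) for m in messages)


def retrieve(facts: List[str], query: str, k: int = 2) -> List[str]:
    """Return up to k stored facts that share at least one word with the query."""
    q = set(query.lower().split())

    def overlap(fact: str) -> int:
        return len(q & set(fact.lower().split()))

    ranked = sorted(facts, key=overlap, reverse=True)
    return [f for f in ranked[:k] if overlap(f) > 0]


def build_context(history: List[str], facts: List[str], query: str,
                  budget: int = 60, keep_recent: int = 2) -> List[str]:
    """Assemble summary, retrieved facts, recent turns, and the query within a budget."""
    split = max(len(history) - keep_recent, 0)
    older, recent = history[:split], history[split:]
    parts = ([summarize(older)] if older else []) + retrieve(facts, query) + recent + [query]
    # Drop from the front (summary, then facts, then the oldest recent turns) until the
    # budget is met; the query itself is always kept.
    while sum(count_tokens(p) for p in parts) > budget and len(parts) > 1:
        parts.pop(0)
    return parts
```

Ordering the parts so the summary is dropped first reflects an assumption that recent turns and retrieved facts matter more for the next step than a lossy summary of the distant past.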

The system's impact shows up as less time spent on routine engineering support tasks and faster resolution of common issues. While exact performance metrics were not disclosed, the team noted a shift in engineering effort away from firefighting and toward platform engineering and system improvement.

About the Author


#### Leela Kumili

Leela is a Lead Software Engineer at Starbucks with deep expertise in building scalable, cloud-native systems and distributed platforms. She drives architecture, delivery, and operational excellence across the Rewards Platform, leading efforts to modernize systems, improve scalability, and enhance reliability. In addition to her technical leadership, Leela serves as an AI Champion for the organization, identifying opportunities to improve developer productivity and workflows using LLM-based tools and establishing best practices for AI adoption. She is passionate about building production-ready systems, enhancing developer experience, and mentoring engineers to grow in both technical and strategic impact. Her interests include platform engineering, distributed systems, developer productivity, and bridging technical solutions with business and product goals.



InfoQ.com and all content copyright © 2006-2026 C4Media Inc.
