Designing a Multi-Agent System for Engineering Support at Scale: A Case Study From Grab

May 20, 2026 · 2 min read
by Leela Kumili, Lead Engineer
Grab’s Analytics Data Warehouse (ADW) team has introduced a multi-agent AI system to automate engineering support workflows across its large-scale data platform, aiming to reduce repetitive operational work and improve resolution efficiency. The system is designed to handle internal engineering requests spanning data warehouse troubleshooting, SQL debugging, and platform support, while shifting engineers toward higher-value development work.
The ADW platform supports more than 1,000 internal users and manages over 15,000 tables, serving as a core analytics infrastructure component within Grab. As usage grew, the engineering team observed that a significant portion of operational effort was being consumed by repetitive support tasks and ad hoc investigations, limiting time available for platform improvement and system design work.
Sneh Agrawal, Head of Analytics at Grab, highlighted in a LinkedIn post:
"Grab’s Central Data Team is leveraging a multi-agent system to automate repetitive operational work, reclaiming hundreds of engineering hours each month. This shift is unlocking critical engineering bandwidth and enabling a transition from reactive firefighting to higher-value system building."
To address this, the team implemented a multi-agent architecture that separates incoming engineering requests into two primary workflows: investigation and enhancement. Investigation workflows are designed for diagnostic tasks such as query analysis, log retrieval, schema lookup, and issue summarization. Enhancement workflows focus on generating actionable outputs, including code changes, SQL fixes, and automated merge requests for review.
_Multi-agent architecture tech stack (Source: Grab Tech Blog Post)_
The system is orchestrated using a LangGraph-based workflow engine combined with FastAPI services that coordinate routing, tool execution, and state management across agents. Requests are first classified and then routed to specialized agents responsible for tasks such as context retrieval, code search, or solution generation. Each agent operates with constrained responsibilities to reduce ambiguity and improve the predictability of outputs.
_Agent workflows, using a Supervisor that controls communication flow and task delegation (Source: Grab Tech Blog Post)_
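The classify-then-route flow described above can be sketched roughly as follows. This is a plain-Python illustration under stated assumptions, not Grab's LangGraph or FastAPI code: the keyword-based classifier, the agent functions, the `PIPELINES` mapping, and the `RequestState` fields are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Optional


class Workflow(Enum):
    INVESTIGATION = "investigation"
    ENHANCEMENT = "enhancement"


@dataclass
class RequestState:
    """State shared across agents (hypothetical fields)."""
    text: str
    workflow: Optional[Workflow] = None
    notes: List[str] = field(default_factory=list)


Agent = Callable[[RequestState], RequestState]

# Hypothetical keyword heuristic; a production system would more likely
# use an LLM or a trained classifier for this step.
ENHANCEMENT_HINTS = ("fix", "add column", "change", "update", "merge request")


def classify(state: RequestState) -> RequestState:
    lowered = state.text.lower()
    wants_change = any(hint in lowered for hint in ENHANCEMENT_HINTS)
    state.workflow = Workflow.ENHANCEMENT if wants_change else Workflow.INVESTIGATION
    return state


# Each agent has one narrow responsibility and only appends its own result.
def context_retrieval_agent(state: RequestState) -> RequestState:
    state.notes.append("context: retrieved table metadata and recent logs")
    return state


def code_search_agent(state: RequestState) -> RequestState:
    state.notes.append("code_search: located candidate files")
    return state


def solution_agent(state: RequestState) -> RequestState:
    state.notes.append("solution: drafted change for human review")
    return state


PIPELINES: Dict[Workflow, List[Agent]] = {
    Workflow.INVESTIGATION: [context_retrieval_agent],
    Workflow.ENHANCEMENT: [context_retrieval_agent, code_search_agent, solution_agent],
}


def supervisor(text: str) -> RequestState:
    """Classify the request, then delegate to the agents for that workflow."""
    state = classify(RequestState(text=text))
    for agent in PIPELINES[state.workflow]:
        state = agent(state)
    return state
```

Keeping the routing table as data rather than branching logic makes it easy to see which agents a workflow may invoke, which is one way to get the constrained responsibilities the article mentions.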
According to Grab engineers:
"The separation of investigation and enhancement paths helped us reduce complexity in agent reasoning and improved reliability in production workflows."
A key architectural decision was the consolidation of the tool ecosystem. The system initially exposed more than 30 internal tools across data access, logging, and code systems. This was later reduced to a smaller, curated toolset to improve maintainability and reduce unpredictable tool selection by agents. The tool layer includes controlled SQL execution, metadata access, log retrieval systems, and integration with Git-based workflows for change management.
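One way to picture a curated toolset is an explicit allowlist that agents must go through. The sketch below is an assumption for illustration only: `ToolRegistry`, the tool names, and the placeholder return values are hypothetical and do not reflect Grab's actual tool layer.

```python
from typing import Callable, Dict, List


class ToolRegistry:
    """Holds the small, explicitly registered set of tools agents may call."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[..., str]] = {}

    def register(self, name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
        def decorator(fn: Callable[..., str]) -> Callable[..., str]:
            self._tools[name] = fn
            return fn
        return decorator

    def names(self) -> List[str]:
        return sorted(self._tools)

    def call(self, name: str, **kwargs: str) -> str:
        # Anything outside the curated set is rejected rather than guessed at.
        if name not in self._tools:
            raise KeyError(f"tool {name!r} is not in the curated set")
        return self._tools[name](**kwargs)


registry = ToolRegistry()


@registry.register("get_table_metadata")
def get_table_metadata(table: str) -> str:
    return f"metadata for {table}"  # placeholder result


@registry.register("fetch_logs")
def fetch_logs(job_id: str) -> str:
    return f"logs for {job_id}"  # placeholder result
```

With a registry like this, shrinking the tool surface is a matter of removing registrations, and an agent that asks for an unregistered tool fails loudly instead of selecting something unpredictable.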
Safety and governance were integrated into the system design. SQL execution is constrained through validation layers, and sensitive data handling includes mechanisms for detecting and mitigating exposure risks. In addition, all enhancement workflows that produce code changes require human-in-the-loop review before deployment, ensuring that automated outputs remain subject to engineering oversight.
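The article does not describe the validation rules themselves, so the following is only a minimal sketch of what a read-only SQL guard and a human-review gate might look like. The `validate_sql` and `deploy` functions, the keyword list, and the default row limit are assumptions; the keyword check is deliberately naive, and a real system would more likely rely on a proper SQL parser.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical list of keywords that indicate a write or DDL statement.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|truncate|grant|create)\b", re.IGNORECASE
)


def validate_sql(sql: str, max_rows: int = 1000) -> str:
    """Return a read-only, row-limited statement or raise ValueError."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?i)^(select|with)\b", statement):
        raise ValueError("only read-only SELECT queries are allowed")
    if FORBIDDEN.search(statement):
        raise ValueError("statement contains a forbidden keyword")
    # Append a LIMIT when the query does not already end with one.
    if not re.search(r"(?i)\blimit\s+\d+\s*$", statement):
        statement = f"{statement} LIMIT {max_rows}"
    return statement


@dataclass
class ProposedChange:
    description: str
    approved_by: Optional[str] = None  # set by a human reviewer


def deploy(change: ProposedChange) -> str:
    # Enhancement outputs stay blocked until a reviewer signs off.
    if change.approved_by is None:
        raise PermissionError("change requires human review before deployment")
    return f"deployed: {change.description}"
```

For example, `validate_sql("SELECT * FROM orders")` returns the query with `LIMIT 1000` appended, while `deploy(ProposedChange("fix join"))` raises until `approved_by` is set.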
Context management emerged as a significant technical challenge. Multi-step agent reasoning required maintaining relevant state across interactions while operating within token constraints. The system addresses this through structured context compression and selective retrieval strategies, allowing agents to retain necessary information without exceeding operational limits.
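As a rough illustration of fitting multi-step state into a token budget, the sketch below keeps the most recent messages verbatim and shortens older ones. It is not the compression strategy Grab uses: `compress_context` is a hypothetical helper, truncation stands in for a real summarizer, and whitespace-separated words stand in for model tokens.

```python
from typing import List


def count_tokens(text: str) -> int:
    # Crude approximation: one whitespace-separated word per token.
    return len(text.split())


def compress_context(messages: List[str], budget: int, summary_words: int = 8) -> List[str]:
    """Keep the newest messages in full; shorten older ones to fit the budget."""
    kept: List[str] = []
    used = 0
    # Walk newest-first so the latest reasoning steps are retained verbatim.
    for message in reversed(messages):
        cost = count_tokens(message)
        if used + cost <= budget:
            kept.append(message)
            used += cost
            continue
        # Truncation is a stand-in for a real summarization step.
        short = " ".join(message.split()[: summary_words - 1] + ["..."])
        short_cost = count_tokens(short)
        if used + short_cost > budget:
            break  # nothing older fits, so drop the rest
        kept.append(short)
        used += short_cost
    kept.reverse()  # restore chronological order
    return kept
```

The newest-first walk reflects one possible design choice: recent steps tend to matter most for the next agent action, so older history is the first to be shortened or dropped.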
The impact of the system has been observed in reduced time spent on routine engineering support tasks and faster resolution cycles for common issues. While exact performance metrics were not disclosed, the team noted a shift in engineering effort away from firefighting and toward platform engineering and system improvement.
About the Author

#### Leela Kumili
Leela is a Lead Software Engineer at Starbucks with deep expertise in building scalable, cloud-native systems and distributed platforms. She drives architecture, delivery, and operational excellence across the Rewards Platform, leading efforts to modernize systems, improve scalability, and enhance reliability. In addition to her technical leadership, Leela serves as an AI Champion for the organization, identifying opportunities to improve developer productivity and workflows using LLM-based tools and establishing best practices for AI adoption. She is passionate about building production-ready systems, enhancing developer experience, and mentoring engineers to grow in both technical and strategic impact. Her interests include platform engineering, distributed systems, developer productivity, and bridging technical solutions with business and product goals.