Google's System for Coordinating A/B Tests Across Its Global Service Fleet

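TL;DR · AI Summary
Google uses a centralized coordination system to manage A/B tests worldwide, which helps reduce testing cost and increase testing speed.
Key points
- A centralized coordination system manages A/B tests across Google's global fleet.
- Central coordination lowers the cost of running tests and shortens the time they take.
- The system handles large-scale A/B tests while preserving their accuracy and reliability.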
Inside Google’s System for Coordinated A/B Testing Across Its Global Service Fleet
Jun 03, 2026 · 2 min read
by Leela Kumili, Lead Engineer
Google has detailed how it runs fleet-wide, large-scale A/B experimentation across its services, describing an internal system designed to support consistent, reliable experimentation across products that operate at massive scale. The approach focuses on enabling teams to run experiments safely across a distributed infrastructure while maintaining statistical rigor and minimizing interference between experiments.
At its core, the system addresses a common challenge in large organizations operating many interconnected services: ensuring that experiments produce trustworthy causal signals when traffic spans multiple layers of infrastructure, user surfaces, and backend systems. As experimentation becomes more pervasive across product development, inconsistencies in assignment, overlapping experiments, and fragmented telemetry can degrade the quality of insights. Google's approach is designed to standardize experiment allocation and measurement across this fleet of services.
The system provides a centralized experimentation framework that coordinates how users or requests are assigned to experimental variants. Rather than relying on isolated implementations per product or service, Google uses shared infrastructure that manages experiment configuration, assignment logic, and exposure logging. This helps ensure that users are consistently bucketed into experiment groups even when they interact with multiple services or features participating in different experiments.
_Figure: Infrastructure Experiment Process at Google (Source: Google Blog Post)_
A key component is a unified assignment layer that determines how traffic is allocated across experiments. This layer supports hierarchical allocation, allowing experimentation at different levels of the stack while reducing conflicts between overlapping tests. It also ensures that the assignment is deterministic for a given user or session, which is important for avoiding contamination between variants and for maintaining stable experimental exposure over time.
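One common way to get deterministic, layered assignment is to hash the unit identifier together with a layer name. The sketch below illustrates that general technique and is not Google's implementation: the layer names, experiment names, bucket count, and the `bucket` and `assign` helpers are all hypothetical.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

NUM_BUCKETS = 1000  # hypothetical bucket granularity


def bucket(unit_id: str, layer: str) -> int:
    """Map a user or session id to a stable bucket within one layer."""
    digest = hashlib.sha256(f"{layer}:{unit_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


# Hypothetical layers. Experiments inside a layer claim disjoint bucket
# ranges, so a unit can be in at most one experiment per layer.
LAYERS: Dict[str, List[Tuple[str, range]]] = {
    "frontend": [("new_ui", range(0, 100)), ("alt_font", range(100, 200))],
    "backend": [("cache_v2", range(0, 500))],
}


def assign(unit_id: str) -> Dict[str, Optional[str]]:
    """Return the experiment (or None) chosen for this unit in each layer."""
    result: Dict[str, Optional[str]] = {}
    for layer, experiments in LAYERS.items():
        b = bucket(unit_id, layer)
        result[layer] = next((name for name, r in experiments if b in r), None)
    return result


first = assign("user-42")
second = assign("user-42")  # same input, same answer, no stored state
```

Because the layer name is part of the hash input, a unit's bucket in one layer is independent of its bucket in another, which is what lets experiments at different levels of the stack overlap without sharing the same population split.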
To support correctness in measurement, the system emphasizes exposure logging that captures when and how users are actually exposed to experimental treatments. This enables downstream analysis systems to distinguish between assigned and truly exposed populations, improving the reliability of metrics. The platform also integrates guardrails to prevent experiments from exceeding configured traffic limits or violating safety constraints.
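The distinction between assigned and exposed populations, and a simple traffic guardrail, can be sketched as follows. This is a minimal illustration under assumed names: `ExperimentState`, `allocation`, and `traffic_cap` are hypothetical and do not come from Google's system.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class ExperimentState:
    name: str
    allocation: float   # fraction of traffic the experiment requests
    traffic_cap: float  # hypothetical guardrail: maximum allowed fraction
    assigned: Set[str] = field(default_factory=set)
    exposed: Set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        # Guardrail: reject a configuration that asks for more than the cap.
        if not 0.0 <= self.allocation <= self.traffic_cap:
            raise ValueError(
                f"{self.name}: allocation {self.allocation} "
                f"is outside [0, {self.traffic_cap}]"
            )

    def record_assignment(self, unit_id: str) -> None:
        self.assigned.add(unit_id)

    def record_exposure(self, unit_id: str) -> None:
        # Called only where the treated code path actually runs.
        if unit_id not in self.assigned:
            raise KeyError(f"{unit_id} was never assigned to {self.name}")
        self.exposed.add(unit_id)


exp = ExperimentState("cache_v2", allocation=0.05, traffic_cap=0.10)
for uid in ("u1", "u2", "u3"):
    exp.record_assignment(uid)
exp.record_exposure("u1")
exp.record_exposure("u3")

# Units that were bucketed but never reached the treatment.
assigned_not_exposed = exp.assigned - exp.exposed
```

Analysis that uses only the exposed set avoids diluting the measured effect with units that were bucketed but never saw the change.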
Google also highlights the importance of configuration propagation across its infrastructure. Experiment definitions are distributed to serving systems so that services can evaluate experiment state locally, reducing latency and dependency on centralized calls at runtime. This design supports high-throughput environments where real-time decision-making is required.
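Local evaluation of propagated configuration might look like the sketch below: the service keeps an in-process snapshot that is updated out of band, and the request path only reads from it. The payload shape, the `traffic_per_10k` field, and the `LocalExperimentConfig` class are assumptions made for illustration.

```python
import hashlib
import json
import threading
from typing import Any, Dict


class LocalExperimentConfig:
    """In-process copy of experiment definitions, refreshed out of band."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._snapshot: Dict[str, Dict[str, Any]] = {}
        self._version = 0

    def apply_update(self, payload: str) -> bool:
        """Install a pushed snapshot only if it is newer than the current one."""
        update = json.loads(payload)
        with self._lock:
            if update["version"] <= self._version:
                return False
            self._snapshot = update["experiments"]
            self._version = update["version"]
            return True

    def is_enabled(self, experiment: str, unit_id: str) -> bool:
        """Evaluate experiment state locally, with no network call."""
        with self._lock:
            spec = self._snapshot.get(experiment)
        if spec is None:
            return False
        digest = hashlib.sha256(f"{experiment}:{unit_id}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % 10_000
        return bucket < spec["traffic_per_10k"]


config = LocalExperimentConfig()
config.apply_update(json.dumps({
    "version": 1,
    "experiments": {"cache_v2": {"traffic_per_10k": 10_000}},
}))
enabled = config.is_enabled("cache_v2", "user-1")
```

Versioning the snapshot lets a service ignore stale or duplicated pushes, and an unknown experiment defaults to off, so a propagation delay degrades to the control behavior rather than an error.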
In a LinkedIn post, Anil Bhagavatula, Vice President at digi edZe, summarized the approach:
> The takeaway is that infrastructure experimentation involves more than just code adjustments; it requires a robust, statistically rigorous, and safe framework that treats the data center as a laboratory.
The experimentation infrastructure is tightly coupled with analytics pipelines that aggregate results across services. This allows teams to evaluate the impact of changes not only at a single service level but across end-to-end user journeys. By standardizing both assignment and measurement, the system reduces the operational overhead for product teams and enables faster iteration cycles. By consolidating experimentation primitives into shared infrastructure, the company aims to improve both velocity and confidence in product decisions across its ecosystem.
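As a rough illustration of cross-service aggregation, the sketch below joins a shared exposure log with metric events emitted by several services and compares variants over the whole user journey. The service names, the latency metric, and all values are made up for the example.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical exposure log: unit id -> variant the unit actually saw.
exposures: Dict[str, str] = {
    "u1": "control", "u2": "treatment", "u3": "treatment", "u4": "control",
}

# Hypothetical metric events: (service, unit id, latency in ms).
events: List[Tuple[str, str, float]] = [
    ("search", "u1", 120.0), ("checkout", "u1", 80.0),
    ("search", "u2", 90.0), ("checkout", "u2", 60.0),
    ("search", "u3", 110.0),
    ("search", "u4", 100.0),
    ("search", "u5", 500.0),  # never exposed, so excluded from analysis
]


def journey_mean_by_variant(
    exposures: Dict[str, str], events: List[Tuple[str, str, float]]
) -> Dict[str, float]:
    """Sum each unit's metric across services, then average per variant."""
    per_unit: Dict[str, float] = defaultdict(float)
    for _service, unit_id, value in events:
        if unit_id in exposures:
            per_unit[unit_id] += value
    by_variant: Dict[str, List[float]] = defaultdict(list)
    for unit_id, variant in exposures.items():
        by_variant[variant].append(per_unit.get(unit_id, 0.0))
    return {variant: mean(values) for variant, values in by_variant.items()}


result = journey_mean_by_variant(exposures, events)
```

A per-service view would compare only `search` or only `checkout` latency; summing per unit first is what makes the comparison an end-to-end one.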
About the Author

#### Leela Kumili
Leela is a Lead Software Engineer at Starbucks with deep expertise in building scalable, cloud-native systems and distributed platforms. She drives architecture, delivery, and operational excellence across the Rewards Platform, leading efforts to modernize systems, improve scalability, and enhance reliability. In addition to her technical leadership, Leela serves as an AI Champion for the organization, identifying opportunities to improve developer productivity and workflows using LLM-based tools and establishing best practices for AI adoption. She is passionate about building production-ready systems, enhancing developer experience, and mentoring engineers to grow in both technical and strategic impact. Her interests include platform engineering, distributed systems, developer productivity, and bridging technical solutions with business and product goals.