Yelp Achieves Zero-Downtime Upgrade of Over 1,000 Cassandra Nodes

Apr 24, 2026 2 min read

by Craig Risi, Software Architect | Game Designer | Writer | Speaker

Yelp has successfully completed a large-scale upgrade of its Apache Cassandra infrastructure, spanning more than 1,000 nodes, without any service downtime, offering a blueprint for managing stateful systems at scale. The upgrade, detailed by Yelp’s Database Reliability Engineering team, demonstrates how careful planning, phased execution, and automation can enable seamless modernization of critical data infrastructure.

The effort addressed one of the most complex challenges in distributed systems: upgrading a live, highly available database without interrupting production workloads. Cassandra underpins many of Yelp’s core services, making downtime unacceptable. To mitigate risk, the team adopted a rolling upgrade strategy, incrementally upgrading nodes while maintaining cluster availability and data consistency throughout the process. This ensured that applications continued to read and write data uninterrupted as the system evolved.

At the heart of the approach was strict adherence to compatibility and incremental change principles. By upgrading nodes in controlled batches and allowing the cluster to rebalance and repair between steps, Yelp minimized the risk of cascading failures. This aligns with broader best practices in Cassandra upgrades, where rolling upgrades maintain backward compatibility and allow the system to remain operational while individual components are replaced.
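The batch-by-batch pattern described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Yelp's tooling: `batches`, `rolling_upgrade`, `upgrade_node`, and `cluster_is_healthy` are hypothetical names, and the two callbacks stand in for whatever a real orchestrator would do to upgrade a node and to confirm that the cluster has settled before continuing.

```python
from typing import Callable, Iterable, List


def batches(nodes: List[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of `nodes`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(nodes), size):
        yield nodes[start:start + size]


def rolling_upgrade(
    nodes: List[str],
    batch_size: int,
    upgrade_node: Callable[[str], None],      # hypothetical: upgrades one node
    cluster_is_healthy: Callable[[], bool],   # hypothetical: gate between batches
) -> List[str]:
    """Upgrade nodes one batch at a time and stop if the cluster looks unhealthy.

    Returns the nodes that were upgraded before the run finished or halted.
    """
    upgraded: List[str] = []
    for batch in batches(nodes, batch_size):
        for node in batch:
            upgrade_node(node)
            upgraded.append(node)
        # Check health only after the whole batch, before touching the next one.
        if not cluster_is_healthy():
            break
    return upgraded
```

Halting on the first failed check keeps the blast radius to a single batch, which is the point of upgrading incrementally rather than all at once.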

The team also invested heavily in automation and observability, ensuring that each phase of the upgrade could be monitored and validated in real time. Automated orchestration reduced the likelihood of human error, while continuous health checks ensured that any anomalies could be detected and addressed before impacting users.
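A continuous health check of the kind mentioned here is often implemented as a polling gate. The sketch below is an assumption about how such a gate might look, not a description of Yelp's system: `wait_until_healthy` and its parameters are hypothetical, and `check` stands in for any real probe of cluster state.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],                      # hypothetical health probe
    attempts: int = 5,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,    # injectable for testing
) -> bool:
    """Poll `check` up to `attempts` times; return True on the first success."""
    for attempt in range(attempts):
        if check():
            return True
        # Pause between attempts, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

A `False` result is the signal for the orchestrator to pause and alert an operator instead of moving on to the next phase.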

Unlike stateless services, distributed databases like Cassandra require careful coordination during upgrades due to data replication, consistency guarantees, and node interdependencies. Yelp's success highlights the importance of understanding these dynamics, particularly how data is replicated and how nodes recover and synchronize after changes.
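To make the replication point concrete: at Cassandra's QUORUM consistency level, a request needs a majority of replicas, so the replication factor determines how many replicas can be offline at once. The article does not state Yelp's replication factor or consistency levels, so the numbers below are purely illustrative and the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a QUORUM read or write: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerable_replicas_down(replication_factor: int) -> int:
    """Replicas of one token range that can be offline while QUORUM still succeeds."""
    return replication_factor - quorum(replication_factor)


# Illustrative values only; not taken from the article.
print(quorum(3), tolerable_replicas_down(3))
print(quorum(5), tolerable_replicas_down(5))
```

With a replication factor of 3, only one replica of any given range can be restarting at a time if QUORUM requests are to keep succeeding, which is why batch composition matters as much as batch size.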

Industry-wide, similar zero-downtime migrations often rely on techniques such as dual writes, replication, or introducing new clusters alongside existing ones before gradually shifting traffic. However, Yelp's approach demonstrates that even in-place upgrades of large clusters can be achieved safely when executed with discipline and robust tooling.
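For contrast with the in-place approach, the dual-write technique mentioned above can be sketched as a thin wrapper that writes to both the old and the new store and switches reads only when the new store is ready. This is a minimal illustration under stated assumptions: `DualWriteStore` is a hypothetical class, plain dictionaries stand in for the two clusters, and backfill of existing data and handling of partial write failures are deliberately left out.

```python
from typing import Dict, Optional


class DualWriteStore:
    """Write to both stores; read from the old one until cut over."""

    def __init__(self, old: Dict[str, str], new: Dict[str, str]) -> None:
        self.old = old
        self.new = new
        self.read_from_new = False  # flipped once the new store is trusted

    def put(self, key: str, value: str) -> None:
        # Every write lands in both stores so the new one stays current.
        self.old[key] = value
        self.new[key] = value

    def get(self, key: str) -> Optional[str]:
        source = self.new if self.read_from_new else self.old
        return source.get(key)
```

Data written before dual writes began exists only in the old store, so a real migration would also need a backfill step before reads are switched.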

Yelp's upgrade reflects a growing trend in cloud-native engineering: eliminating downtime as a constraint. As businesses increasingly depend on always-on systems, traditional maintenance windows are becoming obsolete. Instead, organizations are adopting strategies such as rolling upgrades, blue-green deployments, and live data migration to ensure continuous availability.

Other companies tackling similar challenges, such as migrating Cassandra clusters across Kubernetes environments, have also emphasized the need for careful planning, staged rollouts, and strong operational controls to achieve zero downtime in production systems.

Ultimately, Yelp's Cassandra upgrade underscores a key evolution in platform engineering: reliability is no longer just about uptime, but about seamless change. Systems must not only remain available but also be continuously upgradable without disrupting users.

By demonstrating that even large-scale, stateful infrastructure can be modernized without downtime, Yelp sets a new benchmark for engineering teams managing critical data platforms, showing that with the right combination of strategy, tooling, and discipline, zero-downtime operations are achievable at scale.

About the Author

#### **Craig Risi**

Craig Risi is a man of many talents but has no sense of how to use them. He could be out changing the world but prefers to make software instead. He possesses a passion for software design, but more importantly software quality and designing systems in a technically diverse and constantly evolving tech world. Craig is also the writer of the book, Quality By Design: Designing Quality Software Systems, and writes regular articles on his blog sites and various other tech sites around the world. When not playing with software, he can often be found writing, designing board games, or running long distances for no apparent reason.
