AlloyDB Hot Standby: Faster Failover, Stable Performance

Google Cloud Blog, May 29, 2026

AlloyDB Hot Standby: Faster Failovers, Consistent Performance

TL;DR · AI Summary

AlloyDB Hot Standby cuts failover time to about 15 seconds in Google's demo and removes the performance dip caused by cold-cache warm-up, at no additional cost. The standby node continuously applies WAL records streamed from the primary, so PostgreSQL is already running and its caches are warm when the standby is promoted.

Key Takeaways

Hot Standby cuts failover time to ~15 seconds (vs. minutes previously), signific
The standby node continuously replays WAL logs, keeping caches ‘warm’, so TPS re
Enabled by default on new PostgreSQL 18 instances, rolling out to older versions

Outline

§Evolution of AlloyDB HA Architecture
In traditional HA, the standby node is idle and must start the database and replay logs upon failure, causing prolonged recovery and performance drop.
·Hot Standby Core Mechanism
The standby node continuously streams and applies WAL from the primary, keeping PostgreSQL running and caches warm.
·Performance & Reliability Validation
The demo benchmark shows Hot Standby completing failover in ~15 seconds with near-instant TPS recovery; legacy HA takes noticeably longer and needs several minutes for TPS to ramp back up.
§Deployment & Compatibility
Hot Standby is enabled by default on new PostgreSQL 18 instances and will roll out to earlier versions—free of charge.

Mindmap

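AlloyDB Hot Standby high-availability upgrade
- Core mechanism
  - Standby node continuously streams and applies WAL
  - PostgreSQL process stays running
  - Caches stay warm (buffer cache, etc.)
- Key benefits
  - Failover time ≈ 15 seconds (noticeably longer with legacy HA)
  - TPS recovers almost immediately, with no performance brownout
  - Improved RTO, more dependable SLA
- Deployment and cost
  - Enabled by default on new PG 18 instances
  - Gradual rollout to earlier versions
  - No additional cost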

Highlights

Hot Standby instances complete failover in ~15 seconds, with TPS recovering almost instantly; legacy HA takes longer and requires several minutes to regain original TPS as caches warm up.
— See Hot Standby in Action
By continuously applying WAL records, the hot standby keeps memory caches (e.g., PostgreSQL buffer cache) warm, avoiding post-failover performance 'brownouts' caused by disk-based cache warming.
— Introducing AlloyDB Hot Standby
This enhancement is provided at no additional cost—no configuration changes or fees required—immediately improving RTO and service continuity.
— Introducing AlloyDB Hot Standby

#AlloyDB #PostgreSQL #High Availability #Failover #Google Cloud

AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible database service designed for the most demanding enterprise workloads. It combines the best of PostgreSQL with the power of Google, delivering exceptional performance, scalability, and availability. We are continuously innovating to make AlloyDB even more resilient, and today, we're excited to announce a significant upgrade to our High Availability (HA) architecture: Hot Standby.

Understanding AlloyDB HA Architecture

Image 1: https://storage.googleapis.com/gweb-cloudblog-publish/images/1_SeSBztp.max-1100x1100.png

An AlloyDB primary instance configured for high availability consists of an active node and a standby node, located in different zones within a region for resilience. AlloyDB's cloud-native architecture separates compute and storage to allow for individual scaling of each resource. Database write-ahead logs (WAL) are synchronously written to a regional log persistor, ensuring durability, while data blocks reside in AlloyDB's regional storage service. A load balancer directs traffic to the current active node using a stable IP address.

In the traditional HA model, if the active node became unavailable, AlloyDB would automatically initiate a failover. The standby node, previously idle from a PostgreSQL perspective, would start the database, process any remaining logs, and then take over. While this ensures high availability, the database startup time and the subsequent cache warming period could impact application recovery time and performance.
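From the application's side, a failover shows up as a short window in which connections to the stable IP address fail and must be re-established. The following is a minimal sketch of client-side retry with capped exponential backoff. The function name `call_with_retry`, its parameters, and the use of Python's built-in `ConnectionError` are illustrative assumptions, not part of AlloyDB or any driver; a real database driver raises its own exception type, which you would catch instead.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_wait_seconds: float = 60.0,
    initial_delay: float = 0.5,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `operation` on ConnectionError until it succeeds or the budget runs out.

    The budget counts only the time spent sleeping between attempts.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    waited = 0.0
    delay = initial_delay
    while True:
        try:
            return operation()
        except ConnectionError:
            if waited + delay > max_wait_seconds:
                raise  # budget exhausted: surface the last error
            sleep(delay)
            waited += delay
            delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

The wait budget should comfortably exceed the failover time you expect, so that a transient outage is absorbed by retries rather than surfaced to users.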

Introducing AlloyDB Hot Standby: The New Architecture

Image 2: https://storage.googleapis.com/gweb-cloudblog-publish/images/2_EYWferi.max-1000x1000.png

With the new Hot Standby capability, we've transformed the role of the standby node. Instead of being a passive node, the standby node now continuously applies WAL records streamed from the primary. This architectural shift brings two massive advantages:

Dramatically Reduced Failover Times: Because PostgreSQL is already running, initialized, and actively replicating on the standby, the time required to promote it to primary in the event of a failure is significantly shorter. The system detects the failure (typically within 30 seconds), promotes the standby, and redirects connections. The database startup phase on the standby is eliminated, reducing overall downtime and improving your Recovery Time Objective (RTO).

Consistent Performance After Failover: Since the Hot Standby node is actively replaying logs, its memory caches (like the PostgreSQL buffer cache) are kept "warm." They contain much of the same frequently accessed data as the primary node's caches. When a failover occurs, the new primary can serve requests at optimal speed almost immediately. This avoids the performance "brownout" typically seen while caches warm up from disk, ensuring application performance remains stable.
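The warm-cache effect can be illustrated with a toy simulation. This is a sketch under loud assumptions, not a model of AlloyDB internals: the buffer cache is a plain LRU, the workload sends 90% of accesses to a small hot set, and every access is treated as a page change that a hot standby would replay (in practice, WAL replay touches only modified pages). All class names, function names, and sizes are invented for illustration.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache standing in for a buffer cache of fixed page capacity."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pages = OrderedDict()

    def touch(self, page: int) -> bool:
        """Access a page; return True on a hit, False on a miss (page is then loaded)."""
        hit = page in self.pages
        if hit:
            self.pages.move_to_end(page)
        else:
            self.pages[page] = None
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict the least recently used page
        return hit


def skewed_page(rng: random.Random, hot_pages: int, total_pages: int) -> int:
    """Pick a page: 90% of accesses hit a small hot set, the rest hit any page."""
    if rng.random() < 0.9:
        return rng.randrange(hot_pages)
    return rng.randrange(total_pages)


def hit_rate_after_failover(warm: bool, seed: int = 7, requests: int = 300) -> float:
    """Cache hit rate over the first `requests` accesses served by the promoted standby."""
    rng = random.Random(seed)
    standby = LRUCache(capacity=500)
    # Before the failover: the primary changes pages. A hot standby replays the
    # same page changes; a cold standby is idle and its cache stays empty.
    for _ in range(5_000):
        page = skewed_page(rng, hot_pages=200, total_pages=10_000)
        if warm:
            standby.touch(page)
    # After the failover: the promoted standby starts serving requests.
    hits = sum(standby.touch(skewed_page(rng, 200, 10_000)) for _ in range(requests))
    return hits / requests
```

Comparing `hit_rate_after_failover(warm=True)` with `hit_rate_after_failover(warm=False)` shows the warm standby serving most early requests from memory, while the cold one has to load its hot set from storage first.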

And the best part? This substantial enhancement to availability and resilience comes at no additional cost to you.

See Hot Standby in Action

We've prepared a short demonstration to illustrate the difference between the new Hot Standby HA and the legacy HA setup. In the video, we run a benchmark load on two AlloyDB instances and trigger a failover on both simultaneously.

Image 3: https://storage.googleapis.com/gweb-cloudblog-publish/original_images/AlloyDB_Hot_Standby_Final_Video_v1_-_GIF.gif

As you can see in the demo:

The instance with Hot Standby completes the failover in approximately 15 seconds. Crucially, its transactions per second (TPS) rate returns to pre-failover levels almost immediately.

The instance with Legacy HA takes noticeably longer to complete the failover. Even when it comes back online, the TPS is significantly lower and takes several minutes to ramp back up to the original performance levels as its caches warm up.

This side-by-side comparison clearly shows the benefits of Hot Standby in minimizing downtime and eliminating the post-failover performance impact.
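To run a similar comparison on your own workload, it helps to reduce a benchmark trace to two numbers: how long throughput was zero, and how long it took to get back near the baseline. The sketch below assumes one TPS sample per second; the names `FailoverStats` and `analyze_failover` and the 90% recovery threshold are illustrative choices, and the traces in the usage lines are synthetic, not measurements from the demo.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FailoverStats:
    downtime_seconds: int              # samples with zero throughput
    seconds_to_recover: Optional[int]  # from outage start until TPS is near baseline


def analyze_failover(
    tps_per_second: Sequence[float],
    baseline_tps: float,
    recovered_fraction: float = 0.9,
) -> FailoverStats:
    """Summarize a failover from a trace with one TPS sample per second."""
    outage_start = next((i for i, tps in enumerate(tps_per_second) if tps == 0), None)
    if outage_start is None:
        return FailoverStats(downtime_seconds=0, seconds_to_recover=0)

    downtime = sum(1 for tps in tps_per_second[outage_start:] if tps == 0)
    threshold = recovered_fraction * baseline_tps
    for i in range(outage_start, len(tps_per_second)):
        if tps_per_second[i] >= threshold:
            return FailoverStats(downtime, i - outage_start)
    return FailoverStats(downtime, None)  # never recovered within the trace


# Synthetic traces (made-up numbers, for illustration only).
hot_trace = [1000] * 5 + [0] * 15 + [980] * 10
legacy_trace = [1000] * 5 + [0] * 40 + [200, 400, 600, 800, 950]
print(analyze_failover(hot_trace, baseline_tps=1000))
print(analyze_failover(legacy_trace, baseline_tps=1000))
```

Reporting the two numbers separately matters because an instance can be back online (downtime over) long before it is back at full throughput.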

Get Started with Enhanced HA

Hot Standby is being rolled out to newly created AlloyDB instances running PostgreSQL 18, providing an upgraded HA experience automatically, and will roll out to earlier major versions in the coming months. You can continue to rely on AlloyDB's 99.99% SLA, now backed by even faster failovers and more predictable post-failover performance.

This enhancement underscores our commitment to providing a best-in-class, enterprise-grade managed PostgreSQL experience.

To learn more about AlloyDB's High Availability features, please refer to the official documentation. New to AlloyDB? Try it out today!

Posted in

Databases