What’s new with compute: Scaling core and agentic workloads
- Introduces fluid compute, an architecture that adapts dynamically to both general-purpose and agentic workload demands.
- Delivers automated orchestration and isolated execution environments through GKE and Agent Sandboxes.
- Suited to complex, high-concurrency, low-latency scenarios, such as dynamic pricing in a travel application.
At Google Cloud Next, we’re announcing a range of compute capabilities to enable your core general purpose and AI workloads for the agentic world with higher performance and lower costs.
**Why it matters:** IT leaders and builders must balance compute investments and resources between agentic AI and general-purpose use cases, including the web servers, databases, and enterprise applications that drive everyday customer experiences.
On one side, agents can place unpredictable demand on compute infrastructure, often scaling exponentially. A single user interaction can instantaneously kick off hundreds of concurrent, high-throughput, and low-latency tasks. On the other side, general-purpose workloads generate and hold the data required to fuel the agentic world. Relying on static and siloed infrastructure to run them can risk performance bottlenecks and spiraling costs, leaving your organization unable to respond to surges in demand.
Consider a global travel application where a simple vacation search instantly triggers a massive orchestration of agentic inventory checks, dynamic pricing models, and AI-driven personalized itineraries. Without a modern architecture, this sudden surge in demand can overwhelm the core booking database and bring business to a halt.
We address this with fluid compute, Google Cloud infrastructure that adapts to your general-purpose and agentic workflows, enabling both to win by flexing in performance, capacity, and scale, all in real time. This dynamic flexibility relies directly on the automated orchestration of Google Kubernetes Engine (GKE) and our new Agent Sandboxes to instantly provision secure, isolated execution environments at machine speed.
Let’s take a deeper look at the new compute capabilities announced at Next ’26.
**Run AI and general purpose workloads together**
Agentic planning and reinforcement learning depend on highly fluid compute to process unpredictable bursts of autonomous tasks. Relying on static infrastructure to isolate agent-generated code can create severe provisioning delays and heavily inflate your cloud budget. You can remove these bottlenecks by adopting an adaptive cloud foundation. Leveraging GKE Agent Sandboxes empowers your teams to securely launch thousands of execution environments. Pairing these scalable sandboxes with efficient Google Axion processors helps your organization optimize total cost of ownership while fueling artificial intelligence innovation.
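In practice, pod-level sandboxing on GKE is expressed through a gVisor RuntimeClass on the pod spec. As a rough illustration of the isolation pattern described above, here is a minimal sketch that builds such a manifest; the pod name, image, and resource limits are placeholders, not details from the Agent Sandbox product:

```python
import json

def sandboxed_agent_pod(name: str, image: str) -> dict:
    """Build a Kubernetes Pod manifest that requests gVisor isolation.

    GKE's sandboxing runs pods under the "gvisor" RuntimeClass; the
    name, image, and limits below are hypothetical examples.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"workload": "agent-tool-call"}},
        "spec": {
            # Ask the node to run this pod inside the gVisor sandbox.
            "runtimeClassName": "gvisor",
            "restartPolicy": "Never",
            "containers": [{
                "name": "agent-step",
                "image": image,
                # Keep untrusted, agent-generated code on a tight resource leash.
                "resources": {"limits": {"cpu": "500m", "memory": "512Mi"}},
            }],
        },
    }

if __name__ == "__main__":
    pod = sandboxed_agent_pod("agent-run-001", "us-docker.pkg.dev/example/agent:latest")
    print(json.dumps(pod, indent=2))
```

Submitting many such manifests concurrently is what lets an orchestrator fan out isolated execution environments as agent tasks arrive.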
Here’s what’s new in Google Cloud compute launches and announcements:
- **Google Axion N4A is GA:** Harness the agility of Google’s custom Arm-based Axion CPUs and achieve up to 2x better price-performance than comparable current-generation x86-based VMs for cost-sensitive workloads such as Java applications, scale-out web servers, and SaaS built by startups, enterprises, and partners. Learn more here.
- **GKE Agent Sandbox, with Axion N4A for price-performance, is GA.** As the industry’s only native sandbox service among hyperscalers, GKE Agent Sandbox offers scalable and low-latency infrastructure designed for agents to safely execute untrusted code and tool calls without sacrificing performance. With Google Axion, you can build agents on leading infrastructure without compromising on cost or choice. GKE Agent Sandbox running on Google Axion N4A instances provides up to 30% better price-performance than the next leading hyperscale cloud provider. Try GKE Agent Sandbox here.
- **Google Axion C4A.metal, our first Axion bare metal instance, is in preview:** C4A.metal instances power Android development, automotive simulation, CI/CD pipelines, security workloads, and custom hypervisors, without the performance overhead and complexity of nested virtualization. C4A.metal will be GA this summer; learn more here.
- **C4 instances offer expanded support for Intel Xeon 6 (Granite Rapids) across all shapes:** Achieve high performance for AI workloads like LLM inference and vector search by using Intel AMX with native FP16 support to increase throughput and reduce latency, offering 13% better price-performance versus comparable Intel Xeon 6-based VMs from another leading hyperscaler. Learn more here.
- **Flexible CUDs expanded support is GA:** Shift spend across regions and VM families while optimizing for TCO, with flexible committed use discounts, now with support for a wider range of VM families and services, including memory-optimized (M1-M4) and HPC-optimized (H3, H4D) VM families, as well as Cloud Run. Learn more here.
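To make the Axion announcements above concrete, here is a minimal sketch of the JSON body a Compute Engine `instances.insert` call would take for an Arm-based VM. The machine-type name `n4a-standard-8`, project, and zone are illustrative assumptions; check the machine types actually available in your region:

```python
ZONE = "us-central1-a"  # placeholder zone

def n4a_instance_body(name: str, machine_type: str = "n4a-standard-8") -> dict:
    """Build the request body for a Compute Engine instances.insert call.

    The N4A shape name here is an assumption for illustration only.
    """
    return {
        "name": name,
        "machineType": f"zones/{ZONE}/machineTypes/{machine_type}",
        "disks": [{
            "boot": True,
            "initializeParams": {
                # Arm-based Axion CPUs need an arm64 boot image.
                "sourceImage": "projects/debian-cloud/global/images/family/debian-12-arm64",
            },
        }],
        "networkInterfaces": [{"network": "global/networks/default"}],
    }
```

The same body shape works for any VM family; swapping the machine type is how you would move a cost-sensitive workload between x86 and Axion shapes.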
Here’s what customers are saying:
**Unity:** Unity is redefining the economics of real-time AI with Unity Vector. By migrating its on-demand feature processor workloads to Google Axion N4A instances, Unity achieved a 20% improvement in cost efficiency without sacrificing latency. As Unity Vector scales to meet increasing demand, the move to N4A instances ensures that Unity continues to deliver industry-leading performance at a sustainable cost.
**Deutsche Börse:** A leading German market infrastructure provider, Deutsche Börse migrated and modernized dozens of core financial applications onto Google Compute Engine VMs, including latest-generation C4 and C4D instances, supporting latency-sensitive Oracle databases and post-trade processing at scale while boosting release speed, operational agility, and resilience. This delivered the consistent performance needed to process millions of financial transactions every day, along with 58% faster time to market and 33% lower TCO.
**WP Engine**: WP Engine powers millions of digital experiences where every millisecond matters. By running GKE clusters on C4D and N4D instances, WP Engine has seen up to a 60% reduction in latency for mobile-optimized REST APIs and up to 51% faster processing for data-rich application requests.
**eDreams ODIGEO:** Operating a high-volume, AI-powered travel platform where every millisecond dictates the customer experience, eDreams ODIGEO migrated its foundational Java-based ecommerce modules on GKE to Axion virtual machines. This immediately eliminated weeks of manual code optimization, delivered a massive 75% improvement in P95 latency with zero code changes, and unlocked price-performance to scale their global services far more cost-effectively than their legacy x86 infrastructure could.
**Chainguard:** Prioritizing absolute isolation for their foundational software build system, Chainguard deployed the new Axion C4A bare metal instances. This allowed them to establish a strong hypervisor security boundary for package builds, secure their development pipeline with architectural parity, and ensure robust protection, all without compromising build performance.
**Run I/O and latency-sensitive workloads together**
Both AI and core workloads depend on the ability to store, read, and move data as a single, high-performance operation. Traditionally, these stages are slowed by network and storage limits tethered to vCPU counts, which can starve AI models of the data they need to function. You can remove these constraints by leveraging accelerated Hyperdisk performance for rapid data access and high-performance networking for consistent transit. By allowing your data pipelines to scale independently of compute, your AI training and I/O-sensitive workloads have the dedicated bandwidth they need to remain stable under peak demand.
- **C4N is in preview:** Running high-volume network applications such as concurrent mobile app requests and real-time inventory updates can risk bottlenecks during peak traffic. Maximize your throughput with C4N, featuring Titanium adapters that offload complex packet processing to deliver a market-leading 95 million packets per second — a 40% performance advantage for high-traffic network applications compared to other leading hyperscalers. Designed to rapidly transfer large datasets, C4N provides nearly 400 Gbps of VM-to-VM bandwidth, a 4x improvement in bandwidth-per-vCPU, and achieves an 8x increase in egress network bandwidth through internet gateways compared to C4 VMs. C4N with Hyperdisk Extreme also provides the low-latency, high-speed data access that modern databases and enterprise AI applications need, with 25 GiB/s of block storage throughput and nearly 1M IOPS. Sign up here for C4N preview access.
- **M4N is in preview:** Running memory-intensive databases can push organizations to overprovision compute cores (vCPUs) to meet memory requirements, driving up software licensing fees. We introduced the new M4N series to solve this exact problem. Running Oracle workloads on M4N with Hyperdisk Extreme can reduce TCO by over 20%, letting you run Oracle more efficiently with 26.57 GiB of RAM per vCPU, so you can scale on far fewer cores. Paired together, M4N and Hyperdisk Extreme deliver the highest per-core IOPS and throughput for high-memory instances across leading hyperscalers. Sign up for the preview here.
- **Announcing Z4D:** Optimize I/O-intensive workloads and remove network-based storage bottlenecks with new Z4D instances. By securing up to 84 TiB of high-performance local SSD directly on the node, organizations can process massive datasets for SQL, NoSQL, and vector databases. Z4D provides up to 400 Gbps of VM-to-VM bandwidth, matching both C4N and M4N. Z4D virtual machines and bare metal instances will be in preview soon.
Here is what customers are saying:
**Ericsson:** 5G Core workloads are inherently network-heavy, demanding high-throughput packet processing and deterministic latency that standard public cloud instances often struggle to maintain at scale. Ericsson found Google Cloud C4N to be the ideal choice for network performance to power Ericsson On-Demand. C4N’s architectural focus on network-optimized compute allows its 5G Core-as-a-Service to reach unprecedented throughput levels, like its recent 1 Tbps milestone, while maintaining the carrier-grade reliability its customers expect.
**Teradata:** Teradata’s Autonomous Knowledge Cloud enables the world’s largest enterprises to activate enterprise intelligence and turn trusted data into measurable business outcomes. Customers rely on Teradata to run mission‑critical, highly I/O‑intensive analytics at scale where performance and efficiency directly determine value. C4N instances are well suited for these demanding workloads, delivering strong price‑performance and supporting more efficient, optimized deployments. With C4N, Teradata can help customers accelerate insights, scale with confidence, and drive greater impact from their data and AI investments.
**Handle demanding storage requirements**
Foundational workloads such as web servers, applications, and databases hold the data required to fuel the agentic world. Siloing this critical information on rigid hardware creates bottlenecks that can completely stall enterprise modernization. Imagine a global retail brand running a holiday promotion, but the inventory database times out and drops customer requests because the legacy hardware couldn’t process the sudden flood of agentic queries.
Organizations require the highest performing database hosts backed by high performance IOPS and throughput per vCPU to ensure non-blocking data delivery. Moving these applications to modern cloud infrastructure dramatically improves total cost of ownership and operational throughput. Through strategic cloud migrations, customers can eliminate the architectural walls that stall modernization and unlock their data for AI. Here is what is new in fluid compute for throughput- and capacity-sensitive workloads:
- **Hyperdisk Balanced improvements.** Hyperdisk Balanced enables fast and efficient block storage for general purpose workloads, including applications and relational databases. With Hyperdisk Balanced you can drive up to 2.4 GiB/s and 160K IOPS per volume, higher than general-purpose block storage offerings from other hyperscalers, all while achieving lower mean latency than alternatives. With Hyperdisk Balanced High Availability you can now achieve a 4x performance improvement for high-availability databases like SQL Server or PostgreSQL by dynamically routing full disk performance to the active VM, removing the need to overprovision storage. Leverage zero-downtime encryption key rotation and consistency groups for instant snapshots, making it easier to stay secure. With these capabilities, you can deliver lower TCO, higher performance, and workload resilience for your general-purpose workloads. Learn more here.
- **Hyperdisk ML performance improvements and Hyperdisk Exapools are GA:** With 2 TiB/s of aggregate throughput (up from 1.2 TiB/s), Hyperdisk ML helps eliminate AI storage bottlenecks, offering more than 200x higher throughput per disk than competitive offerings, so your valuable accelerator clusters never sit idle. This allows you to maximize AI compute ROI while powering the next generation of intelligent agents. In addition, for large-scale training needs, Hyperdisk Exapools offer the highest aggregate block storage performance and capacity, per AI cluster, of any hyperscaler. Learn more about Hyperdisk ML and Exapools here and here.
- **Announcing Z4M:** Access up to 168 TiB of local SSD coupled with up to 400 Gbps of network bandwidth, support for RDMA, and bare-metal shapes to run distributed parallel file systems and large-scale AI/ML workloads. Z4M will be integrated with Cluster Director with the option to be colocated with accelerators to provide fast and low-latency access to data. Z4M VMs and bare metal instances are expected to be in preview in Q3 2026.
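A defining property of Hyperdisk mentioned above is that IOPS and throughput are provisioned independently of capacity. As a rough sketch, here is the shape of a Compute Engine `disks.insert` request body for a Hyperdisk Balanced volume; the zone and the specific IOPS/throughput values are illustrative, not documented limits:

```python
ZONE = "us-central1-a"  # placeholder zone

def hyperdisk_balanced_body(name: str, size_gb: int,
                            iops: int, throughput_mib: int) -> dict:
    """Build a disks.insert request body for a Hyperdisk Balanced volume.

    Hyperdisk decouples performance from capacity: you dial IOPS and
    throughput to the workload instead of overprovisioning disk size.
    """
    return {
        "name": name,
        "sizeGb": str(size_gb),
        "type": f"zones/{ZONE}/diskTypes/hyperdisk-balanced",
        # Performance is set explicitly, independent of sizeGb.
        "provisionedIops": iops,
        "provisionedThroughput": throughput_mib,  # MiB/s
    }
```

Because these fields can also be updated on an existing volume, a database that hits an I/O ceiling during a demand surge can be given more performance without resizing or migrating the disk.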
Here is what customers are saying:
**Shopify**: During Black Friday weekend sales, Shopify processed over $14.6 billion and tracked 136 million packages for 81 million buyers using its Shop App built on Compute Engine’s Z-series backed storage — without compromising speed or reliability.
**HubX**: Operating a massive portfolio of AI-powered mobile applications where rapid model loading dictates the user experience, HubX deployed Hyperdisk ML on GKE to eliminate severe I/O bottlenecks. Leveraging this specialized storage layer allowed HubX to support hundreds of concurrent readers and accelerate pod initialization times by 30x during peak traffic surges, drastically reducing idle accelerator costs and helping ensure their complex inference workloads scaled as expected.
**Fluid infrastructure for the agentic era**
Now, your foundational workloads and agents no longer need to compete for capacity or performance. With Google Cloud’s fluid compute, you get adaptive cloud infrastructure that prevents bottlenecks and enables both your foundational and AI workloads to collaborate and thrive.
**Ready to get started?** Head straight to the Google Cloud console to spin up a VM for your next big project. Or start planning your migration by checking out Migration Center’s AI-powered toolsets to perform cost estimates, create a business case, and evaluate your modernization options.