Google Cloud Blog

Storage innovations to accelerate your AI workloads at Next ‘26


At Google Cloud Next, we are announcing innovations across every layer of our storage stack — performance, intelligence, and management — to ensure your data is as fast and as useful as the AI models, apps, and agents you are building.

**Why it matters:** Storage is no longer just a place to keep data. When training AI models, storage is the engine that feeds data-hungry accelerators. During AI inference, it’s the access layer that keeps responses fast, serving the context that AI agents need to be effective. When storage performance falls short, accelerators sit idle, agents respond slowly, and data remains invisible to AI models.

But storage performance is only half the battle; you also need storage that’s smart. With the help of Google’s AI models integrated directly into the storage layer, you’re no longer just storing bits, but data that has full context about its content. In this new era of smart storage, raw data becomes a valuable asset that’s ready to use by a variety of downstream AI and enterprise applications.

**What’s new:**

  • **High-performance storage infrastructure:** New Rapid family of features in Cloud Storage for high-performance object storage, delivering 10x performance enhancements, plus a new cost-effective Dynamic tier for Google Cloud Managed Lustre.
  • **Smart Storage:** Unlocking unstructured data with automated metadata annotation, and AI agent connectivity via MCP.
  • **Storage Intelligence:** Streamlined data management through zero-configuration dashboards, aggregated activity views, and enhanced batch operations.
  • **Enhanced ecosystem:** Expanded capabilities across Google Cloud NetApp Volumes, Filestore for GKE, and our backup and data protection portfolio.

Let’s take a deeper look at the storage enhancements we are unveiling this week.

Storage infrastructure that keeps up with AI

As AI models scale, getting data from the storage to the compute layer fast enough can be a bottleneck. New storage capabilities move performance directly into the storage layer, reducing total cost of ownership (TCO) and keeping accelerators fully utilized.
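To make the bottleneck concrete, here is a back-of-the-envelope sketch of how much accelerator time is lost to data stalls at different storage bandwidths. All numbers are hypothetical illustrations, not Google benchmarks, and the model assumes the worst case where reads are not overlapped with compute:

```python
# Hypothetical back-of-the-envelope estimate of accelerator idle time
# caused by storage. All numbers are illustrative assumptions.

def gpu_idle_fraction(step_compute_s, batch_bytes, storage_gbps):
    """Fraction of each training step spent waiting on I/O, assuming
    reads are not overlapped with compute (worst case)."""
    read_s = batch_bytes / (storage_gbps * 1e9)
    return read_s / (step_compute_s + read_s)

# A 2 GB batch per step and 0.5 s of compute per step:
slow = gpu_idle_fraction(0.5, 2e9, storage_gbps=2)    # 2 GB/s storage
fast = gpu_idle_fraction(0.5, 2e9, storage_gbps=100)  # 100 GB/s storage
print(f"idle at 2 GB/s:   {slow:.0%}")   # roughly two-thirds idle
print(f"idle at 100 GB/s: {fast:.0%}")   # only a few percent idle
```

Even with aggressive prefetching in practice, the ratio of read time to compute time per step bounds how much of it can be hidden.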

**Cloud Storage Rapid**

Cloud-based object storage like our Cloud Storage is scalable and cost-effective, but bottlenecks can stall AI jobs and waste expensive compute cycles. Every time a training cluster waits on a read or a checkpoint write stalls, you're paying for accelerators that aren't doing useful work.

Cloud Storage Rapid marks a fundamental shift in designing AI infrastructure: you no longer have to choose between the reliability of object storage and the high performance of a specialized AI storage system. Cloud Storage Rapid lets you leverage the industry-leading durability, massive distributed scale, and cost-effective auto-tiering of object storage, while simultaneously achieving extreme throughput, frequent I/Os, and ultra-low latency. With native integrations into PyTorch and JAX, Cloud Storage Rapid is optimized out-of-box for the most popular AI/ML ecosystem frameworks, so that your data preparation, training, and inference workloads run on a high-performance and reliable foundation.

The Cloud Storage Rapid family consists of two offerings: **Rapid Bucket** and **Rapid Cache**.

  • Rapid Bucket, now generally available, leverages Colossus, the Google distributed storage system that powers Gemini and YouTube, to deliver more than **15 TB/s of bandwidth, 20 million requests per second, and sub-millisecond latency** in a single zonal bucket. With access via high-performance gRPC and S3-compatible APIs, Rapid Bucket increases accelerator utilization for multi-modal training with **50% reduced GPU blocked time** and 2.5x faster data loading. Checkpoint restores are 5x faster and checkpoint writes are 3.2x faster compared to traditional object storage, reducing workload interruptions and wasted GPU time.
Image 1: https://storage.googleapis.com/gweb-cloudblog-publish/images/1_x1nf9ws.max-1400x1400.png

Checkpoint writes are 3.2x faster and restores are 5x faster with Rapid Bucket

  • Rapid Cache, formerly Anywhere Cache, accelerates bandwidth for bursty workloads like model loading for inference, delivering an aggregate read throughput of 2.5 TB/s for existing buckets, with no code changes. The new ingest-on-write feature provides up to **2.2x faster checkpoint restores**, allowing training clusters to recover faster from interruptions. Rapid Cache’s combination of simplicity and performance has resulted in strong adoption, including from cutting-edge AI/ML customers like Thinking Machines Lab.

“Rapid Cache has become a core foundation of our AI/ML data infrastructure, supporting our critical workflows, from data prep and pretraining to training and model loading. By acting as a crucial bandwidth shield and booster, it enables us to scale our data-intensive workloads across our entire fleet without compromise, providing us with the on-demand high bandwidth and consistent stability that we need to innovate at speed.” - James Sun, Member of Technical Staff, Thinking Machines Lab
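To see what a restore speedup means in wall-clock terms, here is a simple time-equals-size-over-bandwidth estimate. The checkpoint size and baseline bandwidth are hypothetical; only the 5x restore ratio comes from the Rapid Bucket figures above:

```python
# Illustrative estimate of checkpoint restore time. The 4 TB checkpoint
# size and 20 GB/s baseline bandwidth are invented for illustration;
# the 5x factor is the restore speedup quoted for Rapid Bucket.

def restore_seconds(checkpoint_bytes, read_gbps):
    """Time to read a checkpoint at a given aggregate bandwidth."""
    return checkpoint_bytes / (read_gbps * 1e9)

ckpt = 4e12  # hypothetical 4 TB multi-host checkpoint
baseline = restore_seconds(ckpt, 20)       # 20 GB/s aggregate reads
rapid = restore_seconds(ckpt, 20 * 5)      # 5x faster restores
print(f"baseline restore: {baseline:.0f} s")  # 200 s
print(f"with 5x speedup:  {rapid:.0f} s")     # 40 s
```

Shaving minutes off every restore compounds quickly for large clusters, where interruptions and elastic rescheduling make restores routine.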

**Google Cloud Managed Lustre**

The Lustre parallel file system is the industry standard for organizations whose AI training and inference workloads require high throughput and sub-millisecond latency, and is trusted by AI labs and HPC centers worldwide to feed thousands of accelerators simultaneously and keep them saturated under pressure. Google Cloud Managed Lustre brings that capability as a fully managed service, and with today's announcements, it is the most performant managed Lustre offering available in any cloud.

Managed Lustre now delivers up to **10 TB/s of throughput** — a 10x increase since last year and 4–20x higher than managed Lustre offerings from other hyperscalers for a single instance. Powered by C4NX VMs and Hyperdisk Exapools, Managed Lustre writes and restores checkpoints 2.6x faster when compared to other Google Cloud storage solutions.

The new **Dynamic tier** ($0.06/GB-month) delivers the low-latency performance required for intense AI workloads like training and checkpointing. By serving data from persistent disk rather than relying on object-based caching, we eliminate a performance cliff — helping ensure your data remains responsive and your accelerators stay productive. A single SKU provides simple, predictable billing without the hidden complexity of traditional data tiering.
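As a quick cost sketch at the quoted $0.06/GB-month rate (the instance capacity below is a hypothetical example, and this ignores any other charges such as throughput or egress):

```python
# Simple cost estimate for the Managed Lustre Dynamic tier at the
# quoted $0.06/GB-month. The 100 TB capacity is a hypothetical example;
# other charges (throughput, egress) are ignored here.

PRICE_PER_GB_MONTH = 0.06

def monthly_cost_usd(capacity_tb):
    """Monthly capacity cost in USD, treating 1 TB as 1,000 GB."""
    return capacity_tb * 1000 * PRICE_PER_GB_MONTH

print(f"100 TB instance: ${monthly_cost_usd(100):,.2f}/month")  # $6,000.00/month
```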

“By integrating Managed Lustre we eliminated the typical onboarding bottlenecks, allowing us to hit the ground running with the inferencing workload. This high-throughput, low-latency storage keeps our B200 GPUs fully saturated, driving a substantial performance gain in LLM inference over the H200. For our customers, this translates directly into faster, more responsive AI agents that can handle complex reasoning at a fraction of the previous latency.” - Lavnaya Karanam, Software Engineering PMTS, Salesforce

Smart Storage: Context for the AI era

The beauty of an object storage system like Cloud Storage has long been its simplicity: the system knows the object’s name, its size, and when it was created. But if you want to understand the object’s content — what entities it references, whether it contains sensitive PII, or whether it’s relevant to a pending query — you need to use custom pipelines, separate databases, and bespoke enrichment systems.

AI has changed the equation. To fine-tune a model, you need to select the right objects from the get-go, from a corpus of millions. Building an agent requires retrieving the right context for each decision. To meet a compliance obligation, you need to know what every file contains up front, before it becomes a liability. In each case, the bottleneck isn’t compute or model quality — it’s the inability to describe, find, and act on objects at scale.

To bridge that gap between stored and usable data, last year we introduced **Smart Storage,** a set of capabilities built directly into Cloud Storage that makes every object self-describing. New Smart Storage capabilities include:

  • **Automated annotations**, which eliminate the need to build and maintain custom annotation pipelines. With Smart Storage enabled, Cloud Storage can now automatically generate context — including image annotations — so your data is discoverable and usable from the moment it lands. You pay to annotate the data once at write time, and every downstream system can use those annotations immediately for the life of the object.
  • **Cloud Storage MCP server**, which lets you read, write, and analyze Cloud Storage data using the standard MCP protocol.

Smart Storage enables these capabilities, and others, thanks to its **object context,** now generally available. This metadata substrate adds structured, mutable, IAM-governed context to every object. Customers write their own tags and classifications; Google's annotation pipelines automatically attach labels, extracted entities, and compliance signals.

Image 2: https://storage.googleapis.com/gweb-cloudblog-publish/images/2_EJZvHi4.max-2000x2000.png

With Smart Storage, ML teams can select training datasets based on semantic criteria without building retrieval pipelines. AI agents can ground their reasoning in enterprise data without a separate retrieval layer.
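A conceptual sketch of that kind of dataset selection is below. The context keys (`labels`, `contains_pii`) and object names are invented for illustration; with object context in Cloud Storage, the equivalent metadata lives on each object and can be filtered where the data sits rather than in application code like this:

```python
# Conceptual sketch of selecting a training set by object context.
# The schema ("labels", "contains_pii") and objects are hypothetical;
# this simulates locally what object context enables server-side.

objects = [
    {"name": "img/0001.jpg", "labels": ["cat", "outdoor"], "contains_pii": False},
    {"name": "img/0002.jpg", "labels": ["person", "street"], "contains_pii": True},
    {"name": "img/0003.jpg", "labels": ["cat", "indoor"], "contains_pii": False},
]

def select_training_set(objs, required_label):
    """Pick objects carrying a required label and no PII flag."""
    return [o["name"] for o in objs
            if required_label in o["labels"] and not o["contains_pii"]]

print(select_training_set(objects, "cat"))
# -> ['img/0001.jpg', 'img/0003.jpg']
```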

Storage Intelligence: Data management at AI scale

As data estates grow to hundreds of petabytes, storage costs can spike without warning, and security blind spots can multiply across billions of objects. To manage this, teams have to stitch together multiple tools just to answer basic questions about their own data.

Last year we launched **Storage Intelligence** to give enterprises a unified management experience built directly into Cloud Storage. Today, 70% of our largest customers use Storage Intelligence, each managing over 50 billion objects.

Storage Intelligence provides a single view across your entire project or organization, with unique capabilities like bucket relocations across regions. Today, we're making it significantly more powerful with:

  • New **zero-configuration dashboards** that instantly surface cost anomalies and integrate Security Command Center’s Data Security Posture Management (DSPM) data governance feature to detect critical security vulnerabilities across Cloud Storage — no setup required.
  • New **object events and bucket activity** tables in Insights Datasets now drive deeper cost analysis and accelerate operational tasks. You can use these insights to perform a wide range of analyses, from optimizing bucket placement based on egress patterns to quickly troubleshooting 429 errors by finding the impacted objects.
  • **Enhanced batch operations** make it even simpler to act on billions of objects with new change ACL and storage class operations, and support for multi-bucket operations.
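As a sketch of the 429-troubleshooting workflow mentioned above, here is the shape of the analysis the new activity tables enable. The event schema is invented for illustration; in practice the Insights Datasets tables are queried in BigQuery rather than scanned in application code:

```python
# Hypothetical sketch of finding objects impacted by 429 (rate-limit)
# errors. The record schema is invented; real bucket activity data
# lives in Insights Datasets tables queried via BigQuery.

from collections import Counter

events = [
    {"object": "shard-17", "status": 429},
    {"object": "shard-17", "status": 429},
    {"object": "shard-03", "status": 200},
    {"object": "shard-17", "status": 200},
    {"object": "shard-42", "status": 429},
]

def top_throttled(evts, n=3):
    """Rank objects by how often they were rate-limited."""
    counts = Counter(e["object"] for e in evts if e["status"] == 429)
    return counts.most_common(n)

print(top_throttled(events))
# -> [('shard-17', 2), ('shard-42', 1)]
```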

Enhancing the storage ecosystem

Beyond our core storage offerings, we are streamlining how enterprises migrate to and protect data in the cloud.

  • **Google Cloud NetApp Volumes:** With the launch of Flex Unified, NetApp Volumes now provides a unified enterprise storage platform that bridges the data center and the cloud, provisioning both block (iSCSI, NVMe/TCP) and file (NFS/SMB) on the same storage pool. New ONTAP-mode lets you bring your existing automation (Terraform, Ansible) and ONTAP APIs directly to NetApp Volumes.
  • **Filestore for GKE:** Developers building AI workloads on Google Kubernetes Engine (GKE) can start small, with shares as small as 100 GiB, and scale capacity and IOPS independently. At the same time, tighter integration to the Colossus distributed file system provides more scale and enterprise capabilities.
  • **Data protection:** Google Cloud Backup and DR now features agentic AI capabilities that can autonomously audit your backup estate and remediate coverage gaps, with new GA integrations for AlloyDB and Filestore.

Where to start

As you navigate today’s generational AI shift, you need a storage foundation to support ever-larger, more intelligent, and autonomous models. With new high-performance and intelligent storage layers, plus enhanced storage management tools and a deeper data protection bench, Google Cloud’s storage platform lets you understand and use your enterprise data in ways that weren’t previously possible, allowing you to:

  • **Reduce the AI data bottleneck:** Saturate compute and accelerate ROI. Keep your expensive GPUs and TPUs fully productive with high-throughput storage that delivers the extreme performance required for large-scale training and inference.
  • **Build agent-ready data foundations:** Shift from building custom pipelines to an active knowledge base where self-describing objects let AI agents instantly reason over data without manual prep.
  • **Minimize blind spots across exabytes:** Replace fragmented management tools with zero-configuration dashboards and datasets to instantly surface cost anomalies and security risks across billions of objects.
  • **Embrace the storage ecosystem:** Streamline migration and protection. Bridge your data center to the cloud, scale containerized apps, and automate data resilience with agentic AI.

Visit the Google Cloud Storage console to explore these new features, read more about Cloud Storage Rapid, or explore our Next '26 storage sessions.
