Google Developers Blog

Speeding Up AI: Bringing Google Colossus to PyTorch via GCSFS and Rapid Bucket

APRIL 29, 2026

Today, we are announcing a major performance boost for AI/ML workloads using the PyTorch ecosystem on Google Cloud. By integrating Rapid Storage, powered by Google’s Colossus storage architecture, directly with **PyTorch** via the industry-standard `fsspec` interface, we are enabling researchers and developers to keep their GPUs busier than ever before.

**The challenge: Keeping GPUs fed**

As model sizes grow, data loading and checkpointing often become the primary bottlenecks in training. Preparing data to train models involves fetching and processing terabytes to petabytes of data from remote storage systems such as object storage. Standard REST-based storage access can struggle to meet the extreme throughput and low-latency requirements of modern distributed training, wasting valuable GPU resources.

**Rapid Bucket: Rapid Storage via bi-di gRPC**

Our new **Rapid Bucket** solution provides high-performance object storage in dedicated zonal buckets. By bypassing legacy REST APIs and using persistent gRPC bidirectional streams, we’ve brought the power of Colossus, the stateful file-system protocols that power YouTube and Google Search, directly to the PyTorch ecosystem.

**Key performance metrics of Rapid Storage**

  • **Extreme throughput:** 15+ TiB/s aggregate throughput.
  • **Ultra-low latency:** <1 ms for random reads and append writes.
  • **High QPS:** Rapid Bucket provides 20M+ QPS.

**Fsspec - PyTorch’s Pythonic file interface**

`fsspec` is the pervasive Pythonic interface for file systems in the PyTorch ecosystem. It is already used for:

  • **Data preparation:** Dask, Pandas, Hugging Face Datasets, Ray Data
  • **Checkpoints:** PyTorch Lightning, Torch.dist, Weights & Biases
  • **Inference:** vLLM
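Because these libraries resolve any `scheme://` path through `fsspec`, data-preparation code is backend-agnostic. A minimal sketch using pandas with fsspec's built-in in-memory backend as a stand-in (the `memory://demo/train.csv` path and its contents are hypothetical); pointing the same `read_csv` call at a `gs://` path would route through `gcsfs` instead:

```python
import fsspec
import pandas as pd

# Stage a tiny CSV in fsspec's in-memory filesystem.
with fsspec.open("memory://demo/train.csv", "w") as f:
    f.write("feature,label\n0.1,0\n0.9,1\n")

# pandas hands any "scheme://" URL to fsspec, so the same call shape
# works for memory://, gs://, s3://, and other registered backends.
df = pd.read_csv("memory://demo/train.csv")
print(len(df))  # 2
```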

There are various backend implementations of fsspec for many different storage systems, which can all be integrated under a single layer, eliminating the need to write specific code for each backend. By integrating Rapid Storage with `gcsfs` (the Google Cloud Storage implementation of fsspec), developers can leverage speed gains provided by Rapid with a simple `fsspec.open()` call — no complex code rewrites required.
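To make the single-layer idea concrete, here is a minimal round-trip sketch through `fsspec.open()` using the built-in in-memory backend as a stand-in; swapping the `memory://` URL for a `gs://` path into a (hypothetical) Rapid bucket would route the identical calls through `gcsfs` with no other changes.

```python
import fsspec

# Backends are addressed by URL scheme; "memory://" is a built-in
# in-memory filesystem, used here in place of "gs://my-bucket".
path = "memory://demo/sample.bin"

# Write through the uniform interface...
with fsspec.open(path, "wb") as f:
    f.write(b"model data...")

# ...and read it back with the same call shape.
with fsspec.open(path, "rb") as f:
    data = f.read()

print(data)  # b'model data...'
```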

Under the hood: Leveraging Colossus

To achieve a performance boost with Rapid Buckets, we optimized the entire data path:

1. **Stateful gRPC-based streaming:** gRPC bidirectional streaming keeps the connection alive, minimizing per-operation overhead such as connection setup, auth, and metadata exchange, and enabling efficient, stateful data exchange for multiple reads or appends within a single object.

2. **Direct path:** Google Cloud Storage (GCS) Rapid Bucket uses direct connectivity for its gRPC bidirectional streaming APIs (BidiReadObject, BidiWriteObject), connecting clients directly to the underlying Colossus files for maximum performance. Non-Rapid traffic to GCS typically takes more network hops than the direct path, so read/write latencies over Rapid are significantly lower. For more details, see Rapid storage internal working.

3. **Zonal co-location:** By placing storage in the same zone as your compute (e.g., `us-central1-a`), we eliminate cross-zone latency. Before Rapid Buckets, data in a regional bucket and compute (accelerators) could sit in different zones, so data access incurred cross-zone latency.

4. **No-op user migration:** We preserved the existing `fsspec` API while upgrading internal traffic from HTTP to bidirectional gRPC for Rapid Buckets. By adding bucket-type auto-detection to `gcsfs`, PyTorch and other `fsspec` clients transparently use Rapid with zero manual configuration.
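The stateful-streaming point is visible at the fsspec level: one open file handle serves many positioned reads. The sketch below uses the in-memory backend with a hypothetical object name; against a Rapid bucket, per the article, `gcsfs` can serve such reads over a single persistent BidiReadObject stream rather than one request per range, while this fsspec-level code stays identical.

```python
import fsspec

# In-memory backend as a stand-in; in practice this would be
# gcsfs.GCSFileSystem() and an object in a Rapid bucket.
fs = fsspec.filesystem("memory")

with fs.open("/bucket/shard-0000.bin", "wb") as f:
    f.write(bytes(range(256)))

# Several range reads on one handle: with Rapid, these can share
# a single stateful stream instead of separate REST requests.
with fs.open("/bucket/shard-0000.bin", "rb") as f:
    f.seek(16)
    first = f.read(4)   # bytes 16..19
    f.seek(128)
    second = f.read(4)  # bytes 128..131

print(first, second)
```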

Results

A dataset of 134M rows totaling around 451 GB was loaded onto 16 GKE nodes, each containing eight A4 GPUs. Training ran for 100 steps, with a checkpoint after every 25 steps using PyTorch Lightning. We benchmarked total training time, including data load times, and observed a **performance gain of 23% using Rapid Bucket compared with a standard regional bucket.**


Microbenchmarking — that is, measuring the performance of a building block like I/O or resource usage — confirms these gains. Throughput improved by 4.8x for reads (both sequential and random) and 2.8x for writes. These tests used 16 MB I/O sizes across 48 processes. You can find more details at GCSFS-performance-benchmarks.

Get started

Getting started with GCSFS on Rapid Bucket is easy. Your existing code and scripts remain the same. You just need to change the bucket to a Rapid Bucket to take advantage of the performance boost.

**To install:**

Rapid Bucket integration is available in `gcsfs` starting with version 2026.3.0.

```shell
pip install gcsfs
```

**Code sample to read/write from GCS Rapid:**

```python
import gcsfs

# Initialize the filesystem
fs = gcsfs.GCSFileSystem()

# Writing to a Rapid bucket
with fs.open('my-zonal-rapid-bucket/data/checkpoint.pt', 'wb') as f:
    f.write(b"model data...")

# Appending to an existing object (native Rapid feature)
with fs.open('my-zonal-rapid-bucket/data/checkpoint.pt', 'ab') as f:
    f.write(b"appended data...")
```
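The append (`'ab'`) semantics from the sample can be exercised against any fsspec backend. A runnable sketch using the local filesystem as a stand-in for a Rapid bucket (the temp-file path is generated, and the `gcsfs` substitution is noted in the comments):

```python
import os
import tempfile

import fsspec

# Local filesystem as a stand-in; with gcsfs this would be
# gcsfs.GCSFileSystem() and a path inside a Rapid bucket.
fs = fsspec.filesystem("local")
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")

with fs.open(path, "wb") as f:
    f.write(b"model data...")

# Append mode adds to the existing object rather than overwriting it.
with fs.open(path, "ab") as f:
    f.write(b"appended data...")

with fs.open(path, "rb") as f:
    print(f.read())  # b'model data...appended data...'
```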
