# Speeding Up AI: Bringing Google Colossus to PyTorch via GCSFS and Rapid Bucket

Canonical URL: https://www.traeai.com/articles/4d2d6483-fe40-4931-ac0d-0385a07688df
Original source: https://developers.googleblog.com/speeding-up-ai-bringing-google-colossus-to-pytorch-via-gcsfs-and-rapid-bucket/
Source name: Google Developers Blog
Content type: article
Language: Chinese
Score: 9.2
Reading time: 4 minutes
Published: 2026-04-30T02:48:34.840021+00:00
Tags: PyTorch, GCS, Colossus, fsspec, AI Infrastructure

## Summary

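Google exposes the capabilities of its underlying Colossus file system through gRPC bidirectional streaming, integrated with fsspec/gcsfs. This lets PyTorch train 23% faster on GCS Rapid Bucket without any code changes.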

## Key Takeaways

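- Rapid Bucket replaces REST with bidirectional (bi-di) gRPC, bringing Colossus's low latency (<1 ms) and high throughput (15+ TiB/s) to the PyTorch ecosystem.
- gcsfs can now auto-detect the Rapid Bucket type. PyTorch users benefit transparently by keeping their existing `fsspec.open()` calls.
- Three optimizations (zonal co-location, direct connectivity, and stateful streaming) eliminate cross-zone, connection, and authentication overhead, significantly improving GPU utilization.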
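The auto-detection takeaway can be pictured as a minimal, stdlib-only sketch of bucket-type dispatch. This is not gcsfs code. `BUCKET_METADATA`, `detect_bucket_type`, `READERS`, `open_path`, and the bucket names are hypothetical stand-ins, and this summary does not describe how the real detection works. The sketch only shows the shape of the idea: callers make the same call for every `gs://` URL, and the transport is chosen behind that call.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# HYPOTHETICAL bucket metadata. Real gcsfs queries GCS itself; these bucket
# names and the "rapid"/"standard" labels are illustrative assumptions.
BUCKET_METADATA: Dict[str, str] = {
    "training-data-rapid": "rapid",
    "training-data-regional": "standard",
}


@dataclass
class Reader:
    """Hypothetical reader that records which transport serves a path."""
    transport: str
    path: str

    def read(self) -> bytes:
        # Placeholder payload; a real reader would return the object's bytes.
        return f"{self.transport}:{self.path}".encode()


def detect_bucket_type(bucket: str) -> str:
    # Hypothetical lookup; unknown buckets fall back to the standard path.
    return BUCKET_METADATA.get(bucket, "standard")


# Map each detected bucket type to a transport-specific reader factory.
READERS: Dict[str, Callable[[str], Reader]] = {
    "rapid": lambda path: Reader("grpc-bidi-stream", path),
    "standard": lambda path: Reader("rest", path),
}


def open_path(url: str) -> Reader:
    """One entry point for every bucket; the transport is picked internally."""
    if not url.startswith("gs://"):
        raise ValueError(f"expected a gs:// URL, got {url!r}")
    bucket, _, key = url[len("gs://"):].partition("/")
    return READERS[detect_bucket_type(bucket)](f"{bucket}/{key}")


if __name__ == "__main__":
    for url in ("gs://training-data-rapid/shard-0000.parquet",
                "gs://training-data-regional/shard-0000.parquet"):
        print(url, "->", open_path(url).transport)
```

In the real stack, the counterpart is simply leaving an existing `fsspec.open("gs://bucket/path")` call unchanged. fsspec routes `gs://` URLs to gcsfs, which, according to the source, handles Rapid Bucket detection itself.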

## Outline

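- Background and problem: as AI models grow, data loading becomes a bottleneck that leaves GPUs waiting, and traditional REST-based storage cannot meet the required throughput and latency.
- Rapid Bucket architecture: zonal object storage built on Colossus that uses bi-di gRPC instead of HTTP, providing ultra-low latency and very high QPS.
- Seamless fsspec integration: gcsfs is extended to auto-detect Rapid Buckets, so the PyTorch ecosystem can use them without any API changes.
- Performance mechanisms: a four-part design of stateful gRPC streams, direct connections to Colossus files, same-zone deployment, and zero-configuration migration.
- Measured results: in distributed training on 16 nodes with 128 GPUs, Rapid Bucket was 23% faster than a standard regional bucket.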
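The benefit of stateful streaming can be illustrated with a toy model in plain Python that uses a generator as a stand-in for a bidirectional stream. This is not gRPC and not the gcsfs implementation. `per_request_read`, `bidi_stream`, `read_many`, and the in-memory `OBJECT` are hypothetical. The "setup" counter represents connection and authentication work. A simplified request-per-read model pays it on every read, while a long-lived stream pays it once.

```python
from typing import Dict, Generator, List, Optional, Tuple

# HYPOTHETICAL in-memory object standing in for a file on storage (4096 bytes).
OBJECT = bytes(range(256)) * 16


class Counters:
    """Counts how often the expensive per-connection setup runs."""
    def __init__(self) -> None:
        self.setups = 0


def per_request_read(counters: Counters, offset: int, length: int) -> bytes:
    # Simplified request-per-read model: every read pays setup again.
    counters.setups += 1
    return OBJECT[offset:offset + length]


def bidi_stream(counters: Counters) -> Generator[bytes, Optional[Tuple[int, int]], None]:
    # Stateful stream model: setup runs once when the stream opens. After that,
    # the client sends (offset, length) requests and receives bytes back over
    # the same live stream.
    counters.setups += 1
    data = b""
    while True:
        request = yield data
        if request is None:
            return
        offset, length = request
        data = OBJECT[offset:offset + length]


def read_many(offsets: List[int], length: int) -> Dict[str, int]:
    rest, streamed = Counters(), Counters()
    rest_chunks = [per_request_read(rest, o, length) for o in offsets]

    stream = bidi_stream(streamed)
    next(stream)  # open the stream; the one-time setup happens here
    stream_chunks = [stream.send((o, length)) for o in offsets]
    stream.close()

    assert rest_chunks == stream_chunks  # both models return identical bytes
    return {"rest_setups": rest.setups, "stream_setups": streamed.setups}


if __name__ == "__main__":
    print(read_many(list(range(0, 4000, 100)), 64))
```

A generator fits here because `send()` pushes a request into live, suspended state and returns a response, which mirrors request/response pairs flowing over one long-lived stream. The toy does not model how expensive setup actually is relative to each read, so it says nothing about the size of real savings.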

## Highlights

- > By bypassing legacy REST APIs and utilizing persistent gRPC bidirectional streams, Colossus file system capabilities are brought directly into the PyTorch ecosystem. (Source: paragraph 3)
- > Rapid Bucket delivers <1 ms random-read and append-write latency, 15+ TiB/s aggregate throughput, and 20M+ QPS. This is the first time YouTube/Search-grade storage performance has been opened to third-party ML frameworks. (Source: Key performance metrics)
- > By adding bucket-type auto-detection to gcsfs, PyTorch and other fsspec clients transparently utilize Rapid with zero manual configuration. (Source: Under the hood, point 4)
- > A dataset of 134M rows totaling around 451GB was loaded onto 16 GKE nodes... observed a performance gain of 23% using Rapid Bucket. (Source: Results)
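The Results figures support some quick arithmetic. This sketch makes two assumptions. First, it uses decimal gigabytes, since the source says only "around 451GB". Second, the per-GPU figure assumes an even split across all 128 GPUs, although the source does not state the sharding scheme. The sketch also shows that "23% faster" can be read in two ways, which matters when comparing it against other benchmarks.

```python
# Figures from the Results quote. ASSUMPTION: 1 GB = 10**9 bytes.
ROWS = 134_000_000
DATASET_BYTES = 451 * 10**9
NODES = 16
GPUS = 128
GAIN = 0.23

avg_row_bytes = DATASET_BYTES / ROWS   # roughly 3.4 KB per row
gpus_per_node = GPUS // NODES          # 8 GPUs per node
bytes_per_gpu = DATASET_BYTES / GPUS   # ASSUMPTION: even split across GPUs

# Two readings of "23% faster":
# (a) throughput rises 1.23x, so wall-clock time falls to 1/1.23 of baseline
time_if_throughput_up = 1 / (1 + GAIN)
# (b) wall-clock time falls by 23%, so throughput rises to 1/0.77 of baseline
speedup_if_time_down = 1 / (1 - GAIN)

if __name__ == "__main__":
    print(f"average row size: {avg_row_bytes:,.0f} bytes")
    print(f"GPUs per node: {gpus_per_node}")
    print(f"data per GPU (even split): {bytes_per_gpu / 1e9:.2f} GB")
    print(f"(a) relative time: {time_if_throughput_up:.3f}")
    print(f"(b) relative throughput: {speedup_if_time_down:.3f}")
```

Under reading (a), an epoch takes about 81% of the baseline time. Under reading (b), throughput is about 1.30x the baseline. The quoted text does not say which reading it means.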

## Citation Guidance

When citing this item, prefer the canonical traeai article URL for the AI-readable summary and include the original source URL when discussing the underlying source material.