---
title: "TPU 8i is co-designed with our Gemini research team to support low latency inference.  Among the att..."
source_name: "Jeff Dean(@JeffDean)"
original_url: "https://x.com/JeffDean/status/2047407537566495033"
canonical_url: "https://www.traeai.com/articles/01d893d7-eb53-409a-88be-97a75ebafdf1"
content_type: "tweet"
language: "中文"
score: 5
tags: []
published_at: "2026-04-23T20:09:30+00:00"
created_at: "2026-04-23T23:05:51.685089+00:00"
---

# TPU 8i is co-designed with our Gemini research team to support low latency inference.  Among the att...

Canonical URL: https://www.traeai.com/articles/01d893d7-eb53-409a-88be-97a75ebafdf1
Original source: https://x.com/JeffDean/status/2047407537566495033

## Summary

Jeff Dean announces that TPU 8i was co-designed with the Gemini research team for low-latency inference: large amounts of on-chip SRAM reduce trips to HBM for weights and KV-cache state, a low-diameter "boardfly" network topology connects all 1152 chips in an 8i pod, and per-chip Collectives Acceleration Engine (CAE) circuitry offloads reductions and other global operations.

## Key Takeaways

- Large on-chip SRAM lets more computation stay on chip, reducing how often weights and KV-cache state must be fetched from HBM.
- The boardfly topology fully connects the 4 chips on each board, the 8 boards in each group, and the 36 groups, linking all 1152 chips in an 8i pod with a much lower network diameter.
- A Collectives Acceleration Engine (CAE) on each chip offloads reductions and other global operations from the main compute units, reducing on-chip latency by up to 5x.

## Content

TPU 8i is co-designed with our Gemini research team to support low latency inference. Among the attributes that support this are large amounts of on-chip SRAM, enabling more computations to be done on chip without having to go to HBM for weights or KVCache state as often.

The boardfly network topology (see image below) offers a much lower diameter network to connect all 1152 chips in an 8i pod, by fully connecting all 4 chips on the board together, fully connecting groups of 8 boards together, and then fully connecting 36 groups of 8 boards together.

In addition, there is specialized Collectives Acceleration Engine (CAE) circuitry on each chip to offload various kinds of reductions and other global operations from the main computational portion of each chip, reducing on-chip latency by up to 5x.

Together, these features will offer very high throughput for large-scale models (including MoEs, which often require mapping onto many chips for inference), and will do so at very low latency. This will make agentic workloads and interactive usage really shine on the TPU 8i.
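The arithmetic behind the boardfly description can be checked directly: 4 chips per board × 8 boards per group × 36 groups gives the 1152 chips per pod the tweet cites, and full connectivity at each level implies a fixed number of pairwise links per level. The sketch below works that out; the link counts and the at-most-three-level-crossing-hops observation are inferences from the tweet's description, not a published spec.

```python
from math import comb

# Hierarchy sizes, as described in the post
chips_per_board = 4
boards_per_group = 8
groups_per_pod = 36

# Total chips in an 8i pod
total_chips = chips_per_board * boards_per_group * groups_per_pod  # 1152

# "Fully connecting" each level means all-pairs links at that level
intra_board_links = comb(chips_per_board, 2)   # 6 chip-to-chip links per board
inter_board_links = comb(boards_per_group, 2)  # 28 board-to-board links per group
inter_group_links = comb(groups_per_pod, 2)    # 630 group-to-group links per pod

# Because every level is fully connected, a path between any two chips
# needs at most one hop at each level (board, group, pod), which is why
# the topology has such a low diameter.
print(total_chips)  # 1152
```

Run as-is, this confirms that the three fully connected levels compose to exactly the pod size stated in the announcement.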

[![Boardfly network topology diagram](https://pbs.twimg.com/media/HGnYsj4akAEw9sk?format=jpg&name=small)](https://x.com/JeffDean/status/2047407537566495033/photo/1)
