Ok that's so cool

Paul Couvert (@itsPaulAi), May 7, 2026


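TL;DR · AI summary

Multi-token prediction (MTP) speeds up local inference of the Gemma 4 model by roughly 1.5x, reaching 138 tokens/s.

Key points

With MTP enabled, Gemma 4 throughput rises from 97 tokens/s to 138 tokens/s.
The open-source release includes the assistant model and the code, and is easy for non-technical users to install and use.
The research matters because it delivers higher performance on the same hardware.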

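Outline

§ Multi-token prediction
Multi-token prediction markedly speeds up Gemma 4 when run locally.
· Performance comparison
Throughput is 97 tokens/s without MTP and 138 tokens/s with MTP.
· Open-source release
All related code and models are open source and available to the community.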

Mind map

Multi-token prediction
- Performance gain
  - 97 tokens/s
  - 138 tokens/s
- Open-source project
  - Assistant model
  - Code


#Gemma 4 #MTP #open-source


Paul Couvert

@itsPaulAi

Ok that's so cool

Multi-token prediction makes Gemma 4 run way faster locally!

Same model, same laptop, 1.5x faster.

Everything is open source from the assistant model to the code.

- 97 tokens/s without MTP
- 138 tokens/s with MTP

That's why research is so important. You're getting much more from the exact same machine and running the same powerful model.

And making it available to non-technical folks just by installing an app is amazing.
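As a quick sanity check on the two throughput figures quoted above, the ratio can be worked out directly. This is only a back-of-the-envelope sketch: the helper names `speedup` and `seconds_for` are made up for illustration, and the 1000-token reply length is an arbitrary assumption, not something from the post. The measured ratio comes out at about 1.42, which the post rounds to "1.5x".

```python
def speedup(baseline_tps: float, improved_tps: float) -> float:
    """Return the throughput ratio improved / baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return improved_tps / baseline_tps


def seconds_for(tokens: int, tps: float) -> float:
    """Time to generate `tokens` tokens at a steady `tps` tokens per second."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return tokens / tps


# Figures quoted in the post: 97 tokens/s without MTP, 138 tokens/s with MTP.
ratio = speedup(97, 138)
print(f"speedup: {ratio:.2f}x")

# Hypothetical 1000-token reply, chosen only to make the difference tangible.
print(f"without MTP: {seconds_for(1000, 97):.1f} s")
print(f"with MTP:    {seconds_for(1000, 138):.1f} s")
```

At these rates a 1000-token reply would take roughly 10.3 s without MTP and roughly 7.2 s with it.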

Quote

atomic.chat (@atomic_chat_hq) · 16h

Multi-Token Prediction (MTP) for LLaMA.cpp!

Running Gemma4 local model 1.5x faster.

We patched LLaMA.cpp.
Quantized Gemma 4 assistant models into GGUF format.
We ran tests on a MacBook Pro M5Max.

Gemma 26B with MTP drafts tokens 40% faster.

Benchmarks, source code and models 👇
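The post does not explain the mechanism, but the mention of an "assistant model" that "drafts tokens" suggests a draft-and-verify scheme, where a cheap drafter proposes several tokens and the main model checks them together. The toy sketch below shows that general idea with greedy decoding. It is not the patch described above and uses no real LLaMA.cpp API: `target`, `draft`, `greedy_decode`, and `speculative_decode` are all hypothetical stand-ins, and the "models" are simple functions over integer token ids.

```python
from typing import Callable, List, Tuple

# Hypothetical model interface: maps a token context to the next token id.
NextToken = Callable[[List[int]], int]


def target(ctx: List[int]) -> int:
    """Toy 'large' model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def draft(ctx: List[int]) -> int:
    """Toy 'assistant' model: agrees with target except right after token 6."""
    return 0 if ctx[-1] == 6 else (ctx[-1] + 1) % 10


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one model pass per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    """Greedy draft-and-verify. Returns (tokens, number of target passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n_new:
        # 1. The drafter proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2. The target checks all proposed positions. A real engine would do
        #    this in one batched forward pass, so it is counted as one pass.
        passes += 1
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # keep the target's token and stop
                break
            accepted.append(tok)
        else:
            # Every draft matched, so the same pass also yields a bonus token.
            accepted.append(target(out + accepted))
        out.extend(accepted)
    return out[:len(prompt) + n_new], passes


tokens, passes = speculative_decode(target, draft, [0], n_new=12, k=4)
print(tokens == greedy_decode(target, [0], 12))  # output is unchanged
print(f"target passes: {passes} instead of 12")
```

The point of the design is that the output matches plain greedy decoding from the target model, while the target runs fewer times whenever the drafter guesses correctly. Real-world gains depend on how often drafts are accepted and on how cheap the drafter is.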

Paid partnership

11:02 PM · May 7, 2026
