Paul Couvert (@itsPaulAi)

Ok that's so cool

TL;DR · AI summary

Multi-token prediction (MTP) makes Gemma 4 run 1.5x faster locally, reaching 138 tokens/s.

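Key points

  • With MTP, Gemma 4 throughput rises from 97 tokens/s to 138 tokens/s.
  • The open-source release covers both the assistant model and the code, and an installable app makes it usable by non-technical people.
  • The research matters because it gets more performance out of the same hardware.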

Outline

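  1. Multi-token prediction significantly speeds up Gemma 4 when run locally.

  2. Throughput is 97 tokens/s without MTP and 138 tokens/s with MTP.

  3. All of the related code and models are open source, so the community can use them.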

Mind map

  • Multi-token prediction
    • Performance gain
      • 97 tokens/s
      • 138 tokens/s
    • Open-source project
      • Assistant model
      • Code

#Gemma 4 #MTP #open source
Ok that's so cool

Multi-token prediction makes Gemma 4 run way faster locally!

Same model, same laptop, 1.5x faster.

Everything is open source from the assistant model to the code.

  • 97 tokens/s without MTP
  • 138 tokens/s with MTP

That's why research is so important. You're getting much more from the exact same machine and running the same powerful model. And making it available to non-technical folks just by installing an app is amazing.
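As a quick sanity check on the quoted figures, the short Python sketch below works out the ratio between the two throughput numbers. The 97 and 138 tokens/s values come from the post; the 1,000-token reply length is a made-up workload used only to show the effect on wall-clock time. The measured ratio comes out near 1.42x, which the post rounds to 1.5x.

```python
# Throughput figures quoted in the post (tokens per second).
baseline_tps = 97.0   # without MTP
mtp_tps = 138.0       # with MTP

# Ratio of the two throughputs and the equivalent percentage gain.
speedup = mtp_tps / baseline_tps
percent_gain = (speedup - 1.0) * 100.0

# Hypothetical workload, for illustration only: a 1,000-token reply.
reply_tokens = 1000
baseline_seconds = reply_tokens / baseline_tps
mtp_seconds = reply_tokens / mtp_tps

print(f"speedup: {speedup:.2f}x ({percent_gain:.0f}% more tokens/s)")
print(f"{reply_tokens} tokens: {baseline_seconds:.1f}s -> {mtp_seconds:.1f}s")
```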

Quote

atomic.chat (@atomic_chat_hq) · 16h

Multi-Token Prediction (MTP) for LLaMA.cpp! Running Gemma 4 local model 1.5x faster. We patched LLaMA.cpp. Quantized Gemma 4 assistant models into GGUF format. We ran tests on a MacBook Pro M5 Max. Gemma 26B with MTP drafts tokens 40% faster. Benchmarks, source code and models 👇
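The quoted post says the assistant model "drafts tokens". One common way a drafter speeds up decoding is a draft-and-verify loop (often called speculative decoding): a cheap model proposes several tokens, and the large model checks them in a single pass. The Python sketch below is a toy illustration of that general idea, assuming the assistant model plays the drafter role. It is not the atomic.chat patch and does not use any LLaMA.cpp API; `target_next`, `draft_next`, `speculative_generate` and `greedy_generate` are hypothetical names, and the "models" are simple arithmetic rules.

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def greedy_generate(target_next: NextToken, prompt: List[Token],
                    max_new: int) -> List[Token]:
    """Baseline: one target-model step per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(target_next(out))
    return out


def speculative_generate(target_next: NextToken, draft_next: NextToken,
                         prompt: List[Token], max_new: int,
                         k: int = 4) -> Tuple[List[Token], int]:
    """Greedy draft-and-verify; returns (tokens, number of target passes)."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < max_new:
        # 1) The cheap drafter proposes k tokens, one after another.
        draft: List[Token] = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2) The target checks every drafted position. A real engine would do
        #    this in one batched forward pass, so it is counted once here.
        target_passes += 1
        accepted: List[Token] = []
        for i in range(k):
            expected = target_next(out + draft[:i])
            if draft[i] == expected:
                accepted.append(draft[i])
            else:
                accepted.append(expected)  # keep the target's own token
                break
        else:
            # Every draft matched, so the same pass yields one extra token.
            accepted.append(target_next(out + draft))
        out.extend(accepted)
    return out[:len(prompt) + max_new], target_passes


# Toy stand-ins for the models (assumptions, not real LLMs):
# the target counts upward modulo 10; the drafter agrees except after a 6.
def target_next(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def draft_next(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 6 else (ctx[-1] + 1) % 10


spec_tokens, passes = speculative_generate(target_next, draft_next, [0], 20)
plain_tokens = greedy_generate(target_next, [0], 20)
print(spec_tokens == plain_tokens, passes)
```

In this toy setup the output matches plain greedy decoding token for token, while the target model runs far fewer passes than the 20 it would need on its own. Real speedups depend on how often the drafter's guesses are accepted and on the drafter's own cost, which the sketch does not model.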

Paid partnership

11:02 PM · May 7, 2026
