T
traeai
Sign in

模型

什么是 Gemma 4

也叫:gemma4

Google DeepMind 的 Gemma 系列模型,用于高质量文本生成。

为什么现在值得关注?

最近变化

2026-06-10 · DiffusionGemma 在 NVIDIA H100 上每秒生成 1000+ tokens,速度比传统模型快 4 倍。

Gemma 4 被反复提及时,通常意味着它正在影响产品路线、开发者工作流或 AI 产业判断。这个页面把分散材料合并成一个可持续更新的观察入口。

📰 Gemma 4 最新动态

已收录 28 篇与「Gemma 4」相关的 AI 资讯和分析。

Google AI Studio 3.0 (Fully Free): This is ACTUALLY AWESOME!

Google AI Studio 3.0 (Fully Free): This is ACTUALLY AWESOME!

AICodeKing979 字 (约 4 分钟)
87

Google AI Studio 3.0 launches fully free with integrated Gemma 4 model and multimodal capabilities, enabling real-time inference, custom model deployment, and API access, significantly lowering the barrier for developers.

入选理由:Gemma 4 模型在 Google AI Studio 3.0 中完全免费,支持 128K 上下文长度。

FeaturedVideo#Google AI Studio#Gemma 4#AI development tool#free AI platform中文
Google DeepMind Blog 图标

DiffusionGemma: 4x faster text generation

Google DeepMind Blog1006 字 (约 5 分钟)
85

DiffusionGemma 模型通过并行生成文本块,实现高达 4 倍的文本生成速度,适用于需要高速处理的本地交互场景。

入选理由:DiffusionGemma 在 NVIDIA H100 上每秒生成 1000+ tokens,速度比传统模型快 4 倍。

FeaturedArticle#DiffusionGemma#文本生成#AI模型#Google DeepMind英文
Hugging Face Blog 图标

Introducing North Mini Code: Cohere’s First Model For Developers

Hugging Face Blog2871 字 (约 12 分钟)
85

Cohere 发布 North Mini Code,一个 30B 参数的 Mixture-of-Experts 模型,专为开发者设计,在多个代码生成基准测试中表现优异。

入选理由:North Mini Code 是 Cohere 首个专为开发者设计的模型,参数量为 30B,其中 3B 为活跃参数。

FeaturedArticle#Cohere#模型#代码生成#Mixture-of-Experts#AI英文
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

The Keyword (blog.google)766 字 (约 4 分钟)
85

Google releases Gemma 4 QAT models with quantization-aware training, achieving 1GB memory footprint for E2B model.

入选理由:QAT技术使Gemma 4 E2B模型内存占用降至1GB

FeaturedArticle#Model Compression#Quantization Training#Mobile Optimization英文
Building a Multi-Tool Gemma 4 Agent with Error Recovery

Building a Multi-Tool Gemma 4 Agent with Error Recovery

Machine Learning Mastery3497 字 (约 14 分钟)
85

通过构建一个具有错误恢复机制的多工具 Gemma 4 代理,学习如何优雅地处理工具调用中的失败。

入选理由:迭代代理循环需设置最大迭代次数以防止无限循环。

FeaturedArticle#Gemma 4#工具调用#错误恢复#迭代代理英文
Hugging Face Blog 图标

Reachy Mini goes fully local

Hugging Face Blog1966 字 (约 8 分钟)
85

Reachy Mini now runs its voice backend locally, eliminating the need for cloud servers.

入选理由:部署本地语音后端于 Reachy Mini 上。

FeaturedArticle#Reachy Mini#Voice Backend#Local Service中文
Easy Agentic Tool Calling with Gemma 4

Easy Agentic Tool Calling with Gemma 4

KDnuggets2859 字 (约 12 分钟)
85

Gemma 4 enables true agentic behavior through local sandboxed tools like filesystem exploration and restricted Python execution.

入选理由:Gemma 4 支持本地工具调用,如文件系统探索和受限 Python 执行,增强模型自主性

FeaturedArticle#Gemma 4#Agent#Tool Calling#Security#Python英文
TLMs: Tiny LLMs and Agents on Edge Devices with @cormacb 

https://t.co/u0fHD7j5kZ

Function Gemma s...

本文介绍了Tiny LLMs和Agents在边缘设备上的应用,特别是Function Gemma模型在Pixel 7上的性能表现,以及开发者在设备上实现AI的两种路径:基于Gemma 4的技能框架和Eloquent生产转录应用。

入选理由:Function Gemma模型在Pixel 7上以270M参数运行,预填处理速度达到近2000 token/秒,出厂时在固定应用意图上准确率达到46%。

FeaturedTweet#Tiny LLMs#Edge Devices#Function Gemma#AI on Devices#Machine Learning中文
Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention

Recent developments in LLM architectures focus on KV sharing, mHC, and compressed attention to improve long-context efficiency.

入选理由:Gemma 4引入KV共享和每层嵌入,优化内存使用。

FeaturedArticle#LLM#Architecture Optimization#Attention Mechanism英文
A Smarter Google AI Edge Gallery: MCP integration, notifications, and session continuity

A Smarter Google AI Edge Gallery: MCP integration, notifications, and session continuity

Google Developers Blog1169 字 (约 5 分钟)
80

Google AI Edge Gallery introduces three major capabilities: MCP protocol support for cross-data-source tool calling, local notification scheduling for proactive interactions, and persistent chat history, shifting mobile Agent development from reactive to automated and continuous experiences.

入选理由:通过注册MCP URL,应用可将工具定义动态导入本地模型系统提示词,推理完全在手机端完成,请求由MCP服务器执行

FeaturedArticle#Google AI Edge Gallery#MCP#On-device AI#Gemma 4#Mobile Agent英文
Ok that's so cool

Multi-token prediction makes Gemma 4 run way faster locally!

Same model, same la...

Ok that's so cool

Paul Couvert(@itsPaulAi)281 字 (约 2 分钟)
78

Multi-token prediction technology makes Gemma 4 run 1.5 times faster locally, reaching 138 tokens/s.

入选理由:Gemma 4使用MTP后,性能从97 tokens/s提升至138 tokens/s。

FeaturedTweet#Gemma 4#MTP#Open Source中文
We released Gemma 4 12B yesterday. Here is a visual guide that explains the full architecture.

→ Ho...

Gemma 4 12B Released: Visual Guide to Native Multimodal Architecture

Philipp Schmid(@_philschmid)169 字 (约 1 分钟)
75

Gemma 4 12B achieves native multimodal processing for text, images, and audio by removing separate vision and audio encoders. This architecture replaces traditional encoder-patching approaches with joint representation learning, reducing inference latency and improving edge deployment efficiency.

入选理由:Gemma 4 12B移除独立视觉/音频编码器,采用原生多模态统一架构

FeaturedTweet#Gemma 4#Multimodal LLM#Native Multimodality#Edge AI英文
Gemma 4 Multi-Token Prediction Delivers Up to ~3x Faster Token Generation

Gemma 4 introduces multi-token prediction technology, achieving up to 3x faster token generation, significantly improving large model inference efficiency.

入选理由:Gemma 4 采用多令牌预测技术,将令牌生成速度提升至原来的 3 倍。

FeaturedArticle#AI#LLM#Gemma#Transformer#Token Generation英文
AI on Android: Ask me Anything — Florina Muntenescu & Oli Gaymond, Google DeepMind

Android developers can build intelligent experiences through three approaches: pure on-device models, hybrid mode (on-device first with cloud fallback), or pure cloud inference, where Gemini Nano serves as the most efficient on-device model managed through AI Core system service supporting both ML Kit GenAI API and Light Art LM implementations.

入选理由:Android支持三种AI部署模式:纯设备端、混合模式、纯云端推理

FeaturedVideo#Android#AI#Gemini Nano#ML Kit#On-device AI英文
Google Developers Blog 图标

All the news from the Google I/O 2026 Developer keynote

Google Developers Blog818 字 (约 4 分钟)
75

Google announced a transition from assistive AI to autonomous agents at I/O 2026, highlighting the Gemini 3.5 model series, upgraded Antigravity 2.0 agent-first platform, and new tools including Android CLI, Android Bench, and WebMCP to help developers build high-quality applications.

入选理由:Google 推出 Gemini 3.5 系列模型并升级 Antigravity 2.0 平台,支持跨平台终端沙箱、凭证掩码和强化 Git 策略的子代理编排

FeaturedArticle#Google I/O#AI Agent#Android#Gemini#Web Development英文
Blazing fast on-device GenAI with LiteRT-LM

Blazing fast on-device GenAI with LiteRT-LM

Google Developers Blog1574 字 (约 7 分钟)
75

Google AI Edge introduces LiteRT-LM, an optimized inference engine for deploying Gemma 4 models on edge devices, supporting Android, iOS, and web platforms with GPU inference reaching 76 tokens/sec and Multi-Token Prediction delivering up to 2.2x speedup.

入选理由:LiteRT-LM 在 Android GPU (OpenCL) 上实现 52 tokens/sec 解码速度,iOS (Metal) 达 56 tokens/sec,WebGPU 在 MacBook Pro 上可达 76 tokens/sec

FeaturedArticle#Google AI Edge#LiteRT-LM#Gemma 4#Edge AI#On-device Inference英文
New @GoogleGemma 4 QAT (Quantization-Aware Training) checkpoints are here, so you can run models loc...

Google releases Gemma 4 QAT checkpoints, enabling local inference on consumer GPUs and mobile devices with Q4_0 GGUF format, keeping memory below 1GB while preserving high inference quality.

入选理由:Gemma 4 QAT 检查点采用 Q4_0 GGUF 格式,兼容所有尺寸模型,提升本地推理性能。

FeaturedTweet#Gemma#QAT#GGUF#mobile inference#quantization中文
MTP drafters for Gemma 4 are available today under the same open-source Apache 2.0 license. Read the...

Google has released MTP drafters for Gemma 4 under the Apache 2.0 open-source license, available for download from Kaggle and Hugging Face.

入选理由:Gemma 4的MTP drafters现已发布,使用Apache 2.0开源许可。

FeaturedTweet#Gemma 4#MTP drafters#open source英文
ollama(@ollama) 图标

Model page: https://t.co/WD3DDuxEhx

ollama(@ollama)57 字 (约 1 分钟)
60

文章介绍了Gemma 4模型的性能和适用场景,但信息密度较低,缺乏深度分析。

入选理由:Gemma 4模型适用于推理、代理工作流、编码和多模态理解。

FeaturedTweet#Gemma#AI模型英文
Here’s this week’s shipping recap 👇

— Nano Banana 2 & Nano Banana Pro are now GA and available via...

Here’s this week’s shipping recap 👇

Google AI(@GoogleAI)190 字 (约 1 分钟)
60

Google AI has released new tools such as Nano Banana 2, Nano Banana Pro, and Co-Scientist, but the information density is low and lacks in-depth technical details.

入选理由:Nano Banana 2 和 Nano Banana Pro 已经 GA,可通过 Gemini 平台使用。

FeaturedTweet#Google AI#Gemini#AI Model英文
.@GoogleDeepMind's Gemma 4 - 12B is available on Ollama!  

Chat: 
ollama run gemma4:12b-mlx

Hermes...

@GoogleDeepMind's Gemma 4 - 12B is available on Ollama!

ollama(@ollama)104 字 (约 1 分钟)
60

ollama announces that the Gemma 4 - 12B model is now available on its platform. Users can run the model via MLX, and it supports tools like Hermes Agent and Claude Code.

入选理由:ollama 宣布 Gemma 4 - 12B 模型已在其平台上可用。

FeaturedTweet#ollama#Gemma 4#MLX中文
End-of-week call for community builds!

Have a project or demo that showcases Gemma 4 Multi-Token Pr...

End-of-week call for community builds!

Google AI Developers(@googleaidevs)163 字 (约 1 分钟)
45

Google AI invites developers to showcase projects on Gemma 4 MTP.

入选理由:Google AI邀请开发者分享Gemma 4 MTP项目

FeaturedTweet#Google AI#Developer Community中文
Accelerating Gemma 4: faster inference with  multi-token prediction drafters

Accelerating Gemma 4: faster inference with multi-token prediction drafters

The Keyword (blog.google)1732 字 (约 7 分钟)
45

The article briefly mentions that Gemma 4 uses multi-token prediction to accelerate inference but provides no technical details, experimental data, or implementation methods, making it a promotional lightweight announcement with little engineering value.

入选理由:Gemma 4通过多标记预测(MTP)加速推理,速度提升最高达3倍。

FeaturedArticle#Gemma#multi-token prediction#inference optimization#Google DeepMind英文

与「Gemma 4」经常一起出现的 AI 术语。

💡 想追踪「Gemma 4」的长期趋势?去 实体雷达 · Gemma 4 查看详细分析和跨材料问答。

AI may generate inaccurate information. Please verify important content.