TLMs: Tiny LLMs and Agents on Edge Devices with @cormacb 

https://t.co/u0fHD7j5kZ


AI Engineer (@aiDotEngineer) · May 21, 2026



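TL;DR · AI Summary

This talk covers tiny LLMs and agents on edge devices. It focuses on how the Function Gemma model performs on a Pixel 7 and on the two paths developers have for on-device AI: a skill harness built on Gemma 4, and Eloquent, a production transcription app.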

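Key Points

Function Gemma ships at 270M parameters and runs prefill at nearly 2,000 tokens/s on a Pixel 7. Out of the box, it reaches 46% accuracy on a fixed set of app intents.
After fine-tuning on a synthetically generated dataset, accuracy exceeds 90% on eight of ten functions.
Developers have two paths to on-device AI. One is a skill harness built on Gemma 4. The other is chaining two sub-billion-parameter models together, as in the Eloquent transcription app.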
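The accuracy figures above are reported over a fixed set of app intents, and the fine-tuned numbers are reported per function. The talk does not describe its evaluation harness. The sketch below shows one way to compute per-function accuracy and count the functions that clear a threshold. It assumes each evaluation record pairs an expected function name with the name the model predicted. The function names and records are hypothetical and are not data from the talk.

```python
from collections import defaultdict


def per_function_accuracy(results):
    """Return {function_name: accuracy} from (expected, predicted) name pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for expected, predicted in results:
        totals[expected] += 1
        if predicted == expected:
            correct[expected] += 1
    return {name: correct[name] / totals[name] for name in totals}


def count_above(accuracies, threshold=0.9):
    """Count functions whose accuracy strictly exceeds the threshold."""
    return sum(1 for acc in accuracies.values() if acc > threshold)


# Hypothetical evaluation records: (expected function, model's predicted function).
results = [
    ("set_alarm", "set_alarm"),
    ("set_alarm", "set_alarm"),
    ("send_message", "send_message"),
    ("send_message", "set_alarm"),
    ("open_camera", "open_camera"),
]

acc = per_function_accuracy(results)
overall = sum(p == e for e, p in results) / len(results)
print(acc)               # {'set_alarm': 1.0, 'send_message': 0.5, 'open_camera': 1.0}
print(overall)           # 0.8
print(count_above(acc))  # 2
```

An aggregate score and a per-function breakdown can diverge. A few weak functions can pull the overall number down even when most functions are already reliable.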

Outline

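§Introduction
An overview of tiny LLMs and agents on edge devices, and of what the talk covers.
§Function Gemma Model
Function Gemma's parameter count, its performance on a Pixel 7, and its out-of-the-box accuracy.
·Fine-Tuning Results
How fine-tuning on a synthetically generated dataset substantially raises accuracy on app intents.
§Developer Paths
The two main paths for on-device AI: a skill harness and model chaining.
·Skill Harness
The skill harness built on Gemma 4, including a restaurant roulette demo running fully on device.
·Eloquent
Eloquent, a production transcription app built by chaining two sub-billion-parameter models together.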
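Eloquent is described only as chaining two sub-billion-parameter models, and the summary does not say what each stage does. The sketch below illustrates the chaining pattern itself: the first model's output becomes the second model's input. It assumes, hypothetically, a first stage that produces a raw transcript and a second stage that cleans it up. Both stages are stand-in Python functions, not real model calls.

```python
from typing import Callable

# A "model" here is any callable mapping an input to text (hypothetical stand-ins).
Model = Callable[[str], str]


def chain(first: Model, second: Model) -> Model:
    """Compose two models so the first model's output is fed to the second."""
    def pipeline(x: str) -> str:
        return second(first(x))
    return pipeline


def fake_transcriber(audio: str) -> str:
    """Hypothetical stage 1: stands in for speech recognition (input is text here)."""
    return audio.lower()


def fake_cleaner(text: str) -> str:
    """Hypothetical stage 2: drops filler words and capitalizes the first letter."""
    fillers = {"um", "uh"}
    words = [w for w in text.split() if w not in fillers]
    return " ".join(words).capitalize()


transcribe = chain(fake_transcriber, fake_cleaner)
print(transcribe("UM book a table uh for two"))  # Book a table for two
```

The appeal of this design on a phone is that each stage can be a small model sized to one narrow task, rather than one larger model doing everything.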


Highlights

Function Gemma ships at 270 million parameters and runs nearly 2,000 tokens per second prefill on a Pixel 7.
— first paragraph
Out of the box, it hits 46% accuracy on a fixed set of app intents.
— first paragraph
Fine-tune on a synthetically generated dataset, and that clears 90% on eight of ten functions.
— first paragraph
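Prefill throughput determines how long the model spends reading the prompt before it can emit its first token. The talk says "nearly" 2,000 tokens per second, so the back-of-the-envelope sketch below assumes a flat 2,000 tokens/s. The prompt sizes are hypothetical, for example a request plus its function declarations. Decode time and model load time are ignored.

```python
def prefill_seconds(prompt_tokens: int, prefill_tokens_per_s: float = 2000.0) -> float:
    """Estimate prompt-processing (prefill) time only; excludes decode and model load."""
    return prompt_tokens / prefill_tokens_per_s


print(prefill_seconds(1000))  # 0.5
print(prefill_seconds(250))   # 0.125
```

At that rate, even a prompt carrying several function declarations is read in well under a second on a Pixel 7.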

Tags: Tiny LLMs · Edge Devices · Function Gemma · AI on Devices · Machine Learning

Original post: https://t.co/32bANDfdr8


youtube.com/watch?v=-TiET_

Function Gemma ships at 270 million parameters and runs nearly 2,000 tokens per second prefill on a Pixel 7. Out of the box, it hits 46% accuracy on a fixed set of app intents. Fine-tune on a synthetically generated dataset, and that clears 90% on eight of ten functions. Cormac walks through the two paths developers have for on-device AI. The first is a skill harness built on Gemma 4, with a restaurant roulette demo running fully on device. The second is Eloquent, a production transcription app built by chaining two sub-billion-parameter models together. cc
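The post does not show how the Gemma 4 skill harness is wired. A common pattern for on-device function calling pairs a registry of skills with a dispatcher. The dispatcher parses the model's structured call and invokes the matching handler. The sketch below assumes the model emits a JSON object with hypothetical `name` and `args` fields. This is not Function Gemma's actual output format. The `restaurant_roulette` skill and its arguments are illustrative only, loosely echoing the demo's name.

```python
import json
import random
from typing import Any, Callable, Dict

# Registry mapping skill names to handler functions.
SKILLS: Dict[str, Callable[..., Any]] = {}


def skill(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator: register a function as a callable skill under its own name."""
    SKILLS[fn.__name__] = fn
    return fn


@skill
def restaurant_roulette(options, seed=None):
    """Hypothetical skill: pick one restaurant at random from a list of options."""
    rng = random.Random(seed)
    return rng.choice(options)


def dispatch(model_output: str) -> Any:
    """Parse a JSON call like {"name": ..., "args": {...}} and run the matching skill."""
    call = json.loads(model_output)
    name = call.get("name")
    if name not in SKILLS:
        raise ValueError(f"unknown skill: {name!r}")
    return SKILLS[name](**call.get("args", {}))


# Pretend this string came from the on-device model.
output = '{"name": "restaurant_roulette", "args": {"options": ["Thai", "Pizza", "Ramen"]}}'
print(dispatch(output))
```

Rejecting unknown skill names keeps a small model's malformed or hallucinated calls from reaching arbitrary code. The caller can then re-prompt or fall back.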