From 46% to 90%: Fine-Tuning Tiny LLMs for On-Device Agents — Cormac Brick, Google
TL;DR · AI Summary
Google's AI Edge platform runs Tiny LLMs (e.g., Gemini Nano) on-device across platforms using the TensorFlow Lite runtime, and fine-tuning these models for agent skills raises their performance from 46% to 90%.
Key Takeaways
- With TensorFlow Lite and LiteRT-LM as the on-device runtime, fine-tuned Tiny LLMs reach 90% on agent tasks, up from 46%
- Gemini Nano, pre-installed and accessed through the AI Core API, provides summarization APIs with optim
- TensorFlow Lite supports 2.7B Android devices, with Gemini 4 models achieving ef
Outline
Introduces motivations for deploying Tiny LLMs (<1B parameters) on-device, including low latency, privacy, and offline use.
TensorFlow Lite serves as a cross-framework runtime, supporting MediaPipe and LiteRT-LM deployment across CPU, GPU, and NPU.
Gemini Nano is pre-installed on the device and exposed through AI Core, which provides system-level GenAI APIs such as summarization.
Gemini 4 runs efficiently on NPU and GPU, with support for both Android and iOS.
Demonstrates how to build custom agent skills on top of system-level GenAI using the AI Edge toolchain.
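A minimal sketch of what a custom agent skill can look like on the app side, assuming the fine-tuned model emits a JSON function call of the form {"name": ..., "args": {...}}. The JSON shape, the skill names (`set_alarm`, `send_message`), and the `dispatch` helper are all hypothetical illustrations, not the AI Edge or AI Core API.

```python
import json
from typing import Callable

# Hypothetical skill registry: maps a function name the model may emit
# to a local Python handler. Names and signatures are illustrative only.
SKILLS: dict[str, Callable[..., str]] = {}


def skill(name: str):
    """Register a handler under the given function name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        SKILLS[name] = fn
        return fn
    return register


@skill("set_alarm")
def set_alarm(hour: int, minute: int) -> str:
    return f"Alarm set for {hour:02d}:{minute:02d}"


@skill("send_message")
def send_message(to: str, text: str) -> str:
    return f"Sent to {to}: {text}"


def dispatch(model_output: str) -> str:
    """Parse a JSON function call emitted by the model and run the matching skill.

    Assumption: the model outputs {"name": ..., "args": {...}}. Malformed JSON,
    unknown skill names, and mismatched arguments are reported as an error
    string instead of raising.
    """
    try:
        call = json.loads(model_output)
        handler = SKILLS[call["name"]]
        return handler(**call.get("args", {}))
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError) as err:
        return f"error: {err!r}"
```

For example, `dispatch('{"name": "set_alarm", "args": {"hour": 7, "minute": 5}}')` returns `"Alarm set for 07:05"`. Keeping the model's job to "pick a function and fill its arguments" while the app executes the action is one common way to keep a small model's task narrow enough to fine-tune well.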
Mindmap
- On-device Tiny LLM optimization
  - AI Edge platform
    - TensorFlow Lite
    - MediaPipe
  - Gemini Nano deployment
    - System-level GenAI
    - AI Core API
  - Cross-platform support
    - Android
    - NPU/GPU
Highlights
TensorFlow Lite runs on over 2.7B devices and is used by many Android apps, with a high volume of daily invocations.
Fine-tuning boosts Tiny LLM on-device agent performance from 46% to 90%.
Gemini 4 achieves efficient mobile inference through NPU/GPU deployment.
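The summary does not say how the 46% and 90% figures are measured. As a hedged illustration only, here is a minimal sketch of one way such a score could be computed, assuming it is exact-match accuracy of the model's JSON function calls against expected calls on a held-out test set. The metric, the JSON format, and the `calls_match`/`accuracy` helpers are assumptions, not the talk's evaluation method.

```python
import json


def calls_match(predicted: str, expected: dict) -> bool:
    """True if the model's raw output parses to exactly the expected call.

    Assumption: a prediction counts as correct only if both the function
    name and every argument match; dict comparison ignores key order.
    """
    try:
        return json.loads(predicted) == expected
    except json.JSONDecodeError:
        return False  # unparseable output counts as a miss


def accuracy(predictions: list[str], expected: list[dict]) -> float:
    """Percentage of test examples whose predicted call matches exactly."""
    if len(predictions) != len(expected):
        raise ValueError("predictions and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(calls_match(p, e) for p, e in zip(predictions, expected))
    return 100.0 * hits / len(expected)
```

Running the same `accuracy` check on the base model and on the fine-tuned model over one fixed test set is what makes a before/after comparison like 46% versus 90% meaningful.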