LiteRT-LM 最近有什么新动态？

traeai 已收录 4 篇与 LiteRT-LM 相关的内容。最新一篇是「Benchmark and optimize LLMs on-device with AI Edge Portal」，由 Google Cloud Blog 发布。

产品

LiteRT-LM

轻量级大语言模型运行时，支持通过CLI在本地启动兼容OpenAI格式的服务端点。

已跟踪 4 条高相关材料

TraeAI 观察

如果只读 3 篇

Benchmark and optimize LLMs on-device with AI Edge Portal

Google Cloud Blog · 8.5 分

Google AI Edge Portal新增LLM基准测试和调试功能，支持在120+ Android设备上优化模型性能，提供初始化时间、解码速度等关键指标分析及可视化调试工具。

Bringing Gemma 4 12B to your Laptop: Unlocking Local, Agentic Workflows with Google AI Edge

Google Developers Blog · 8.2 分

Gemma 4 12B模型结合Google AI Edge栈已实现笔记本端本地运行，支持macOS上的代码生成、语音编辑及OpenAI兼容API服务。该组合使设备端Agent工作流成为可能，指令遵循质量提升超60%，且全程离线保障数据隐私。

Blazing fast on-device GenAI with LiteRT-LM

Google Developers Blog · 7.5 分

Google AI Edge 发布 LiteRT-LM 推理引擎，专为在边缘设备上高效运行 Gemma 4 模型设计，支持 Android、iOS、Web 多平台，GPU 推理可达 76 tokens/sec，结合 Multi-Token Prediction 技术实现 2.2...

Benchmark and optimize LLMs on-device with AI Edge Portal

Google Cloud Blog5月21日924 字 (约 4 分钟)

Google AI Edge Portal introduces new LLM benchmarking and debugging capabilities, enabling performance optimization across over 120 Android devices with key metrics like initialization time and decode speed analysis, plus visualization tools.

入选理由：AI Edge Portal支持在120+ Android设备上测试LLM，提供初始化时间、预填速度等4项核心性能指标

FeaturedArticle#LLM optimization#Edge computing#Android devices#Google AI Edge Portal#Model Explorer英文

Bringing Gemma 4 12B to your Laptop: Unlocking Local, Agentic Workflows with Google AI Edge

Google Developers Blog6月5日988 字 (约 4 分钟)

Gemma 4 12B combined with Google AI Edge enables local execution on laptops, supporting code generation, voice editing, and OpenAI-compatible APIs on macOS. This setup facilitates on-device agentic workflows with a 60%+ quality boost in instruction following while ensuring offline privacy.

入选理由：Gemma 4 12B通过LiteRT-LM在消费级笔记本运行，支持本地Agent与多模态任务。

FeaturedArticle#Gemma 4#Google AI Edge#On-device AI#LiteRT-LM#Agentic Workflow英文

Blazing fast on-device GenAI with LiteRT-LM

Google Developers Blog5月20日1574 字 (约 7 分钟)

Google AI Edge introduces LiteRT-LM, an optimized inference engine for deploying Gemma 4 models on edge devices, supporting Android, iOS, and web platforms with GPU inference reaching 76 tokens/sec and Multi-Token Prediction delivering up to 2.2x speedup.

入选理由：LiteRT-LM 在 Android GPU (OpenCL) 上实现 52 tokens/sec 解码速度，iOS (Metal) 达 56 tokens/sec，WebGPU 在 MacBook Pro 上可达 76 tokens/sec

FeaturedArticle#Google AI Edge#LiteRT-LM#Gemma 4#Edge AI#On-device Inference英文

TLMs: Tiny LLMs and Agents on Edge Devices with LiteRT-LM — Cormac Brick, Google

AI Engineer5月4日1303 字 (约 6 分钟)

Google 提出 TLMs（Tiny Language Models）与 LiteRT-LM 框架，支持在边缘设备上高效部署轻量级 LLM 和自主 Agent，兼顾低延迟、隐私保护与离线能力。

入选理由：TLMs 是专为边缘设备优化的 sub-100M 参数 LLM，通过结构压缩与量化实现毫秒级推理。

FeaturedVideo#LLM#edge computing#Google#LiteRT-LM#TLM英文

跨材料问答 · LiteRT-LM

回答基于：LiteRT-LM 相关 4 条材料