# Running LLMs on your iPhone: 40 tok/s Gemma 4 with MLX — Adrien Grondin, Locally AI

Canonical URL: https://www.traeai.com/articles/9d5dde09-e732-4271-9335-7dfd43d53568
Original source: https://www.youtube.com/watch?v=a2muGkT4WD4
Source name: AI Engineer
Content type: video
Language: English
Score: 9.0
Reading time: 5 minutes
Published: 2026-04-20T21:53:25+00:00
Tags: LLM, mobile, MLX

## Summary

Adrien Grondin of Locally AI shows how to run LLMs efficiently on an iPhone with MLX, reaching about 40 tokens/s with Gemma 4.

## Key Takeaways

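- Achieves an inference speed of 40 tokens/s on an iPhone.
- Explains in detail the technical advantages and implementation details of the MLX framework.
- Offers new ideas for building AI applications on mobile devices.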
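
To put the 40 tokens/s figure in perspective, here is a minimal sketch of the arithmetic behind a throughput number. The function names and the 200-token reply are hypothetical and not taken from the talk. The calculation assumes a steady decode rate and ignores prompt processing time.

```python
def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Throughput: generated tokens divided by wall-clock decode time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_tokens / elapsed_seconds


def decode_seconds(num_tokens: int, rate: float) -> float:
    """Time needed to generate num_tokens at a steady rate (tokens/s)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return num_tokens / rate


# Hypothetical example: a 200-token reply at the reported 40 tokens/s.
reply_tokens = 200
seconds = decode_seconds(reply_tokens, 40.0)
print(f"{reply_tokens} tokens take {seconds:.1f} s at 40 tokens/s")
```

Under these assumptions, a reply of a couple of hundred tokens finishes in a few seconds, which is why a rate in this range feels interactive on a phone.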

## Citation Guidance

When citing this item, prefer the canonical traeai article URL for the AI-readable summary and include the original source URL when discussing the underlying source material.