TokenSpeed is a brand-new inference engine purpose-built for speed-of-light agentic workloads

TL;DR · AI Summary
TokenSpeed is a new open-source LLM inference engine optimized for agentic workloads, featuring advanced KV caching, an efficient scheduler, and a modular kernel architecture with multi-silicon support.
Key Takeaways
- Delivers TensorRT-LLM-level performance with vLLM-like usability.
- Pluggable layered kernel design enables scalability across multi-silicon platforms.
- Built by a small team in two months under the permissive MIT open-source license.
Outline
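- TokenSpeed inference engine
  - Core technology
    - KV cache optimization
    - Safe and efficient scheduler
    - Layered pluggable kernel
  - Performance
    - Fastest MLA attention kernel
    - Blackwell architecture optimization
  - Ecosystem and deployment
    - Multi-silicon support
    - MIT open-source license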

TokenSpeed is a brand-new inference engine purpose-built for speed-of-light agentic workloads. Read their blog to learn more about its advanced KV cache management, safe and efficient scheduler, and pluggable layered kernel system designed for multi-silicon support. Plus, it also has the fastest MLA attention kernel on NVIDIA Blackwell. Congrats to
on the launch!
Quote

@lightseekorg
7h
Introducing TokenSpeed, a speed-of-light LLM inference engine.
> TensorRT LLM level performance
> vLLM level usability
> Built by a lean and mission-driven team in two months
> MIT license, open-source
github.com/lightseekorg/t lightseek.org/blog/lightseek