# 🚀 Introducing FlashQLA: high-performance linear attention kernels built on TileLang.

⚡ 2–3× forwar...

Canonical URL: https://www.traeai.com/articles/7b56c964-438a-499c-8dd8-91595b373760
Original source: https://x.com/Alibaba_Qwen/status/2049462758211772663
Source name: Qwen(@Alibaba_Qwen)
Content type: tweet
Language: English
Score: 8.5
Reading time: 1 minute
Published: 2026-04-29T12:16:13+00:00
Tags: FlashQLA, TileLang, AI acceleration, linear attention

## Summary

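FlashQLA is a set of high-performance linear attention kernels built on TileLang. It delivers a 2–3× forward speedup and a 2× backward speedup, and is designed for agentic AI on personal devices.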

## Key Takeaways

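- FlashQLA delivers a 2–3× forward speedup and a 2× backward speedup.
- Gate-driven automatic intra-card CP improves SM utilization.
- A 16-stage warp-specialized pipeline enables an efficient backward pass.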

## Outline

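- Introduction: introduces FlashQLA and its main performance gains.
  - Core mechanisms: details FlashQLA's key technical features.
  - Performance optimization: describes FlashQLA's performance advantages across different scenarios.
  - Code and documentation: provides links to the FlashQLA code repository and blog.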

## Highlights

- > 🚀 Introducing FlashQLA: high-performance linear attention kernels built on TileLang. (paragraph 1)
- > ⚡ 2–3× forward speedup. 2× backward speedup. (paragraph 1)
- > 💡Key insights: 1. Gate-driven automatic intra-card CP. 2. Hardware-friendly algebraic reformulation. 3. TileLang fused warp-specialized kernels. (paragraph 1)
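
The source does not give the kernels' math, so the following is only background: a minimal pure-Python sketch of generic gated linear attention, assuming a scalar per-token gate. It shows that a token-by-token recurrence and a chunk-wise evaluation give the same output, which is the kind of algebraic rewrite that lets chunks be processed as larger blocks. The function names, the scalar gate, and the chunk size are illustrative assumptions, not FlashQLA's API or its actual reformulation.

```python
import random

def recurrent_gla(q, k, v, g):
    """Reference: S_t = g_t * S_{t-1} + k_t v_t^T, then o_t = q_t^T S_t."""
    d_k, d_v = len(k[0]), len(v[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_t, k_t, v_t, g_t in zip(q, k, v, g):
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] = g_t * S[i][j] + k_t[i] * v_t[j]
        out.append([sum(q_t[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return out

def chunked_gla(q, k, v, g, chunk=4):
    """Same result computed chunk by chunk (assumes every gate is nonzero)."""
    d_k, d_v = len(k[0]), len(v[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # state carried across chunks
    out = []
    for start in range(0, len(q), chunk):
        end = min(start + chunk, len(q))
        n = end - start
        # decay[a] = product of gates from the chunk start through position a
        decay, acc = [], 1.0
        for t in range(start, end):
            acc *= g[t]
            decay.append(acc)
        for a in range(n):
            t = start + a
            # Inter-chunk term: previous state, decayed up to position t
            o = [decay[a] * sum(q[t][i] * S[i][j] for i in range(d_k))
                 for j in range(d_v)]
            # Intra-chunk term: causal scores weighted by the relative decay
            for b in range(a + 1):
                s = start + b
                w = (decay[a] / decay[b]) * sum(q[t][i] * k[s][i] for i in range(d_k))
                for j in range(d_v):
                    o[j] += w * v[s][j]
            out.append(o)
        # Fold this chunk into the carried state
        total = decay[-1]
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] = total * S[i][j] + sum(
                    (total / decay[b]) * k[start + b][i] * v[start + b][j]
                    for b in range(n))
    return out

# Small random check that the two forms agree
rng = random.Random(0)
T, D_K, D_V = 10, 3, 2
q = [[rng.uniform(-1, 1) for _ in range(D_K)] for _ in range(T)]
k = [[rng.uniform(-1, 1) for _ in range(D_K)] for _ in range(T)]
v = [[rng.uniform(-1, 1) for _ in range(D_V)] for _ in range(T)]
g = [rng.uniform(0.5, 1.0) for _ in range(T)]  # gates kept away from zero
ref, chk = recurrent_gla(q, k, v, g), chunked_gla(q, k, v, g, chunk=4)
max_err = max(abs(x - y) for r, c in zip(ref, chk) for x, y in zip(r, c))
```

Here `max_err` should sit at floating-point rounding level. A real kernel would express the intra-chunk term as matrix multiplications and handle small gate values more carefully than the plain division used here.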

## Citation Guidance

When citing this item, prefer the canonical traeai article URL for the AI-readable summary and include the original source URL when discussing the underlying source material.