Milvus (@milvusio)


We talked to the authors of **RaBitQ**, the people affected by Google's TurboQuant paper. **Five things stuck with us:**

• **Vector quantization has hit its theoretical ceiling.** RaBitQ is mathematically proven asymptotically optimal. The remaining gains are on the engineering side: hardware, data distribution, latency.

• **Compression won't shrink storage demand. It might grow it.** Smaller vectors mean larger models run on smaller devices, which creates new workloads instead of replacing old ones.

• **Since RaBitQ, mathematical vector quantization is three steps.** Random rotation (a form of Johnson-Lindenstrauss transformation) to spread information evenly across dimensions, grid construction, then quantization. A simplified sketch follows after this list.

• **Vector quantization is influencing transformer inference.** On the surface, KV cache compression and ANN vector compression are different problems. Mathematically, they share most of the same logic.

• **KV cache is cheap storage traded for expensive compute.** Quantization makes that trade more favorable on both sides.

• **Want to see engineered RaBitQ in production?** Try **IVF_RABITQ** in **Milvus 2.6**; a client-side sketch also follows below.

**Note:** The views expressed are those of the interviewees and do not represent Zilliz. Views belong to **Jianyang Gao** (RaBitQ first author), **Cheng Long** (RaBitQ co-author), and **Li Liu** (Zilliz Engineering).

**Full conversation:** milvus.io/blog/interview
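For intuition, here is a minimal NumPy sketch of the three-step recipe above, collapsing grid construction and quantization into a single 1-bit sign code. It illustrates the rotate-then-quantize idea only; it is not the actual RaBitQ algorithm, whose codebook and error analysis are more involved.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 128  # vector dimensionality (illustrative)

# Step 1: random rotation (a Johnson-Lindenstrauss-style transform).
# QR-factorizing a Gaussian matrix yields a uniformly random rotation Q,
# which spreads information evenly across dimensions.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def quantize(v: np.ndarray) -> np.ndarray:
    """Rotate, then code each coordinate with 1 bit (its sign).

    Steps 2-3 (grid construction + quantization) are collapsed into a
    plain sign grid here; real RaBitQ uses a more careful codebook.
    """
    return (Q @ v > 0).astype(np.uint8)

# Sanity check: the fraction of agreeing bits between two codes estimates
# 1 - angle(x, y) / pi on average (the random-hyperplane bound), so the
# codes still carry geometric information at 1 bit per dimension.
x = rng.standard_normal(d)
y = rng.standard_normal(d)
agreement = np.mean(quantize(x) == quantize(y))
print(f"sign agreement: {agreement:.2f}")
```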
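And a hedged sketch of what trying IVF_RABITQ from the Python client (pymilvus) might look like. The collection name, field names, and nlist value are illustrative assumptions; check the Milvus 2.6 docs for the exact parameters the index accepts.

```python
from pymilvus import DataType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")

# Minimal schema: an auto-generated primary key plus one vector field.
schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=128)

# Build the RaBitQ-based IVF index on the vector field.
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="IVF_RABITQ",     # available in Milvus 2.6
    metric_type="L2",
    params={"nlist": 128},       # number of IVF clusters (illustrative)
)

client.create_collection(
    collection_name="rabitq_demo",  # illustrative name
    schema=schema,
    index_params=index_params,
)
# At query time, nprobe (how many clusters to scan) trades recall for speed.
```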

![Image](https://x.com/milvusio/status/2047339091302433049/photo/1)