We talked to the authors of RaBitQ, the people affected by Google's TurboQuant paper.

Five things stuck with us:

• Vector quantization has hit its theoretical ceiling. RaBitQ is mathematically proven asymptotically optimal. The remaining gains are on the engineering side: hardware, data distribution, latency.

• Compression won't shrink storage demand. It might grow it. Smaller vectors mean larger models run on smaller devices, which creates new workloads instead of replacing old ones.

• Since RaBitQ, mathematical vector quantization is three steps: random rotation (a form of Johnson-Lindenstrauss transformation) to spread information evenly across dimensions, grid construction, then quantization. (See the sketch after this post.)

• Vector quantization is influencing transformer inference. On the surface, KV cache compression and ANN vector compression are different problems. Mathematically, they share most of the same logic.

• KV cache is cheap storage traded for expensive compute. Quantization makes that trade more favorable on both sides. (See the sizing example below.)

• Want to see engineered RaBitQ in production? Try IVF_RABITQ in Milvus 2.6.

Note: The views expressed are those of the interviewees and do not represent Zilliz. Views belong to Jianyang Gao (RaBitQ first author), Cheng Long (RaBitQ co-author), and Li Liu (Zilliz Engineering).

Full conversation: milvus.io/blog/interview
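The three-step recipe can be illustrated in a few lines of Python. This is a minimal sketch, not the actual RaBitQ estimator (the paper adds an unbiased correction factor and a more careful codebook); the function names and the 1-bit sign grid are illustrative assumptions, chosen to show the rotate-then-snap-to-grid structure.

```python
import numpy as np

def random_rotation(d, seed=0):
    """Step 1: sample a random orthogonal matrix (a Johnson-Lindenstrauss-style
    rotation) by QR-decomposing a Gaussian matrix."""
    rng = np.random.default_rng(seed)
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Fix column signs so Q is (approximately) uniform over the orthogonal group.
    return q * np.sign(np.diag(r))

def quantize(x, rotation):
    """Steps 2-3: the grid here is the bi-valued set {-1/sqrt(d), +1/sqrt(d)}^d;
    quantization snaps the rotated vector to it, storing one bit per dimension."""
    rotated = rotation @ x
    return rotated > 0                       # d bits instead of d floats

def reconstruct(bits, rotation):
    """Map a bit code back to its unit-norm grid point in the original space."""
    d = bits.shape[0]
    grid_point = np.where(bits, 1.0, -1.0) / np.sqrt(d)
    return rotation.T @ grid_point

# Tiny demo: quantize a unit vector and check how well the code preserves it.
d = 128
rng = np.random.default_rng(1)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)
R = random_rotation(d)
code = quantize(x, R)
x_hat = reconstruct(code, R)
print("cosine(x, x_hat) =", float(x @ x_hat))   # roughly 0.8 for sign grids
```

The rotation is what makes the fixed sign grid usable: it spreads each vector's energy evenly across dimensions, so no coordinate is too large or too small for a one-bit cell.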
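To make the storage-for-compute trade concrete, here is a back-of-the-envelope KV cache sizing. The model shape (32 layers, 8 KV heads, head dimension 128, a 128k-token context) is a hypothetical Llama-style configuration, not a figure from the interview, and the 1-bit line assumes the sign-grid scheme sketched above.

```python
# Illustrative KV-cache sizing for one sequence (assumed model shape).
layers, kv_heads, head_dim = 32, 8, 128
seq_len = 128_000

floats_per_token = 2 * layers * kv_heads * head_dim      # K and V per token
fp16_bytes = floats_per_token * 2 * seq_len              # 16 bits per value
one_bit_bytes = floats_per_token / 8 * seq_len           # 1 bit per value

print(f"fp16 KV cache:  {fp16_bytes / 2**30:5.1f} GiB")   # ~15.6 GiB
print(f"1-bit KV cache: {one_bit_bytes / 2**30:5.2f} GiB (16x smaller)")
```

At 16x less memory per token, the same device holds far longer contexts, or the same context at far less memory bandwidth per decoding step, which is the "more favorable on both sides" point.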
