Cool idea from Nous Research
elvis (@omarsar0)
Lighthouse Attention is a new method for accelerating long-context pre-training. During training it wraps attention in a sub-quadratic layer; the wrapper is removed before deployment, so the deployed model has an unchanged architecture and carries no additional cost at inference.
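Why it was featured: Lighthouse Attention speeds up long-context pre-training by inserting, during training only, a hierarchical, gradient-free selection layer that compresses and then decompresses the queries, keys, and values.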
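A minimal sketch of the general idea, not Nous Research's implementation. The post does not specify the selection rule, so this assumes a simple, hypothetical scheme: keys are mean-pooled into fixed-size blocks (compression), each query scores those block summaries and keeps its `top_k` best blocks, and attention then runs over the full-resolution tokens of the chosen blocks (decompression). Only keys are pooled here, a simplification of the description above, which compresses queries, keys, and values. The sketch is single-level and non-causal for brevity; a hierarchical version would repeat the pooling across several levels, and the block-scoring step below still costs about n·(n/block_size). All names (`dense_attention`, `selected_attention`, `attention`, `block_size`, `top_k`) are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dense_attention(q, k, v):
    # Plain O(n^2) softmax attention: the layer that remains at inference.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v


def select_blocks(q, k, block_size, top_k):
    # Hypothetical selection step: mean-pool keys into blocks, score each query
    # against the block summaries, and return the indices of the top_k blocks.
    # argsort yields integer indices, so nothing here is differentiable; in an
    # autodiff framework this step would also sit under a stop-gradient.
    n, d = k.shape
    assert n % block_size == 0, "sketch assumes n is a multiple of block_size"
    k_blocks = k.reshape(n // block_size, block_size, d).mean(axis=1)
    block_scores = q @ k_blocks.T                      # (n_q, n_blocks)
    return np.argsort(-block_scores, axis=-1)[:, :top_k]  # (n_q, top_k)


def selected_attention(q, k, v, block_size=4, top_k=2):
    # Each query attends only to the tokens inside its selected blocks.
    d = k.shape[1]
    idx = select_blocks(q, k, block_size, top_k)
    out = np.empty((q.shape[0], v.shape[1]))
    for i in range(q.shape[0]):
        # Expand block ids back into full-resolution token positions.
        tok = (idx[i][:, None] * block_size + np.arange(block_size)).ravel()
        s = k[tok] @ q[i] / np.sqrt(d)
        out[i] = softmax(s) @ v[tok]
    return out


def attention(q, k, v, training):
    # Hypothetical switch: sparse wrapper while pre-training, dense at deployment.
    return selected_attention(q, k, v) if training else dense_attention(q, k, v)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
    approx = attention(q, k, v, training=True)
    exact = attention(q, k, v, training=False)
    print(approx.shape, np.abs(approx - exact).max())
```

A useful sanity check on this design: if `top_k` covers every block, the selected path attends to all tokens and reproduces dense attention exactly, because the softmax-weighted sum does not depend on token order. Smaller `top_k` trades accuracy for fewer keys per query (`top_k * block_size` instead of n).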
Featured tweet · #Lighthouse Attention #long-context pre-training #machine learning #deep learning
