How to Run llama.cpp with MTP (Multi-token Prediction)
MTP is a new speculative decoding feature built into llama.cpp that can approximately double token generation speed for most use cases, achieving ~30 tok/sec with the Dense 27B model and ~100 tok/sec with the MoE model.
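Why it was selected: MTP is a new speculative decoding feature built into the model itself, and it can increase token generation speed by roughly 2x.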
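
To give a rough sense of where a ~2x figure could come from, here is a back-of-envelope sketch in Python. It is not taken from llama.cpp and does not model MTP specifically. It uses the standard simplification for speculative decoding that each drafted token is accepted independently with some probability, and it ignores the cost of producing the drafts. The function name, the acceptance probabilities, and the draft lengths are all hypothetical values chosen for illustration.

```python
def expected_tokens_per_step(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass of the main model.

    Simplifying assumptions (illustrative, not measured):
      - each drafted token is accepted independently with `accept_prob`
      - acceptance stops at the first rejected draft token
      - the main model always contributes one token per pass
      - drafting overhead is ignored
    Under these assumptions the expectation is the geometric sum
    1 + p + p^2 + ... + p^draft_len.
    """
    if not 0.0 <= accept_prob <= 1.0:
        raise ValueError("accept_prob must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    return sum(accept_prob ** i for i in range(draft_len + 1))


if __name__ == "__main__":
    # Hypothetical acceptance rates and draft lengths, for illustration only.
    for p in (0.6, 0.8, 0.9):
        for k in (1, 2, 3):
            tokens = expected_tokens_per_step(p, k)
            print(f"accept_prob={p:.1f} draft_len={k}: {tokens:.2f} tokens/pass")
```

With no drafting (`draft_len=0`) the function returns 1 token per pass, so the returned value can be read as an idealized upper bound on the speedup. Real gains will be lower once drafting cost and memory bandwidth are accounted for.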









