Optimize, deploy, and benchmark an open-source LLM with vLLM
DeepLearning.AI, 496 words (about 2 min read)
82
This course shows how to deploy open-source large language models efficiently with vLLM, covering techniques such as quantization and paged attention.
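Why it was selected: a 70-billion-parameter model needs about 140 GB of memory for its weights at 16-bit precision, so serving even a single request may require multiple GPUs.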
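The 140 GB figure matches simple arithmetic for a 70-billion-parameter model stored at 2 bytes per parameter (16-bit weights). The sketch below is a rough illustration only: it counts weights alone and ignores KV cache and activations, and the 80 GB GPU size and the 4-bit case are assumptions chosen for illustration, not figures from the course.

```python
import math


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights only, in decimal GB (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def min_gpus(weights_gb: float, gpu_memory_gb: float) -> int:
    """Lower bound on GPUs needed just to hold the weights."""
    return math.ceil(weights_gb / gpu_memory_gb)


PARAMS_70B = 70e9          # 70 billion parameters
GPU_MEMORY_GB = 80         # assumed per-GPU memory, for illustration

fp16_gb = weight_memory_gb(PARAMS_70B, 2.0)   # 16-bit weights: 2 bytes each
int4_gb = weight_memory_gb(PARAMS_70B, 0.5)   # assumed 4-bit quantization

gpus_fp16 = min_gpus(fp16_gb, GPU_MEMORY_GB)
gpus_int4 = min_gpus(int4_gb, GPU_MEMORY_GB)

print(f"16-bit weights: {fp16_gb:.0f} GB, at least {gpus_fp16} GPU(s)")
print(f"4-bit weights:  {int4_gb:.0f} GB, at least {gpus_int4} GPU(s)")
```

Under these assumptions, quantizing the weights is what moves the model from a multi-GPU deployment to one that could fit on a single device, which is why quantization appears alongside paged attention in the course outline.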
Featured Video #vLLM #LLM deployment #AI infrastructure (English)
