Andrew Ng (@AndrewYNg), April 9, 2026

New course: Efficient Inference with SGLang: Text and Image Generation, built in partnership with LM...


AI Summary

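SGLang reduces redundant computation by caching shared context, which significantly lowers inference cost.
The course covers implementing a KV cache from scratch and extending caching across requests with RadixAttention.
It also covers multi-GPU acceleration for both text generation and diffusion-model image generation.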
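
To make the shared-context point concrete, here is a minimal sketch of prefix reuse across requests. It is an illustration only, not SGLang's implementation: the `PrefixCache` class and its methods are hypothetical names, a plain trie of token IDs stands in for a radix tree, and "computing" a token is reduced to incrementing a counter.

```python
class PrefixCache:
    """Toy prefix cache (hypothetical, not SGLang's API): a trie keyed by token ID."""

    def __init__(self):
        self.root = {}
        self.computed = 0  # tokens that actually had to be processed

    def process(self, tokens):
        """Walk the trie; return how many leading tokens were already cached."""
        node = self.root
        reused = 0
        for tok in tokens:
            if tok in node:
                reused += 1          # prefix hit: work was done by an earlier request
            else:
                node[tok] = {}       # prefix miss: "compute" this token and cache it
                self.computed += 1
            node = node[tok]
        return reused


# Assumed workload: ten requests share a 100-token system prompt,
# and each adds 5 tokens of its own.
system_prompt = list(range(100))
cache = PrefixCache()
reused_per_request = []
for user in range(10):
    user_tokens = [1000 + user * 5 + i for i in range(5)]
    reused_per_request.append(cache.process(system_prompt + user_tokens))

naive_cost = 10 * (len(system_prompt) + 5)
print(cache.computed, naive_cost)  # 150 1050
```

In this toy setup the shared prompt is processed once, so the work drops from 1050 token computations to 150. A production system also has to handle eviction and memory limits, which this sketch ignores.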

#LLM #SGLang #ModelInference #DeepLearning #AIEngineering




Andrew Ng

@AndrewYNg

New course: Efficient Inference with SGLang: Text and Image Generation, built in partnership with LMSys @lmsysorg and RadixArk @radixark, and taught by Richard Chen @richardczl, a Member of Technical Staff at RadixArk.

Running LLMs in production is expensive, and much of that cost comes from redundant computation. This short course teaches you to eliminate that waste using SGLang, an open-source inference framework that caches computation already done and reuses it across future requests. When ten users share the same system prompt, SGLang processes it once, not ten times. The speedups compound quickly, especially when there's a lot of shared context across requests.

Skills you'll gain:
- Implement a KV cache from scratch to eliminate redundant computation within a single request
- Scale caching across users and requests with RadixAttention, so shared context is only processed once
- Accelerate image generation with diffusion models using SGLang's caching and multi-GPU parallelism

Join and learn to make LLM inference faster and more cost-efficient at scale! https://deeplearning.ai/short-courses/efficient-inference-with-sglang-text-and-image-generation…
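
The first skill, a KV cache within a single request, can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not course material or SGLang code: `ToyAttention`, its random weights, and the `kv_projections` counter are all hypothetical, and the model is a single attention head with no batching.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attend(q, keys, values):
    """Scaled dot-product attention of one query over all cached keys/values."""
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]

class ToyAttention:
    """Hypothetical single-head attention layer with an optional KV cache."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        make = lambda: [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
        self.Wq, self.Wk, self.Wv = make(), make(), make()
        self.k_cache, self.v_cache = [], []
        self.kv_projections = 0  # counts key/value projections performed

    def _project_kv(self, x):
        self.kv_projections += 1
        return matvec(self.Wk, x), matvec(self.Wv, x)

    def step_no_cache(self, xs):
        """Re-project K and V for every token so far, then attend from the last one."""
        kvs = [self._project_kv(x) for x in xs]
        q = matvec(self.Wq, xs[-1])
        return attend(q, [k for k, _ in kvs], [v for _, v in kvs])

    def step_cached(self, x):
        """Project only the new token; reuse K and V stored from earlier steps."""
        k, v = self._project_kv(x)
        self.k_cache.append(k)
        self.v_cache.append(v)
        return attend(matvec(self.Wq, x), self.k_cache, self.v_cache)

rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
slow, fast = ToyAttention(4), ToyAttention(4)  # same seed, so same weights
for t in range(len(tokens)):
    out_slow = slow.step_no_cache(tokens[: t + 1])
    out_fast = fast.step_cached(tokens[t])
    assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(out_slow, out_fast))

print(slow.kv_projections, fast.kv_projections)  # 21 6
```

Both paths produce the same attention output at every step, but the uncached path repeats earlier projections (1 + 2 + ... + 6 = 21) while the cached path does one per token (6).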

5:11 PM · Apr 9, 2026

86.4K Views

