Collaboration between Stanford SAIL and ETH shows RL with rich feedback significantly outperforms sc...

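- RL combined with natural language feedback can substantially improve performance
- The new method, SDPO, outperforms GRPO with scalar rewards
- Applicable to settings such as code error messages or LLM judges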
Stanford AI Lab on X: "Collaboration between Stanford SAIL and ETH shows RL with rich feedback significantly outperforms scalar rewards on very hard tasks!"

Collaboration between Stanford SAIL and ETH shows RL with rich feedback significantly outperforms scalar rewards on very hard tasks!
Quote: Carlos Guestrin (@guestrin) · Jan 29
With SDPO, you can now do RL with natural language feedback, like error messages from coding environments or LLMs as judges. You can achieve huge gains over GRPO with scalar rewards! x.com/jonashubotter/…
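To make the contrast with scalar rewards concrete, here is a minimal sketch of the group-relative advantage that GRPO-style training derives from scalar rewards. It is not SDPO, whose algorithm the post does not describe. The function name, the reward values, the feedback strings, and the use of population standard deviation are all assumptions for illustration.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward against its sampling group.

    Assumption: population standard deviation; implementations differ on this.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four sampled solutions where one passes the tests.
mixed = group_relative_advantages([1.0, 0.0, 0.0, 0.0])

# Hypothetical group on a very hard task: every sample fails.
all_fail = group_relative_advantages([0.0, 0.0, 0.0, 0.0])

# Hypothetical environment feedback for the same four failing samples.
# The scalar rewards are identical, but the messages still differ.
feedback = [
    "IndexError: list index out of range",
    "AssertionError: expected 3, got 2",
    "TimeoutError: exceeded 2s limit",
    "SyntaxError: invalid syntax",
]

print(mixed)
print(all_fail)  # every advantage is 0.0, so the scalar signal carries no gradient
print(len(set(feedback)))  # number of distinct feedback messages
```

When every sample in a group receives the same reward, the normalized advantages are all zero, which is one reason very hard tasks are difficult to learn from with scalar rewards alone. Textual feedback such as error messages can still differ between those samples.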