Stanford AI Lab (@StanfordAILab), January 29, 2026

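AI key takeaways

RL combined with natural language feedback can substantially improve performance
The new method, SDPO, outperforms GRPO with scalar rewards
Applicable to settings such as error messages from code or LLM judges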

#ReinforcementLearning #ArtificialIntelligence #NaturalLanguageFeedback #Stanford #ETH

Stanford AI Lab on X: "Collaboration between Stanford SAIL and ETH shows RL with rich feedback significantly outperforms scalar rewards on very hard tasks!" / X

Stanford AI Lab

@StanfordAILab

Collaboration between Stanford SAIL and ETH shows RL with rich feedback significantly outperforms scalar rewards on very hard tasks!

Quote

Carlos Guestrin

@guestrin

Jan 29

With SDPO, you can now do RL with natural language feedback, like error messages from coding environments or LLMs as judges. You can achieve huge gains over GRPO with scalar rewards! x.com/jonashubotter/…

8:47 PM · Jan 29, 2026

18.6K Views
