Stanford AI Lab(@StanfordAILab)

Collaboration between Stanford SAIL and ETH shows RL with rich feedback significantly outperforms sc...

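AI in-depth summary
  • RL combined with natural language feedback can substantially improve performance
  • The new method, SDPO, outperforms GRPO with scalar rewards
  • Applicable to settings such as error messages from coding environments or LLMs as judges
#ReinforcementLearning #ArtificialIntelligence #NaturalLanguageFeedback #Stanford #ETH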

Stanford AI Lab

@StanfordAILab

Collaboration between Stanford SAIL and ETH shows RL with rich feedback significantly outperforms scalar rewards on very hard tasks!

Quote

Carlos Guestrin (@guestrin) · Jan 29

With SDPO, you can now do RL with natural language feedback, like error messages from coding environments or LLMs as judges. You can achieve huge gains over GRPO with scalar rewards! x.com/jonashubotter/…

8:47 PM · Jan 29, 2026 · 18.6K Views
