Perplexity (@perplexity_ai)

Our reward design combines correctness, preference, and efficiency. Preference only counts when the...

Score: 6.0
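AI key takeaways
  • The reward design combines correctness, preference, and efficiency.
  • Preference counts toward the score only when the answer is correct.
  • This keeps the model from optimizing for answers that "sound better but are wrong."
#AI #MachineLearning #RewardDesign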

Preference only counts when the answer is correct.

This keeps the model from optimizing for better-sounding wrong answers. https://t.co/VbJ1M4o26w


Perplexity

@perplexity_ai

Our reward design combines correctness, preference, and efficiency. Preference only counts when the answer is correct. This keeps the model from optimizing for better-sounding wrong answers.
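The gating rule described in the post can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Perplexity's implementation: the post publishes no formula, so the additive form, the `RewardWeights` values, the `gated_reward` name, the [0, 1] score ranges, and the choice to leave efficiency ungated are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the post does not publish any values.
    correctness: float = 1.0
    preference: float = 0.5
    efficiency: float = 0.1


def gated_reward(
    is_correct: bool,
    preference: float,
    efficiency: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Combine the three signals, counting preference only for correct answers.

    `preference` and `efficiency` are assumed to be scores in [0, 1].
    """
    correctness_term = weights.correctness if is_correct else 0.0
    # The gate: a wrong answer earns nothing for sounding good.
    preference_term = weights.preference * preference if is_correct else 0.0
    # Assumption: efficiency is added whether or not the answer is correct.
    efficiency_term = weights.efficiency * efficiency
    return correctness_term + preference_term + efficiency_term


# A polished but wrong answer versus a plain but correct one.
polished_wrong = gated_reward(is_correct=False, preference=0.9, efficiency=0.5)
plain_correct = gated_reward(is_correct=True, preference=0.2, efficiency=0.5)
assert polished_wrong < plain_correct
```

Under this sketch, raising the preference score of a wrong answer leaves its reward unchanged, so the only way to collect the preference term is to be correct first.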

![Image 2: Image](https://x.com/perplexity_ai/status/2047016429883740580/photo/1)

6:15 PM · Apr 22, 2026

8,810 Views
