Perplexity (@perplexity_ai) · April 22, 2026


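AI Key Takeaways

The reward design combines correctness, preference, and efficiency.
Preference counts toward the reward only when the answer is correct.
This keeps the model from optimizing for answers that sound better but are wrong.

#AI #MachineLearning #RewardDesign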





Perplexity

@perplexity_ai

Our reward design combines correctness, preference, and efficiency. Preference only counts when the answer is correct. This keeps the model from optimizing for better-sounding wrong answers.
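A minimal Python sketch of the gating idea, assuming a binary correctness signal (for example from a verifier), a preference score in [0, 1] (for example from a reward model), and a token-count penalty standing in for efficiency. `Rollout`, the weights, `TOKEN_BUDGET`, and the function names are all hypothetical; the post does not say how the three terms are scaled or combined, or whether efficiency is also gated.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled answer with hypothetical component scores."""
    correct: bool      # assumed binary signal, e.g. from a verifier
    preference: float  # assumed reward-model score in [0, 1]
    tokens_used: int   # response length, used here as an efficiency proxy


# Hypothetical weights and budget; not values from the source.
W_CORRECT = 1.0
W_PREF = 0.5
W_EFF = 0.1
TOKEN_BUDGET = 1000


def efficiency_penalty(tokens_used: int) -> float:
    """Assumed penalty: fraction of the token budget consumed, capped at 1."""
    return min(tokens_used / TOKEN_BUDGET, 1.0)


def gated_reward(r: Rollout) -> float:
    """Preference contributes only when the answer is correct."""
    correctness = 1.0 if r.correct else 0.0
    preference = r.preference if r.correct else 0.0  # the gate
    return (W_CORRECT * correctness
            + W_PREF * preference
            - W_EFF * efficiency_penalty(r.tokens_used))


def ungated_reward(r: Rollout) -> float:
    """Naive additive version for comparison: preference always counts."""
    correctness = 1.0 if r.correct else 0.0
    return (W_CORRECT * correctness
            + W_PREF * r.preference
            - W_EFF * efficiency_penalty(r.tokens_used))


if __name__ == "__main__":
    polished_wrong = Rollout(correct=False, preference=0.95, tokens_used=400)
    blunt_wrong = Rollout(correct=False, preference=0.20, tokens_used=400)
    plain_right = Rollout(correct=True, preference=0.30, tokens_used=400)
    for name, r in [("polished_wrong", polished_wrong),
                    ("blunt_wrong", blunt_wrong),
                    ("plain_right", plain_right)]:
        print(f"{name}: gated={gated_reward(r):.3f} "
              f"ungated={ungated_reward(r):.3f}")
```

Under the ungated sum, the persuasive wrong answer outscores the blunt wrong one, so training pressure pushes toward better-sounding errors. Under the gated version both wrong answers receive the same reward, and preference only ranks answers that are already correct.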

![Image](https://x.com/perplexity_ai/status/2047016429883740580/photo/1)

6:15 PM · Apr 22, 2026

8,810 Views
