Junyang Lin (@JustinLin610), April 12, 2026

we need agent evals that are really consistent with real world usages. otherwise people are optimizi...


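AI Key Takeaways

Existing agent evals lack consistency with real-world usage
Model optimization is being steered in the wrong direction
Mis-targeting the objective is a bigger problem than benchmaxxing

#AIAgents #ModelEvaluation #LLM #AIAlignment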



Junyang Lin

@JustinLin610

we need agent evals that are really consistent with real world usages. otherwise people are optimizing foundation models for the wrong direction. the problem of targeting is even bigger than benchmaxxing.

12:36 AM · Apr 12, 2026

23.9K Views
