SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests
SocialReasoning-Bench reveals that current frontier AI models often accept suboptimal outcomes when negotiating on behalf of users, despite completing tasks successfully.
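Why it was selected: In schedule coordination, frontier models have a 36% chance of accepting a meeting time that is more than 15% below the optimal value.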

