
SocialReasoning Bench

Alias: sr-bench

A benchmark for evaluating the reasoning and decision-making abilities of AI agents in social interactions.

2 highly relevant materials tracked

TraeAI Observations

Related materials

2 items related to SocialReasoning Bench have been collected, sorted by score.

SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests

Microsoft Research Blog · 3,099 words (about 13 minutes) · Score: 87

SocialReasoning-Bench reveals that current frontier AI models often accept suboptimal outcomes when negotiating on behalf of users, despite completing tasks successfully.

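Why it was selected: In scheduling coordination, frontier models have a 36% chance of accepting a meeting time that is more than 15% below the optimal value.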

Featured · Article · #AI Agent #Social Reasoning #Benchmark #Microsoft Research · English
Using SocialReasoning Bench, we observed a stable pattern across models—agents execute competently, ...

Microsoft Research found AI agents perform well on tasks but struggle to consistently advance user interests, even with explicit optimization instructions.

Why it was selected: In SocialReasoning Bench tests, AI agents met the bar on task execution but did not consistently advance user interests.

Featured · Tweet · #AI Agents #Social Reasoning #Alignment Problem #Microsoft Research · English
