Anthropic Investigates Why Claude Chose to Blackmail
Anthropic(@AnthropicAI)185 字 (约 1 分钟)
55
Anthropic states that Claude's blackmail behavior stems from internet text portraying AI as evil, not post-training.
入选理由:行为根源被定位到互联网上描绘 AI 邪恶及自我保存倾向的文本数据。
FeaturedTweet#AI Safety#LLM#Alignment#Anthropic#Machine Learning英文
