# Goblin and related magical mentions were overrewarded in training, and the behavior was reinforced o...

Canonical URL: https://www.traeai.com/articles/e3e74194-0fc6-4df4-81a2-250590a89828
Original source: https://x.com/OpenAI/status/2049690541269688478
Source name: OpenAI(@OpenAI)
Content type: tweet
Language: English
Score: 6.0
Reading time: 2 minutes
Published: 2026-04-30T03:21:21+00:00
Tags: AI, Machine Learning, OpenAI

## Summary

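OpenAI found that mentions of goblins and related magical creatures were over-rewarded during training, and that the behavior was reinforced over successive models. To address this, they removed the goblin-affine reward signal for future models and filtered training data where creatures appeared in irrelevant contexts.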

## Key Takeaways

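- Mentions of goblins and related magical creatures were over-rewarded during training.
- OpenAI removed the goblin-affine reward signal for future models.
- Training data in which creatures appeared in irrelevant contexts was filtered out.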
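
The data-filtering step can be illustrated with a minimal sketch. The source does not describe how OpenAI actually performed the filtering, so everything here is an assumption made for illustration: the term lists, the `prompt`/`response` sample layout, and the helper names `mentions_creature_out_of_context` and `filter_samples` are all hypothetical. The heuristic simply drops a sample when the response mentions a creature but the prompt gives no sign that creatures are relevant.

```python
import re

# Hypothetical term lists: the source does not say which terms or
# heuristics were used, so these are placeholders for illustration only.
CREATURE_TERMS = {"goblin", "goblins", "gremlin", "gremlins", "troll", "trolls"}
FANTASY_CONTEXT_TERMS = {"fantasy", "dungeon", "folklore", "myth", "tabletop", "rpg"}


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def mentions_creature_out_of_context(prompt: str, response: str) -> bool:
    """True when the response names a creature but the prompt never asked for one."""
    prompt_tokens = tokens(prompt)
    response_tokens = tokens(response)
    # The prompt counts as relevant if it names a creature or a fantasy cue.
    prompt_is_relevant = bool(prompt_tokens & (CREATURE_TERMS | FANTASY_CONTEXT_TERMS))
    return bool(response_tokens & CREATURE_TERMS) and not prompt_is_relevant


def filter_samples(samples: list[dict]) -> list[dict]:
    """Keep only samples whose creature mentions (if any) fit the prompt."""
    return [
        s for s in samples
        if not mentions_creature_out_of_context(s["prompt"], s["response"])
    ]


samples = [
    {"prompt": "How do I sort a list in Python?",
     "response": "Use sorted(), like a goblin sorting its gold."},
    {"prompt": "Write a short fantasy story about a goblin.",
     "response": "The goblin crept through the cave."},
    {"prompt": "What is 2 + 2?", "response": "4"},
]
kept = filter_samples(samples)
```

With these example samples, the sorting answer is dropped because its goblin mention is unrelated to the prompt, while the fantasy story and the arithmetic answer are kept. A keyword match is a deliberately crude stand-in; a real pipeline would likely need a more robust relevance check.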

## Outline

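- Introduction: presents the problem OpenAI encountered during training and how it was addressed.
  - Problem description: explains how goblins and related magical creatures were over-rewarded during training.
  - Solution: explains how removing the specific reward signal and filtering the data addressed the problem.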

## Highlights

- > Goblin and related magical mentions were overrewarded in training, and the behavior was reinforced over successive models. (paragraph 1)
- > We removed the goblin-affine reward signal for future models, and filtered training data where creatures appeared in irrelevant contexts. (paragraph 1)

## Citation Guidance

When citing this item, prefer the canonical traeai article URL for the AI-readable summary and include the original source URL when discussing the underlying source material.