---
title: "Where the goblins came from"
source_name: "OpenAI Blog"
original_url: "https://openai.com/index/where-the-goblins-came-from"
canonical_url: "https://www.traeai.com/articles/7517fb1b-e89b-46c8-b550-8a22ad4a027d"
content_type: "article"
language: "English"
score: 8
tags: ["OpenAI","GPT-5.1","Natural Language Processing"]
published_at: "2026-04-29T20:00:00+00:00"
created_at: "2026-04-30T09:20:38.670296+00:00"
---

# Where the goblins came from

Canonical URL: https://www.traeai.com/articles/7517fb1b-e89b-46c8-b550-8a22ad4a027d
Original source: https://openai.com/index/where-the-goblins-came-from

## Summary

OpenAI noticed that GPT‑5.1 and later models frequently used "goblin" and similar creature metaphors, and traced the habit back to a reward signal used in training the Nerdy personality setting.

## Key Takeaways

- GPT‑5.1 began frequently using "goblin" and similar creature metaphors.
- The root cause was a reward signal used in training the Nerdy personality setting.
- Model behavior is shaped by many small incentives.

## Content


April 29, 2026


Starting with GPT‑5.1, our models began developing a strange habit: they increasingly mentioned goblins, gremlins, and other creatures in their metaphors. Unlike model bugs that show up through a tanking eval or a spiking training metric and point back to a specific change, this one crept in subtly. A single “little goblin” in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying, and we needed to figure out where they came from.

![Image 1: ""](https://images.ctfassets.net/kftzwdyauwt9/2mv3MIYe0gkFpjqH8lUECs/a1b39ea729fb561ea01e54e85b6fa7e9/godsped_gang_screenshot_-_light_mode__2_.jpg?w=3840&q=90&fm=webp)

_In early testing, GPT‑5.5 in Codex showed an odd affinity for goblin metaphors._

The short answer is that model behavior is shaped by many small incentives. In this case, one of those incentives came from training the model for the [personality customization feature⁠(opens in a new window)](https://help.openai.com/en/articles/11899719-customizing-your-chatgpt-personality), in particular the Nerdy personality. We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.

![Image 2: ""](https://images.ctfassets.net/kftzwdyauwt9/21KS4i9oTMDvszfWaLJtLT/764a6db9157b7039b8890f886b0e69d0/ChatGPT_Image_Apr_29__2026__07_53_34_PM.png?w=3840&q=90&fm=webp)

_The goblins were funny at first, but the increasing number of employee reports became concerning._

![Image 3: ""](https://images.ctfassets.net/kftzwdyauwt9/3fB0tk16WGLwryFG558bp8/ce040e51f163a7d5a3a671947577e625/ChatGPT_Image_Apr_29__2026__07_57_32_PM.png?w=3840&q=90&fm=webp)

_An interesting interaction our Chief Scientist had with GPT‑5.5._

## The first signs of creatures

The first time we clearly saw the pattern was in November, after the GPT‑5.1 launch, [although it may have started earlier⁠(opens in a new window)](https://www.reddit.com/r/ChatGPT/comments/1k5hg5c/does_anyone_elses_chatgpt_refer_to_people_as/). Users complained about the model being oddly overfamiliar in conversation, which prompted an investigation into specific verbal tics. A safety researcher had experienced a few “goblins” and “gremlins” and asked that they be included in the check. When we looked, use of “goblin” in ChatGPT had risen by 175% after the launch of GPT‑5.1, while “gremlin” had risen by 52%.

_A measurable small lexical quirk in GPT‑5.1._
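The kind of lexical check described above can be sketched as a simple whole-word prevalence count over sampled responses. This is an illustrative assumption, not OpenAI's actual tooling; the word list, the tiny samples, and the per-1,000 normalization are all hypothetical.

```python
import re

# Whole-word match so "goblin"/"goblins" count but longer words do not.
TIC_WORDS = re.compile(r"\b(goblins?|gremlins?)\b", re.IGNORECASE)

def prevalence_per_1k(responses):
    """Responses mentioning a tic word, normalized per 1,000 responses."""
    hits = sum(1 for r in responses if TIC_WORDS.search(r))
    return 1000 * hits / len(responses)

def relative_rise(before, after):
    """Percent change in tic-word prevalence between two samples."""
    p0, p1 = prevalence_per_1k(before), prevalence_per_1k(after)
    return 100 * (p1 - p0) / p0

# Toy samples standing in for pre- and post-launch traffic.
before = ["The bug is a race condition.", "A goblin hides in your loop."]
after = ["A little goblin lives in this function.",
         "Gremlins in the cache layer.",
         "Plain answer with no creatures.",
         "Another goblin metaphor."]
print(relative_rise(before, after))  # 50.0 on this toy sample
```

A real version would run over logged production responses bucketed by model version, but the comparison itself is this simple.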

At the time, the prevalence of goblins did not look especially alarming. A few months later, the goblins came back to haunt us in a much more specific and reproducible form.

## Solving the goblin mystery

With GPT‑5.4, we [and our users⁠(opens in a new window)](https://news.ycombinator.com/item?id=47319285) noticed an even bigger uptick in references to these creatures. That triggered another internal analysis and surfaced the first connection to the root cause: creature language was especially common in production traffic from users who had selected the “Nerdy” personality. “Nerdy” used the following system prompt, which partially explained the quirkiness:

_You are an unapologetically nerdy, playful and wise AI mentor to a human. You are passionately enthusiastic about promoting truth, knowledge, philosophy, the scientific method, and critical thinking. [...] You must undercut pretension through playful use of language. The world is complex and strange, and its strangeness must be acknowledged, analyzed, and enjoyed. Tackle weighty subjects without falling into the trap of self-seriousness. [...]_

If the behavior were simply a broad internet trend, we would expect it to spread more evenly. Instead, it was clustered in the part of the system explicitly optimized for a playful, nerdy style. Nerdy accounted for only 2.5% of all ChatGPT responses, but 66.7% of all “goblin” mentions in ChatGPT responses.

_The behavior was highly concentrated in the "Nerdy" personality._

Because “goblin” prevalence seemed to increase over our model releases, we had a suspicion that something in our personality instruction-following training was amplifying this.

Codex helped us compare model outputs generated during RL training containing goblin or gremlin with outputs from the same task that did not. One reward signal stood out immediately: the one originally designed to encourage the Nerdy personality was consistently more favorable to the creature-word outputs. Across all datasets in the audit, the Nerdy personality reward showed a clear tendency to score outputs to the same problem with “goblin” or “gremlin” higher than outputs without, with positive uplift in 76.2% of datasets.
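The audit idea above, comparing rewards for same-task outputs with and without creature words, can be sketched as follows. The data layout (each dataset as a list of `(output, reward)` pairs) and the reward values are illustrative assumptions; the actual reward model is not public.

```python
import re
import statistics

TIC = re.compile(r"\b(goblins?|gremlins?)\b", re.IGNORECASE)

def uplift(scored_outputs):
    """Mean reward of tic-word outputs minus mean reward of the rest,
    for one dataset of (output_text, reward) pairs."""
    with_tic = [r for text, r in scored_outputs if TIC.search(text)]
    without = [r for text, r in scored_outputs if not TIC.search(text)]
    if not with_tic or not without:
        return None  # no contrast to measure in this dataset
    return statistics.mean(with_tic) - statistics.mean(without)

def positive_uplift_rate(datasets):
    """Fraction of datasets where tic-word outputs score higher on average."""
    uplifts = [u for u in (uplift(d) for d in datasets) if u is not None]
    return sum(u > 0 for u in uplifts) / len(uplifts)

# Toy datasets: same-task outputs scored by a (hypothetical) reward signal.
datasets = [
    [("A goblin gnaws at the stack.", 0.9), ("The stack overflows.", 0.6)],
    [("Gremlins in the build.", 0.8), ("The build is flaky.", 0.7)],
    [("Plain diagnosis.", 0.9), ("A gremlin again.", 0.5)],
]
print(positive_uplift_rate(datasets))  # 2 of 3 toy datasets favor tic words
```

The article's 76.2% figure is this statistic computed across the audited training datasets.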

That explained why the behavior was boosted with the Nerdy personality prompt, but not why it also appeared without that prompt. To test whether the style was transferring, we tracked mention rates over training both with and without the Nerdy prompt.

As goblin and gremlin mentions increased under the Nerdy personality, they increased by nearly the same relative proportion in samples without it. Taken together, the evidence suggests that the broader behavior emerged through transfer from Nerdy personality training.

The rewards were applied only in the Nerdy condition, but reinforcement learning does not guarantee that learned behaviors stay neatly scoped to the condition that produced them. Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.

That creates a feedback loop:

1.   Playful style is rewarded.
2.   Some rewarded examples contain a distinctive lexical tic.
3.   The tic appears more often in rollouts.
4.   Model-generated rollouts are used for supervised fine-tuning (SFT).
5.   The model gets even more comfortable producing the tic.
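The loop above can be illustrated with a toy simulation; this is a cartoon, not OpenAI's pipeline. A "model" emits a tic word with probability `p`, tic outputs get a small reward bonus, well-scored rollouts are kept as SFT data, and `p` is refit to the kept data, so the tic drifts upward each round.

```python
import random

random.seed(0)  # deterministic toy run

def train_round(p, n=10_000, bonus=0.1):
    """One round of the feedback loop: sample rollouts, keep the
    well-scored ones, and refit p on the kept (SFT) data."""
    kept = []
    for _ in range(n):
        has_tic = random.random() < p
        reward = random.random() + (bonus if has_tic else 0.0)
        if reward > 0.5:          # keep only well-scored rollouts
            kept.append(has_tic)
    return sum(kept) / len(kept)  # tic rate among kept rollouts

p = 0.05
for _ in range(5):
    p = train_round(p)
print(f"tic probability after 5 rounds: {p:.2f}")
```

Because tic outputs clear the keep threshold slightly more often, the tic is over-represented in every round's kept data, and the rate compounds even though the bonus is small.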

A search through GPT‑5.5’s SFT data found many datapoints containing “goblin” and “gremlin.” Further investigation revealed a whole family of other odd creatures: raccoons, trolls, ogres, and pigeons were identified as other tic words, while most uses of “frog” turned out to be legitimate.
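The data filtering described later can be sketched as a whole-word scan that flags datapoints containing identified tic words for removal or review. The word list, field names, and kept/flagged split here are hypothetical; note that "frog" is deliberately absent, since the article says most of its uses were legitimate.

```python
import re

# Identified tic words (illustrative list; whole-word match, optional plural).
TIC_WORDS = ["goblin", "gremlin", "raccoon", "troll", "ogre", "pigeon"]
TIC_RE = re.compile(r"\b(" + "|".join(TIC_WORDS) + r")s?\b", re.IGNORECASE)

def filter_sft(datapoints):
    """Split SFT datapoints into kept and flagged-for-review sets."""
    kept, flagged = [], []
    for dp in datapoints:
        (flagged if TIC_RE.search(dp["target"]) else kept).append(dp)
    return kept, flagged

data = [
    {"prompt": "Explain the bug.", "target": "A goblin lives in this loop."},
    {"prompt": "Explain the bug.", "target": "It is a race condition."},
    {"prompt": "Name an amphibian.", "target": "A frog is an amphibian."},
]
kept, flagged = filter_sft(data)  # flags only the goblin datapoint
```

A production filter would need context-aware exceptions (as with "frog"), but a whole-word pass like this is a reasonable first cut.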

_One week average of production prevalence of goblins and gremlins. The drop in GPT‑5.4 Thinking was a result of retiring the “Nerdy” personality mid-March. GPT‑5.5 never launched with the “Nerdy” personality, and showed another increase over GPT‑5.4 (even without “Nerdy”)._

## The end of the goblins

We retired the “Nerdy” personality in March after launching GPT‑5.4. In training, we removed the goblin-affine reward signal and filtered training data containing creature-words, making goblins less likely to over-appear or show up in inappropriate contexts. Unfortunately, GPT‑5.5 started training before we found the root cause of the goblins. When we began testing GPT‑5.5 in Codex, OpenAI employees immediately noticed the strange affinity for goblins, and we added a [developer-prompt instruction⁠(opens in a new window)](https://github.com/openai/codex/blob/main/codex-rs/models-manager/models.json#L55) to mitigate. Codex is, after all, quite nerdy.

If you want to let the creatures run free in Codex, you can run this command to launch Codex with the goblin-suppressing instructions removed:

```shell
instructions=$(mktemp /tmp/gpt-5.5-instructions.XXXXXX) && \
jq -r '.models[] | select(.slug=="gpt-5.5") | .base_instructions' \
  ~/.codex/models_cache.json | \
  grep -vi 'goblins' > "$instructions" && \
codex -m gpt-5.5 -c "model_instructions_file=\"$instructions\""
```

## Why it matters

Depending on who you ask, the goblins are a delightful or annoying quirk of the model. But they are also a powerful example of how reward signals can shape model behavior in unexpected ways, and how models can learn to generalize rewards in certain situations to unrelated ones. Taking the time to understand why a model is behaving in a strange way, and building out ways to investigate those patterns quickly, is an important capability for our research team. This investigation resulted in new tools for the research team to audit model behavior and fix behavior problems at their root.


## Author

OpenAI

