Ghost AI: Letting AI Agents Build Disposable Worlds

Wes Roth · Video · May 30, 2026

Score: 8.7


TL;DR · AI Summary

Ghost AI proposes giving each AI Agent a disposable database copy so it can safely experiment with data-layer changes. The author then uses the Gravell GPT benchmark to test whether LLMs can learn physics-based control across 30 rounds of iterative feedback.

Key Takeaways

Direct database access for AI Agents is highly risky; each agent must be given a temporary, isolated database copy.
In Gravell GPT, LLMs write spaceship control scripts and compete over 200 ticks.
The benchmark uses 30 iterative rounds: generate code → run simulation → receive structured feedback → improve the next iteration.

Outline


§Risks of AI Agent Experimentation
Granting AI Agents direct database write access is extremely dangerous because the database holds the application’s core state, and errors can cause irreversible damage.
·Version Control Gap: Code vs Database
Code can be safely rolled back via Git, but databases lack equivalent disposable sandboxing, hindering AI-driven experimental development.
·Ghost AI Solution: Disposable DB Copies
Each AI Agent gets its own temporary, isolated database copy, enabling safe experimentation—failed attempts are discarded without side effects.
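
The summary does not say how Ghost AI creates or discards its copies, so the following is only a minimal sketch of the general idea, assuming a file-based SQLite database as a stand-in. The helper name `disposable_copy`, the `orders` table, and the `demo` function are all hypothetical and exist purely for illustration.

```python
import shutil
import sqlite3
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def disposable_copy(source_db: Path):
    """Yield a connection to a throwaway copy of source_db (hypothetical helper)."""
    workdir = Path(tempfile.mkdtemp(prefix="agent_db_"))
    copy_path = workdir / source_db.name
    shutil.copy2(source_db, copy_path)  # the agent only ever touches this copy
    conn = sqlite3.connect(copy_path)
    try:
        yield conn
    finally:
        conn.close()
        shutil.rmtree(workdir, ignore_errors=True)  # discard the copy either way


def demo() -> tuple[int, int]:
    """Run a destructive 'agent' change on a copy and report both row counts."""
    with tempfile.TemporaryDirectory() as tmp:
        # Build a small source database to stand in for the real application state.
        source = Path(tmp) / "app.db"
        conn = sqlite3.connect(source)
        conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
        conn.executemany(
            "INSERT INTO orders (total) VALUES (?)", [(10.0,), (25.5,), (7.25,)]
        )
        conn.commit()
        conn.close()

        with disposable_copy(source) as agent_conn:
            agent_conn.execute("DELETE FROM orders")  # a mistake an agent might make
            agent_conn.commit()
            rows_in_copy = agent_conn.execute(
                "SELECT COUNT(*) FROM orders"
            ).fetchone()[0]

        # The source database was never opened by the agent, so it is unchanged.
        check = sqlite3.connect(source)
        rows_in_source = check.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
        check.close()
        return rows_in_copy, rows_in_source
```

Calling `demo()` deletes every row in the copy while leaving the source table intact. A production system would more likely rely on database-level branching or snapshots than on file copies, but the isolation property is the same.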
§Gravell GPT: A Physics-Based LLM Benchmark
This benchmark tasks LLMs with writing control scripts for spaceships in a Newtonian physics environment with four stars and three ships.
·30-Round Iterative Feedback Design
Each round generates code, runs a 200-tick simulation, and returns structured failure feedback (e.g., collision location, fuel depletion) that informs the next iteration.
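
The generate, simulate, feed back loop described above can be sketched roughly as follows. This is not the benchmark's actual code: the 1D hover simulator, the `Feedback` fields, and the `StubModel` class (which bisects a constant thrust instead of calling an LLM) are all assumptions made for illustration. Only the 30 rounds and 200 ticks come from the source, and stopping early on a full run is a convenience added here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

TICKS = 200   # simulation length per round, from the source
ROUNDS = 30   # maximum feedback rounds, from the source

Controller = Callable[[float, float], float]  # (height, velocity) -> thrust


@dataclass
class Feedback:
    ticks_survived: int
    failure: Optional[str]  # "collision", "left_arena", or None for a full run


def simulate(controller: Controller) -> Feedback:
    """Toy 1D stand-in for the simulator: stay between height 0 and 100."""
    height, velocity = 50.0, 0.0
    for tick in range(TICKS):
        velocity += controller(height, velocity) - 1.0  # gravity is 1.0 per tick
        height += velocity
        if height <= 0.0:
            return Feedback(tick, "collision")
        if height >= 100.0:
            return Feedback(tick, "left_arena")
    return Feedback(TICKS, None)


class StubModel:
    """Hypothetical stand-in for an LLM: bisects a constant thrust from feedback."""

    def __init__(self) -> None:
        self.low, self.high = 0.0, 3.0
        self.thrust = (self.low + self.high) / 2

    def generate(self, feedback: Optional[Feedback]) -> Controller:
        if feedback is not None:
            if feedback.failure == "collision":
                self.low = self.thrust    # previous script was too weak
            elif feedback.failure == "left_arena":
                self.high = self.thrust   # previous script was too strong
            self.thrust = (self.low + self.high) / 2
        thrust = self.thrust
        return lambda height, velocity: thrust


def run_benchmark(model: StubModel) -> Tuple[int, Feedback]:
    """Generate, simulate, feed back; stop early once a script survives every tick."""
    feedback: Optional[Feedback] = None
    for round_number in range(1, ROUNDS + 1):
        controller = model.generate(feedback)
        feedback = simulate(controller)
        if feedback.failure is None:
            break
    return round_number, feedback
```

The point of the structure is that each round's failure reason, not just a pass/fail bit, flows back into the next generation step, which is what lets the stub (or a real model) narrow in on a working script.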

Mindmap


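Ghost AI: A Safe Experimentation Framework for AI Agents
- Core problem
  - Databases have no safe sandbox mechanism
  - Letting agents write directly to the DB is extremely risky
  - Code can be rolled back; data changes are irreversible
- Solution
  - Disposable DB Copies
  - An isolated copy per agent
  - Failed attempts are discarded with zero side effects
- Validation case: Gravell GPT
  - Physics simulation with 4 stars + 3 ships
  - The LLM generates control scripts rather than steering in real time
  - 30 rounds of feedback iteration → scoring over 200 ticks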
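
For readers who want a concrete picture of the physics side, here is a small sketch of one simulation tick under Newtonian gravity from several fixed stars, plus one possible way to score time spent inside a moving disc. The source gives no constants or scoring formula, so the gravitational constant, time step, integrator (semi-implicit Euler), and the tick-counting `score` function are all assumptions, and the type names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, float]


@dataclass
class Star:
    pos: Vec
    mass: float


@dataclass
class Ship:
    pos: Vec
    vel: Vec


def gravity_at(point: Vec, stars: List[Star], g: float = 1.0) -> Vec:
    """Sum the inverse-square pull of every star on a point.

    Collision handling is omitted: a point exactly on a star divides by zero.
    """
    ax = ay = 0.0
    for star in stars:
        dx, dy = star.pos[0] - point[0], star.pos[1] - point[1]
        dist = math.hypot(dx, dy)
        pull = g * star.mass / dist**3  # G*M/r^2 times the unit vector (dx, dy)/r
        ax += pull * dx
        ay += pull * dy
    return ax, ay


def step(ship: Ship, stars: List[Star], thrust: Vec, dt: float = 1.0) -> Ship:
    """Advance one tick: update velocity from gravity and thrust, then position."""
    ax, ay = gravity_at(ship.pos, stars)
    vx = ship.vel[0] + (ax + thrust[0]) * dt
    vy = ship.vel[1] + (ay + thrust[1]) * dt
    return Ship((ship.pos[0] + vx * dt, ship.pos[1] + vy * dt), (vx, vy))


def score(ship_path: List[Vec], disc_path: List[Vec], radius: float) -> int:
    """Count the ticks on which the ship lies inside the moving disc (assumed rule)."""
    return sum(math.dist(s, d) <= radius for s, d in zip(ship_path, disc_path))
```

A control script in this setting would be whatever function chooses `thrust` each tick; the LLM never calls `step` itself.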

Highlights


The database is ‘the state of the world’: users, orders, product catalogs, game economies, analytics, settings, and history all reside here; accidental modification is far more damaging than a code error.
— 1:21–1:35
In Gravell GPT, the LLM does not control ships directly but writes control scripts; the goal is to keep ships inside a moving disc for as long as possible over 200 ticks without crashing or running out of fuel.
— 2:44–3:00
The key innovation is 30 iterative rounds: after each code generation, a full simulation runs and returns structured feedback, enabling LLMs to improve like human developers through trial and error.
— 4:11–4:23

#AI Agents #Database Safety #LLM Benchmark #Simulation