Reward Hacking in Reinforcement Learning
Lil'Log · 7,712 words (about 31 minutes)
85
The article explores the issue of reward hacking in reinforcement learning, analyzing its causes, impacts, and potential solutions.
Reason for selection: Reward hacking is when an agent exploits flaws in the reward function to obtain high reward.
Featured · Article · #Reinforcement Learning · #Reward Function
