traeai

Concept

RL

Alias: Reinforcement Learning

Reinforcement learning (RL) is a machine learning paradigm.

Related materials

5 items related to RL have been collected, sorted by score.

How Cursor Ships a 1TB Model Across the World Mid-Training


Sequoia Capital · 355 words (about 2 minutes)
Score: 85

Cursor synchronizes a 1TB model across continents mid-training by exploiting the pattern of weight changes in RL, cutting transmission volume by 20x while keeping the model copies consistent.

Why it was selected: Only a small fraction of weights change during RL training, so delta compression cuts transmission volume by 20x.

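Featured · Video · #Model Transfer #Delta Compression #Reinforcement Learning #Distributed Training · English
#539. Hand-building AlphaGo: a former DeepMind scientist breaks down the core principles of AI Go and their far-reaching implications for LLM reinforcement learning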
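The delta-compression idea in the entry above can be illustrated with a small sketch. This is not Cursor's implementation; it only assumes that two consecutive checkpoints differ in a small subset of weights, so sending the changed positions and their new values is enough to rebuild the new checkpoint exactly. The names `make_delta` and `apply_delta` are hypothetical, and real systems would work on tensors rather than Python lists.

```python
from typing import Dict, List


def make_delta(old: List[float], new: List[float]) -> Dict[int, float]:
    """Return {index: new_value} for every weight that changed (hypothetical helper)."""
    if len(old) != len(new):
        raise ValueError("checkpoints must have the same number of weights")
    return {i: n for i, (o, n) in enumerate(zip(old, new)) if o != n}


def apply_delta(old: List[float], delta: Dict[int, float]) -> List[float]:
    """Rebuild the new checkpoint on the receiving side from the old one plus the delta."""
    rebuilt = list(old)
    for i, value in delta.items():
        rebuilt[i] = value
    return rebuilt


# Toy checkpoint: 1000 weights, of which only 3 change in one training step.
old_weights = [0.0] * 1000
new_weights = list(old_weights)
for i in (3, 250, 999):
    new_weights[i] = 0.5

delta = make_delta(old_weights, new_weights)      # only the changed entries are sent
restored = apply_delta(old_weights, delta)        # receiver reconstructs the checkpoint
```

Sending the new values themselves, rather than arithmetic differences, keeps the reconstruction bit-exact, which is one simple way to keep sender and receiver consistent.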

Rebuilding AlphaGo: A Deep Dive into AI Go Core Principles and Implications for LLMs

跨国串门儿计划 · 1868 words (about 8 minutes)
Score: 85

AlphaGo uses MCTS and neural networks to achieve efficient search, showcasing the potential of reinforcement learning.

Why it was selected: AlphaGo uses MCTS and neural networks for efficient search, with a clear supervision target at every move.

Featured · Podcast · #AI #Reinforcement Learning #Go #Neural Networks #Search Algorithms · Chinese
Vol.119 | A conversation with Macaron AI founder Andrew: Are next-generation model companies growing out of Agent products?

Andrew, founder of Mind Lab (Macaron AI), argues that next-generation model companies are emerging from Agent products, using LoRA reinforcement learning and continuous learning to evolve AI Agents in real-world scenarios for personalized, interactive long-term intelligence.

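Why it was selected: Mind Lab has achieved LoRA reinforcement learning at trillion-parameter scale and built LoRA RL infrastructure that supports DSA and MTP.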

Featured · Podcast · #Agent #LoRA #Reinforcement Learning #Continuous Learning #Personal AGI · Chinese
Cursor | The Hidden Bug in Every Large-Scale RL Run


Sequoia Capital · 248 words (about 1 minute)
Score: 75

In large-scale RL training, numerical mismatches arise due to model version drift and floating-point precision differences, causing inconsistent log probabilities during inference and introducing training bias.

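Why it was selected: In asynchronous training, the forward pass must be rerun to produce log probabilities, yet the results can differ even for the same model version.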

Featured · Video · #Reinforcement Learning #Large Models #Numerical Stability #Training Systems #AI Systems Engineering · English
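One general reason a rerun forward pass can disagree with the original, even with identical weights, is that floating-point addition is not associative, so a different accumulation order can change the last bits of a result. The sketch below illustrates only that general effect; it is an assumption that this is among the causes the video discusses, and the function `log_prob` is a hypothetical helper, not code from any training system.

```python
import math
from typing import List


def log_prob(logits: List[float], token: int, order: List[int]) -> float:
    """Log-softmax of `token`, accumulating the normalizer in the given index order."""
    m = max(logits)  # subtract the max for numerical stability
    total = 0.0
    for i in order:
        total += math.exp(logits[i] - m)
    return logits[token] - m - math.log(total)


# Floating-point addition is not associative: grouping changes the result.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)
order_matters = left_to_right != right_to_left

# The same logits, reduced in two different orders (as two kernels might do).
logits = [2.0, -1.0, 0.5, 3.0]
lp_forward = log_prob(logits, 0, [0, 1, 2, 3])
lp_reverse = log_prob(logits, 0, [3, 2, 1, 0])
gap = abs(lp_forward - lp_reverse)  # zero or a tiny rounding-level difference
```

On this toy input the gap is at most a rounding-level difference, but the point of the sketch is that trainer and sampler cannot be assumed to produce bit-identical log probabilities unless their reduction order and precision match.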
We've gotten really really good at RL. Composer 2.5 is fighting well-above its weight class.

Very e...


Sualeh Asif (@sualehasif996) · 134 words (about 1 minute)
Score: 50

Cursor Composer 2.5 has been officially released, with performance gains attributed to reinforcement learning and doubled free usage for one week. The new model excels at long-horizon, complex tasks, and the Cursor team is collaborating with SpaceXAI to scale model size and compute.

Why it was selected: Composer 2.5 is optimized with reinforcement learning, and its performance exceeds expectations.

Featured · Tweet · #Cursor #Composer 2.5 #Reinforcement Learning #AI Programming Tool #SpaceXAI · English
