traeai

Concept

DPO

Alias: Direct Preference Optimization

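Direct Preference Optimization is a model training method that fine-tunes a model directly on preference pairs (a preferred and a rejected response to the same prompt), without training a separate reward model.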
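As an illustration, here is a minimal sketch of the standard DPO objective from Rafailov et al. (2023); the page itself does not state the formula. It assumes you already have per-example sequence log-probabilities (token log-probs summed over the response) for the chosen and rejected responses under both the policy being trained and a frozen reference model. The function names and the default `beta = 0.1` are illustrative choices, not values taken from the materials below.

```python
import math
from typing import Sequence


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z)), i.e. softplus(-z)."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen: Sequence[float],
    policy_rejected: Sequence[float],
    ref_chosen: Sequence[float],
    ref_rejected: Sequence[float],
    beta: float = 0.1,  # illustrative default; controls deviation from the reference
) -> float:
    """Mean DPO loss over a batch of preference pairs.

    Each argument holds per-example sequence log-probabilities log pi(y | x).
    """
    n = len(policy_chosen)
    if n == 0 or not (n == len(policy_rejected) == len(ref_chosen) == len(ref_rejected)):
        raise ValueError("all inputs must be non-empty and have the same length")
    total = 0.0
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected, ref_chosen, ref_rejected):
        # Implicit reward margin: how much more the policy, relative to the
        # reference model, prefers the chosen response over the rejected one.
        margin = (pc - rc) - (pr - rr)
        # Loss term: -log sigmoid(beta * margin).
        total += neg_log_sigmoid(beta * margin)
    return total / n
```

When the policy still equals the reference model, every margin is zero and the loss is log 2 (about 0.693); training lowers it by raising the chosen response's likelihood relative to the rejected one.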

4 highly relevant materials tracked

TraeAI Observations

Related Materials

4 items related to DPO have been collected, sorted by score.

Direct Preference Optimization Beyond Chatbots

Hugging Face Blog · 2903 words (about 12 minutes)
Score: 85

This article applies Direct Preference Optimization (DPO) beyond chatbots. It optimizes text generation with preference pairs whose rejected side comes from the model's own failures, which significantly reduces the rate of text degradation. DPO proves particularly effective for OCR, where it serves as a direct mitigation for specific failure modes without relying on subjective human judgments.
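To make the idea of mining rejected samples from the model's own failures concrete, here is a hypothetical sketch of building OCR preference pairs. It is not the article's pipeline: it assumes degradation shows up as repetition loops, detects them with a made-up repeated n-gram heuristic (`repeated_ngram_ratio`, threshold 0.3), and uses the ground-truth transcription as the chosen side. The `prompt`/`chosen`/`rejected` field names follow a convention common to preference datasets (for example in Hugging Face TRL), but check your trainer's expected format.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class OcrSample:
    prompt: str               # hypothetical: image reference plus instruction
    reference: str            # ground-truth transcription
    model_outputs: List[str]  # transcriptions sampled from the model


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that are repeats; high values suggest a loop."""
    words = text.split()
    if len(words) < n:
        return 0.0
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return 1.0 - len(set(grams)) / len(grams)


def is_degraded(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical heuristic: flag outputs dominated by repeated n-grams."""
    return repeated_ngram_ratio(text) > threshold


def build_pairs(samples: List[OcrSample]) -> List[Dict[str, str]]:
    """Use the model's own degraded outputs as the rejected side of each pair."""
    pairs = []
    for s in samples:
        for out in s.model_outputs:
            if is_degraded(out):
                pairs.append({"prompt": s.prompt, "chosen": s.reference, "rejected": out})
    return pairs
```

Clean outputs are simply skipped here, so every pair targets a concrete failure mode; a real pipeline would likely need a detector tuned to the failures it actually observes.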

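Why selected: DPO optimizes text generation using rejected samples taken from the model's own failures, significantly reducing the text degradation rate.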

Featured · Article · #Direct Preference Optimization #OCR #text generation #model training

GLM 5.1 from @Zai_org is now available on @FireworksAI_HQ Training Platform across the Managed and T...

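The Fireworks AI platform now officially supports Zhipu's GLM 5.1 model, offering SFT/DPO fine-tuning and a 200K context window, optimized for fine-tuning long-horizon agentic coding. RL training is coming soon.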

Why selected: GLM 5.1 has been integrated into Fireworks AI's managed and API training workflows.

Featured · Tweet · #GLM #Fireworks AI #LLM fine-tuning #SFT #DPO
