Don't rush into RL right after SFT! Your multimodal large model may already be training with an injury
量子位 (QbitAI) · 2,434 characters (about a 10-minute read)
When training multimodal large models, SFT may introduce a distribution bias that degrades performance in the subsequent RL phase. PRISM addresses this issue with a three-stage pipeline.
Why it was selected: SFT can degrade model performance; for example, Qwen3-VL-8B's accuracy dropped by 5.2% after SFT.
Featured Article | Tags: Multimodal, Large Model, PRISM
