7B Beats o3 and GPT-5! Medical AI Agents Learn ‘Where to Look and How to Look’
QbitAI · 2,595 characters (about 11 minutes)
Ophiuchus-7B achieves a mean score of 68.0 across 8 medical VQA benchmarks, surpassing OpenAI-o3 (62.2), Gemini 2.5 Pro (61.8), and GPT-5 (59.9). The core breakthrough is the new ‘Think with Images/Videos’ paradigm: the model actively invokes tools such as SAM2 and BiomedParse during reasoning to re-examine key regions or moments, making visual evidence an integral part of cognition rather than just an input.
Reason for selection: Ophiuchus-7B averages 68.0 across 8 medical VQA benchmarks, significantly higher than o3 (62.2), Gemini 2.5 Pro (61.8), and GPT-5 (59.9).
Featured · Article · #Medical AI #Multimodal LLM #Agent #ICML 2026 #Visual Reasoning · Chinese
