Qwen (@Alibaba_Qwen), April 16, 2026

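AI-distilled summary

Qwen3.6-35B-A3B is a natively multimodal architecture, not one aligned after the fact
Only about 3 billion activated parameters, yet performance close to Claude Sonnet 4.5
Stands out on spatial understanding tasks such as RefCOCO (92.0) and ODInW13 (50.8)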

#Qwen #Multimodal #LargeModels #VisionLanguageModels #Alibaba

Qwen

@Alibaba_Qwen

VLM Performance: Qwen3.6 is natively multimodal, and Qwen3.6-35B-A3B showcases perception and multimodal reasoning capabilities that far exceed what its size would suggest, with only around 3 billion activated parameters. Across most vision-language benchmarks, its performance matches Claude Sonnet 4.5, and even surpasses it on several tasks. Its strengths are particularly evident in spatial intelligence, where it achieves 92.0 on RefCOCO and 50.8 on ODInW13.

![Image 2: Image](http://x.com/Alibaba_Qwen/status/2044768742761189762/photo/1)

1:23 PM · Apr 16, 2026

55.6K Views
