MiniCPM-V 4.6: The Agent Vision Model
Sam Witteveen · 3,945 words (about 16 minutes)
MiniCPM-V 4.6 is a compact 1.3B-parameter multimodal vision-language model that pairs a SigLIP visual encoder with a Qwen language model architecture. It supports image, document, and video inputs and targets deployment on edge devices.
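Why it was selected: The model has only 1.3 billion parameters and supports a 262K context window for processing multiple images and video.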
Featured · Video · #MiniCPM-V #Multimodal Model #Edge Computing #OpenBMB #Vision-Language Model · English
