Video is the most information-dense modality we have, and most retrieval pipelines treat it like text with pictures
At Vector Space Day, James Le will demonstrate how proper multimodal retrieval enables advanced capabilities such as semantic search, object tracking, and highlight generation.
Why it was selected: Video is the most information-dense modality, yet most current retrieval systems treat it as text with pictures.
