Multimodal Evaluators: MLLM-as-a-Judge for Image-to-Text Tasks in Strands Evals
AWS announced four new multimodal evaluators (Overall Quality, Correctness, Faithfulness, and Instruction Following) that use an MLLM-as-a-Judge approach. The source data is provided directly to the judge model, which evaluates whether a model's response aligns with the image content and can therefore detect visual hallucinations and factual errors.
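Why it was selected: Gartner predicts that by 2030, 80% of enterprise software will be multimodal, up sharply from less than 10% in 2024, which underscores the importance of automated multimodal evaluation.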
