SWE benchmarks don’t necessarily capture app-building capabilities. ViBench aims to.
Existing SWE benchmarks do not necessarily capture the full range of app-building capabilities. ViBench fills this gap by evaluating models on end-to-end web application development.
Why it was selected: Current SWE benchmarks cannot adequately measure AI models' app-building capabilities.

