I'm very excited about this extension to the celebrated Terminal-Bench to science. If you're a scie...
Thomas Wolf (@Thom_Wolf) · 227 words (about 1 minute)
75
Thomas Wolf is excited about Terminal-Bench Science, an extension of Terminal-Bench to scientific fields. The benchmark evaluates how well AI models can control tools via the command line to achieve scientific goals. Contributions of real scientific workflows are open until August 2026, with the aim of making AI models more helpful in research work.
Reason for selection: Terminal-Bench Science evaluates AI models' performance in handling scientific workflows through command-line tools.
Featured tweet · #AI #Science #Terminal-Bench #Benchmarking #Command Line · English


