Claude Opus 4.8 is here. Is it as good as they say?
Lenny's Newsletter · 1002 words (about 5 min read)
87
Opus 4.8 scores 69.2% on SWE-Bench Pro, about 5 points above Opus 4.7 and about 10 above GPT-4.5, but real-world coding reveals persistent ‘last 10%’ failures and hallucinations; pricing is steep at $5/k input tokens.
Why it was selected: Opus 4.8 scores 69.2% on SWE-Bench Pro, well ahead of Opus 4.7 (+5 pts), GPT-4.5 (+10 pts), and Gemini 3.1 (+15 pts).
Featured · Article · #Claude #LLM #Anthropic #AI coding #benchmark · English
