Epoch AI 最近有什么新动态？

traeai 已收录 7 篇与 Epoch AI 相关的内容。最新一篇是「[AINews] FrontierCode: Benchmarking for Code Quality over Slop」，由 Latent Space 发布。

公司

Epoch AI

别名：epochai

发布 FrontierCode 的研究团队。

已跟踪 7 条高相关材料

TraeAI 观察

如果只读 3 篇

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

Latent Space · 8.5 分

FrontierCode 是一项新的代码质量评估基准，专注于衡量代码是否可合并，而非仅通过单元测试。

Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others. On CAISI's V4 assessment.

Interconnects AI · 8.5 分

中国开源模型与美国前沿模型能力差距持续扩大，CAISI评估显示差距达3-7个月。

How open model ecosystems compound

Interconnects AI · 8.5 分

中国开放的AI生态系统通过减少重复研发计算成本，提高了模型开发的效率和可持续性。

[AINews] FrontierCode: Benchmarking for Code Quality over Slop

Latent Space6月10日1922 字 (约 8 分钟)

FrontierCode 是一项新的代码质量评估基准，专注于衡量代码是否可合并，而非仅通过单元测试。

入选理由：FrontierCode 由开源维护者耗时 40 多小时构建，旨在评估代码是否可合并。

FeaturedArticle#FrontierCode#代码质量#AI 工程#基准测试英文

Latest open artifacts (#21): Open model bonanza! Gemma 4, DeepSeek V4, Kimi K2.6, MiMo 2.5, GLM-5.1 & others

Interconnects AI5月18日881 字 (约 4 分钟)

China's open-source models lag behind the US frontier models by 3-7 months according to CAISI evaluations.

入选理由：CAISI评估显示中国开源模型在多个基准测试中落后于美国模型，差距达3-7个月。

FeaturedArticle#AI Models#Open Source#Performance Evaluation英文

How open model ecosystems compound

Interconnects AI5月13日1141 字 (约 5 分钟)

China's open AI ecosystem reduces redundant R&D compute costs, enhancing model development efficiency and sustainability.

入选理由：中国AI生态系统的开放性减少了重复的研发计算成本，使实验室能够持续更长时间。

FeaturedArticle#AI#Machine Learning#Open Source#China中文

AI Dev 26 x SF | Ara Khan: Evals Are Broken — Use Them Anyway

DeepLearning.AI5月23日6775 字 (约 28 分钟)

AI evals are fundamentally broken—over-reliance on objective metrics misleads—but they remain critical when built, interpreted, and embedded properly in agent workflows.

入选理由：当前主流 eval（如 Epoch AI、OpenAI 的 benchmark）存在‘虚假精确性’，模型分数相近时实际能力差异显著。

FeaturedVideo#AI Evaluation#Agent Systems#Benchmarking#LLM#Engineering Practice英文

Some ideas for what comes next, May 2026

Interconnects AI5月27日1700 字 (约 7 分钟)

Author predicts 2026 will be a key year for AI development, with open models facing both challenges and opportunities.

入选理由：2026年将是AI发展的关键一年，开放模型将面临更多挑战和机遇。

FeaturedArticle#AI#OpenAI#Claude Code#Codex中文

AI Chip Component Costs: Memory at 63%

Hacker News Best5月24日1217 字 (约 5 分钟)

Memory now accounts for 63% of total AI chip component costs, making it the largest single cost driver.

入选理由：AI芯片内存成本达63%，远超其他组件。

FeaturedArticle#AI chip#memory cost#compute efficiency#hardware architecture#data center英文

FrontierMath Evaluation Reveals Fatal Errors, Updated Scores to Follow

AI HOT 精选5月12日118 字 (约 1 分钟)

FrontierMath evaluation found fatal errors in ~33% of problems; Epoch AI will release corrected dataset with updated scores.

入选理由：FrontierMath Tiers 1-4中约33%的题目被标记为致命错误

FeaturedArticle#AI Evaluation#Math Benchmark#Data Correction#Epoch AI#Model Assessment英文

跨材料问答 · Epoch AI

回答基于：Epoch AI 相关 7 条材料