What's new with Harness?

traeai has indexed 22 items related to Harness. The most recent is "A shared playbook for trustworthy third party evaluations", published by OpenAI Blog.

Concept

Harness

Q: What is Harness?

A term for an architectural choice in agent development

Alias: harness era

22 highly relevant items tracked

TraeAI Observations

If you read only three

A shared playbook for trustworthy third party evaluations

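OpenAI Blog · score 9.2

OpenAI proposes a general framework for trustworthy third-party evaluations. It stresses that an evaluation must explicitly state the claim being tested and the evidence for its validity, and must distinguish three claim types (capability elicitation, safeguard performance, comparison). It notes in particular that the "harness" (the execution environment) has a decisive effect on evaluation results for long-horizon tasks.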

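BestBlogs.dev Weekly, Issue 93: The AI Exponent Transformation

Gino Notes · score 9.2

Using "AI as an exponent" as its central metaphor, this issue ties together Yang Bin's restructuring of organizational mindset, Karpathy's Software 3.0 paradigm, and Demis's three gaps to AGI. It argues that AI has moved past the tool-stacking of "+AI" into a stage of exponential restructuring driven by a qualitative change in the base.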

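#543. Why 2026 is the Year of Harness? Deep Dive by IBM Expert

跨国串门儿计划 · score 8.8

2026 will be the year of the AI Harness. Engineering measures such as guardrails, validation, and automation processors can turn unreliable AI agents into stable, controllable systems without changing the prompt, which makes the harness key infrastructure on the road to AGI.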

A Shared Playbook for Trustworthy Third-Party Evaluations

OpenAI Blog · May 31 · 2,741 words (about 11 min)

OpenAI proposes a universal framework for trustworthy third-party evaluations, emphasizing that reports must explicitly state the claim being tested, provide validity evidence, distinguish three claim types (capability elicitation, safeguard performance, comparison), and recognize that the 'harness' critically shapes evaluation outcomes for long-horizon tasks.

Why it was selected: An evaluation report must state which type of claim it tests (capability elicitation, safeguard performance, or system comparison), and each of the three calls for a different harness design.

Featured · Article · #AI Safety #Model Evaluation #OpenAI #harness #Third-Party Assessment · English

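BestBlogs.dev Weekly, Issue 93: The AI Exponent Transformation

Gino Notes · May 2 · 5,037 words (about 21 min)

Why it was selected: AI is not a pluggable module. It requires the organization's base (mindset, processes, power structure) to change qualitatively first; otherwise exponential amplification only accelerates failure.

Featured · Article · #AI Strategy #Software 3.0 #AGI #Organizational Change #Large Model Engineering · Chinese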

#543. Why 2026 is the Year of Harness? Deep Dive by IBM Expert

跨国串门儿计划 · May 20 · 1,189 words (about 5 min)

2026 will be the year of AI Harness. Using engineering methods like guardrails, validation, and automation processors, unreliable AI Agents can be transformed into stable, controllable systems without modifying Prompts, marking key infrastructure for AGI.

Why it was selected: An AI Harness comprises five core components (tool registration, context compaction, guardrails, loops, and validation) that anchor an unreliable model in a controllable code environment.

Featured · Podcast · #AI Agent #Harness #IBM #Prompt Engineering #RAG · Chinese
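
As a rough illustration of how the five components named in the note above (tool registration, context compaction, guardrails, loop, validation) might fit together, here is a minimal Python sketch. It is not taken from the podcast: the `Harness` class, the `scripted_model` stand-in for an LLM call, and the `name:argument` action format are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Harness:
    model: Callable[[List[str]], str]   # stand-in for an LLM call (hypothetical)
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    max_context: int = 4                # messages kept after compaction
    max_steps: int = 5                  # hard cap on loop iterations

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        """1. Tool registration: only registered tools can ever run."""
        self.tools[name] = fn

    def compact(self, context: List[str]) -> List[str]:
        """2. Context compaction: here, simply keep the most recent messages."""
        return context[-self.max_context:]

    def allowed(self, action: str) -> bool:
        """3. Guardrail: reject any action that names an unregistered tool."""
        name = action.partition(":")[0]
        return name == "final" or name in self.tools

    def run(self, task: str, validate: Callable[[str], bool]) -> Optional[str]:
        """4. Loop until a validated answer appears or the step budget runs out."""
        context = [task]
        for _step in range(self.max_steps):
            action = self.model(self.compact(context))
            if not self.allowed(action):
                context.append(f"rejected: {action}")
                continue
            name, _sep, arg = action.partition(":")
            if name == "final":
                if validate(arg):       # 5. Validation of the claimed answer
                    return arg
                context.append(f"invalid answer: {arg}")
                continue
            context.append(f"{name} -> {self.tools[name](arg)}")
        return None


def scripted_model(context: List[str]) -> str:
    """Scripted stand-in: first tries a forbidden tool, then behaves."""
    last = context[-1]
    if last.startswith("add -> "):
        return "final:" + last[len("add -> "):]
    if last.startswith("rejected"):
        return "add:2,3"
    return "shell:rm -rf /"


harness = Harness(model=scripted_model)
harness.register("add", lambda arg: str(sum(int(x) for x in arg.split(","))))
result = harness.run("What is 2 + 3?", validate=lambda answer: answer.isdigit())
print(result)  # prints 5
```

The prompt never changes in this sketch. The unregistered `shell` action is blocked by the guardrail, and the final answer is returned only after it passes validation, which is the sense in which the surrounding code, rather than the model, carries the reliability.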

From human-operated agent development to systematic agent improvement

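Arize AI Blog · Yesterday · 2,361 words (about 10 min)

The article proposes an engineering architecture for moving from human-operated agent development to systematic improvement, using automated loops to raise efficiency.

Why it was selected: Systematic improvement depends on a closed-loop architecture of tracing, failure discovery, managed workers, and fleet control.

Featured · Article · #Agent Development #Systematic Improvement #Engineering Architecture · English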

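Anthropic internal conversation: the harness shell workflow is being dismantled

The layer of code wrapped around the model (the harness) is getting thinner: it encodes assumptions about "what the model cannot do", and those assumptions are expiring as model capabilities improve.....

小互 (@imxiaohu) · Jul 19 · 267 words (about 2 min)

Anthropic is restructuring the harness workflow around its models, moving from a fixed pipeline to dynamic strategy adjustment, because improved model capabilities have invalidated the original assumptions.

Why it was selected: The old harness was like an assembly line, with every stage fixed and strictly ordered.

Featured · Tweet · #AI Models #Engineering Workflow #Anthropic #Technical Architecture · Chinese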

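Is AI diagnosis becoming a new burden for doctors and patients? General-purpose AI can clear the medical bar only once it adds multi-turn follow-up questioning

量子位 · Jun 19 · 3,793 words (about 16 min)

Medical AI needs multi-turn follow-up questioning and evidence-based reasoning; Baichuan Intelligence's M4 achieves its medical enhancement through structured reconstruction.

Why it was selected: M4 scored 55.1 on the HealthBench Professional evaluation, significantly higher than GPT-5.5.

Featured · Article · #AI Healthcare #Large Models #Baichuan Intelligence #Medical AI · Chinese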

E235 Instead of Worrying About AI Changing You, Do One Small Thing Today with It

知行小酒馆 · May 16 · 2,340 words (about 10 min)

Ordinary people should start with small tasks and use AI to improve efficiency, rather than being overly anxious about its impact.

Why it was selected: Use AI to finish the tasks you least want to do, such as data cleanup or repetitive work.

Featured · Podcast · #AI #Productivity Tools #Podcast #Technology Application · Chinese

How to Stop Shipping Low-Quality RL Environments (with Examples)

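Latent Space · Jun 7 · 1,310 words (about 6 min)

RL environments act as data generators; low-quality training harnesses poison gradients by producing erroneous trajectories, causing models to learn wrong behavioral patterns instead of task logic.

Why it was selected: Any software bug in an RL environment (such as a cache invalidation bug or a race condition) will be mistaken by the model for a regularity of the environment, leading it to learn the wrong policy.

Featured · Article · #Reinforcement Learning #Data Quality #MLOps #Agent Training · English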

Introducing Managed Deep Agents | Interrupt 26

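LangChain · May 30 · 3,943 words (about 16 min)

LangChain introduces Managed Deep Agents, a customizable agent harness architecture supporting complex real-world tasks via execution environment, context management, delegation, and human-in-the-loop capabilities.

Why it was selected: The Deep Agents harness has four capabilities: an execution environment (file system plus sandbox/code interpreter), context management (short- and long-term memory plus summarization and caching), task delegation (sub-agent collaboration), and human-in-the-loop collaboration.

Featured · Video · #LangChain #Agent #harness #RAG #code interpreter · English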

[AINews] All Model Labs are now Agent Labs

Latent Space · May 23 · 1,928 words (about 8 min)

Leading AI companies are shifting from pure model development to end-to-end agent systems, with OpenAI, AI21, and DeepSeek all forming Agent/Harness teams—marking a paradigm shift from ‘models as product’ to ‘systems as product’.

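Why it was selected: Through Codex Thursday #6, OpenAI is rolling out new features such as appshots, /goal improvements, and remote locked computer use, strengthening the differentiation of its coding-agent product.

Featured · Article · #Agent AI #Model Engineering #Product Strategy #OpenAI #AI Infrastructure · English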

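Read today's hottest paper on Huggingface, about a Harness framework for getting AI to generate paper figures.

The framework revolves around a shared structured specification document S.

① Designer D: generates an executable visual plan from S
② Executor E: ...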

Reading Today's Most Popular Huggingface Paper: The Harness Framework for AI-Generated Paper Charts

向阳乔木 (@vista8) · Jun 2 · 335 words (about 2 min)

This article introduces the Harness framework, an AI tool designed to automatically generate paper charts through a collaborative workflow involving designer, executor, validator, and reviser roles.

Why it was selected: The Harness framework automates the generation and refinement of paper charts through four roles (D/E/V/R).

Featured · Tweet · #AI #Huggingface #Paper Charts #Automation #Harness · Chinese

What's the tea on harnesses?

LangChain · Jun 6 · 269 words (about 2 min)

A harness is the core infrastructure for building AI Agents, consisting of tools, execution environments, system prompts, and file systems. By optimizing harness engineering, developers can significantly boost Agent performance on benchmarks like Terminal Bench without changing the underlying model.

Why it was selected: A harness is defined as the set of tools, execution environment, system prompt, and file system that the model has access to.

Featured · Video · #AI Agents #Harness Engineering #LLM #LangChain · English

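DeepSeek really is the embodiment of long-termism and of the idea that the greatest truths are simple

China's big tech companies and AI "little dragons" have designed Coding Plans and Token Plans whose pricing is each more complicated than the last, with purchase limits and referral rebates, and after more than half a year of this, ...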

DeepSeek Truly Embodies Long-Termism and Simplicity in AI Strategy

meng shao (@shao__meng) · May 25 · 440 words (about 2 min)

DeepSeek demonstrates long-term thinking by adopting simple pricing to attract developers and gather real-world feedback data.

Why it was selected: DeepSeek uses very low API and cache-hit prices in place of complex pricing schemes.

Featured · Tweet · #DeepSeek #AI Pricing #Long-Termism · Chinese

Every Harness Will Become A Claw — Sam Bhagwat, Mastra

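AI Engineer · Yesterday · 3,815 words (about 16 min)

The speaker argues that agent frameworks will evolve from the "harness" toward a more autonomous "claw", with Mastra, a TypeScript agent framework, playing a part in that trend.

Why it was selected: Mastra is a company focused on a TypeScript agent framework.

Featured · Video · #AI #Agent Frameworks #Mastra #TypeScript · Mixed Chinese and English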

Harnesses in AI: A Deep Dive

AI Engineer (@aiDotEngineer) · May 19 · 127 words (about 1 min)

Tejas Kumar demonstrates through a GPT-3.5 Turbo browser agent case how unconstrained AI agents fail by hallucinating success when hitting login pages, showcasing the critical role of harness testing frameworks in ensuring agent reliability.

Why it was selected: An unconstrained GPT-3.5 Turbo agent produces hallucinated success reports when it hits a login page.

Featured · Tweet · #AI Agent #GPT-3.5 Turbo #Browser Automation #Testing #Reliability · English
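
The failure described above (an agent reporting success while the browser is actually on a login page) suggests one simple kind of harness check: compare the agent's claim against the observed page state. The Python sketch below is illustrative only and is not the code from the talk. The `PageState` type, the `looks_like_login` heuristic, and the report wording are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PageState:
    """What the harness observes in the browser (hypothetical structure)."""
    url: str
    html: str


def looks_like_login(page: PageState) -> bool:
    # Crude heuristic, assumed for illustration: a /login URL or a password field.
    return "/login" in page.url.lower() or 'type="password"' in page.html.lower()


def check_report(agent_report: str, page: PageState) -> str:
    """Accept a success claim only if the observed page does not contradict it."""
    claimed_success = "success" in agent_report.lower()
    if claimed_success and looks_like_login(page):
        return "rejected: agent claimed success but the browser is on a login page"
    return "accepted" if claimed_success else "incomplete"


login_page = PageState("https://example.com/login", '<input type="password">')
order_page = PageState("https://example.com/orders/1", "<h1>Order placed</h1>")

print(check_report("Success! The order was placed.", login_page))
print(check_report("Success! The order was placed.", order_page))
```

The design choice is that the harness never trusts the agent's own report. It checks the claim against state it observed independently, so a hallucinated success is rejected instead of being passed along.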

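Skill Factory: Hand-Building a Harness-Oriented Skill Factory in Three Days (with AI Coding Practices)

阿里云开发者 · May 14 · 49 words (about 1 min)

The article describes how to use the Skill Factory platform together with the Harness CI/CD toolchain for automated development and deployment, but the content is fairly basic and lacks depth and novelty.

Why it was selected: The article gives a method for building a skill factory from scratch and integrating it into a Harness CI/CD workflow.

Featured · Article · #Skill Factory #Harness #CI/CD · Chinese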

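The model is fundamental and the Harness layer is relatively easy to catch up on, but the Harness layer does not need many vertical-domain variants. Claude Design will soon be merged into Claude Desktop, and Codex, once the next generation or next few generations of models are capable enough, ...

宝玉 (@dotey) · Jun 15 · 234 words (about 1 min)

The post discusses the relationship between the model and the Harness layer, but its information density is low and it lacks concrete technical detail and in-depth analysis.

Why it was selected: The model is the core of technical progress, while the Harness layer is relatively simple to implement.

Featured · Tweet · #AI #Models #Harness #Claude #Codex · Chinese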

Dewu's Warehouse Governance with Harness: SQL Compliance Rate Reaches 95%

dbaplus社群 · Jun 4 · 73 words (about 1 min)

Dewu improved SQL compliance to 95% using its self-developed warehouse governance platform Harness, enabling automated review, rule library expansion, and cross-team collaboration, significantly reducing data errors and boosting development efficiency.

Why it was selected: Dewu built its own Harness platform and raised its SQL compliance rate to 95%.