Harness-1: A 20B Retrieval Subagent for Stateful Search, Trained with Reinforcement Learning

AI HOT Picks · June 7, 2026


Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b - MarkTechPost




Harness-1 reaches 0.730 average curated recall across eight benchmarks, trailing only Opus-4.6 among the searchers tested.

By Asif Razzaq, June 6, 2026

Most search agents are trained as policies over a growing transcript. The model decides how to search, but it must also remember what it saw, which evidence matters, and which claims it has checked. A team of researchers from the University of Illinois Urbana-Champaign, UC Berkeley, and Chroma argues that this asks too much of the policy: reinforcement learning ends up optimizing search decisions and routine bookkeeping at the same time.

Their answer is Harness-1, a 20B retrieval subagent built on gpt-oss-20b. It was trained with reinforcement learning inside a stateful search harness. The harness holds the bookkeeping. The policy keeps the semantic decisions. The weights and harness code are publicly released.

https://arxiv.org/pdf/2606.02373

What Harness-1 Actually Is

Harness-1 produces a ranked set of documents for a downstream answering model. It does not answer questions itself. It runs inside a state-machine harness centered on a per-episode WORKINGMEMORY.

Each turn works as a loop. The harness renders compact search state along with recent actions. The model emits one structured action. The harness executes it, updates state, and renders the next observation.
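The turn loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the released harness code: `HarnessState`, `render`, `execute`, `run_episode`, and `scripted_policy` are hypothetical names, retrieval is a stub, and the 40-turn default simply mirrors the cap the article reports for RL training.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HarnessState:
    """Hypothetical stand-in for the per-episode working memory."""
    query: str
    candidates: dict[str, str] = field(default_factory=dict)  # doc_id -> snippet
    recent_actions: list[str] = field(default_factory=list)
    done: bool = False


def render(state: HarnessState, history: int = 3) -> str:
    """Render compact state plus recent actions instead of the full transcript."""
    recent = ", ".join(state.recent_actions[-history:]) or "none"
    return f"query={state.query!r} candidates={len(state.candidates)} recent=[{recent}]"


def execute(state: HarnessState, action: dict) -> None:
    """Apply one structured action and update the state in place."""
    tool = action["tool"]
    if tool == "search_corpus":
        # Stub retrieval: a real harness would call a search backend here.
        doc_id = f"doc-{len(state.candidates) + 1}"
        state.candidates[doc_id] = f"result for {action['query']}"
    elif tool == "end_search":
        state.done = True
    state.recent_actions.append(tool)


def run_episode(query: str, policy: Callable[[str], dict], max_turns: int = 40) -> HarnessState:
    """Render -> act -> execute -> update, until the policy stops or the cap is hit."""
    state = HarnessState(query=query)
    for _ in range(max_turns):
        observation = render(state)
        action = policy(observation)
        execute(state, action)
        if state.done:
            break
    return state


def scripted_policy(observation: str) -> dict:
    """Toy policy: search until two candidates exist, then stop."""
    if "candidates=2" in observation:
        return {"tool": "end_search"}
    return {"tool": "search_corpus", "query": "executive transition date"}


final_state = run_episode("Who replaced the CFO?", scripted_policy)
```

The point of the sketch is that the policy only ever sees the rendered observation; everything it would otherwise have to remember lives in the state object.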

The Stateful Harness: What Moves Out of the Policy

The research team calls its principle stateful cognitive offloading. The policy decides what to search, curate, and verify, and when to stop. The harness maintains the recoverable state around those decisions.

That state includes several pieces. A candidate pool holds compressed, deduplicated documents. An importance-tagged curated set is the final output, capped at 30 documents. Tags take four values: very_high, high, fair, or low. A full-text store keeps every retrieved chunk outside the prompt.
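As a rough picture of that state, the sketch below models the three stores and the 30-document cap. The class and method names are hypothetical, and two behaviours are assumptions rather than details from the article: new documents are rejected once the curated set is full, and the final ranking orders documents by importance tag.

```python
from dataclasses import dataclass, field

IMPORTANCE_LEVELS = ("very_high", "high", "fair", "low")  # tag values from the article
CURATED_CAP = 30  # cap on the curated set, from the article


@dataclass
class WorkingMemory:
    """Hypothetical container for harness-side state."""
    candidate_pool: dict[str, str] = field(default_factory=dict)  # doc_id -> compressed text
    curated: dict[str, str] = field(default_factory=dict)         # doc_id -> importance tag
    full_text: dict[str, str] = field(default_factory=dict)       # chunk_id -> full chunk

    def curate(self, doc_id: str, importance: str) -> bool:
        """Add or re-tag a document; returns False if the cap blocks a new entry."""
        if importance not in IMPORTANCE_LEVELS:
            raise ValueError(f"unknown importance tag: {importance}")
        if doc_id not in self.curated and len(self.curated) >= CURATED_CAP:
            return False  # assumption: reject new documents when the set is full
        self.curated[doc_id] = importance
        return True

    def ranked(self) -> list[str]:
        """Assumed ranking: by importance tag, ties kept in insertion order."""
        order = {tag: i for i, tag in enumerate(IMPORTANCE_LEVELS)}
        return sorted(self.curated, key=lambda doc_id: order[self.curated[doc_id]])


memory = WorkingMemory()
accepted = [memory.curate(f"doc-{i}", "fair") for i in range(31)]
memory.curate("doc-5", "very_high")  # promoting an existing document is always allowed
```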

An evidence graph adds structure. A regex extractor scans each chunk for proper nouns, years, and dates. The harness then renders frequent entities, bridge documents, and singletons. Bridge documents contain two or more frequent entities. Singletons appear in one document and suggest follow-up leads.
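A small sketch of such an evidence graph follows. The regular expressions, the `min_docs` threshold for a "frequent" entity, and the function names are all assumptions for illustration; the sketch covers proper nouns and four-digit years only and leaves full date parsing out.

```python
import re
from collections import defaultdict

# Assumed patterns: runs of capitalized words as proper nouns, plus 4-digit years.
PROPER_NOUN = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_entities(text: str) -> set[str]:
    """Regex-only extraction; no entity linking or disambiguation."""
    return set(PROPER_NOUN.findall(text)) | set(YEAR.findall(text))


def build_evidence_graph(docs: dict[str, str], min_docs: int = 2) -> dict:
    """Group entities by how many documents mention them."""
    entity_docs: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for entity in extract_entities(text):
            entity_docs[entity].add(doc_id)

    frequent = {e for e, ids in entity_docs.items() if len(ids) >= min_docs}
    singletons = {e for e, ids in entity_docs.items() if len(ids) == 1}
    # A bridge document contains two or more frequent entities.
    bridges = sorted(
        doc_id for doc_id, text in docs.items()
        if len(extract_entities(text) & frequent) >= 2
    )
    return {"frequent": frequent, "bridges": bridges, "singletons": singletons}


docs = {
    "d1": "Acme Corp named Jane Doe as chief executive in 2021.",
    "d2": "Jane Doe previously worked at Globex.",
    "d3": "Acme Corp reported record revenue in 2021.",
}
graph = build_evidence_graph(docs)
# frequent: {"Acme Corp", "Jane Doe", "2021"}; bridges: ["d1", "d3"]; singletons: {"Globex"}
```

Here "Globex" is the singleton, which is the kind of entity the article describes as a follow-up lead.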

The policy works through eight tools. These are fan_out_search, search_corpus, grep_corpus, read_document, review_docs, curate, verify, and end_search. Search outputs are compressed with sentence-BM25, keeping the top four sentences. Two-level deduplication removes repeats by chunk ID and content fingerprint.
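The compression and deduplication steps can be illustrated as follows. This is a sketch under assumptions, not the harness implementation: it scores each sentence with a standard Okapi BM25 formula (with `k1=1.5`, `b=0.75` chosen here), treats the sentences of one document as the BM25 corpus, keeps the selected sentences in their original order, and uses a SHA-256 hash of normalized text as the content fingerprint. All function names are hypothetical.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def compress(query: str, text: str, keep: int = 4, k1: float = 1.5, b: float = 0.75) -> list[str]:
    """Keep the `keep` sentences with the highest BM25 score against the query."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= keep:
        return sentences
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    avg_len = sum(len(t) for t in tokens) / n
    doc_freq = Counter(term for t in tokens for term in set(t))
    query_terms = set(tokenize(query))

    def score(sent_tokens: list[str]) -> float:
        counts = Counter(sent_tokens)
        total = 0.0
        for term in query_terms:
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf + k1 * (1 - b + b * len(sent_tokens) / avg_len)
            total += idf * tf * (k1 + 1) / norm
        return total

    top = sorted(range(n), key=lambda i: score(tokens[i]), reverse=True)[:keep]
    return [sentences[i] for i in sorted(top)]  # assumption: preserve original order


def fingerprint(text: str) -> str:
    """Content fingerprint over case- and whitespace-normalized text."""
    return hashlib.sha256(" ".join(tokenize(text)).encode("utf-8")).hexdigest()


def deduplicate(chunks: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Two-level dedup: drop repeats by chunk ID, then by content fingerprint."""
    seen_ids: set[str] = set()
    seen_prints: set[str] = set()
    unique = []
    for chunk_id, text in chunks:
        fp = fingerprint(text)
        if chunk_id in seen_ids or fp in seen_prints:
            continue
        seen_ids.add(chunk_id)
        seen_prints.add(fp)
        unique.append((chunk_id, text))
    return unique


passage = (
    "The board met on Monday. The CFO announced her resignation. "
    "Revenue grew in the quarter. A successor will be named soon. "
    "The resignation takes effect in March. Weather was mild. Lunch was served."
)
kept = compress("CFO resignation", passage)
unique = deduplicate([
    ("c1", "Alpha beta."), ("c1", "Alpha beta."),
    ("c2", "alpha   BETA"), ("c3", "Gamma delta."),
])
```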

One design choice addresses cold starts. The first successful search auto-seeds the curated set with eight reranked results at fair importance. The policy then promotes strong documents and removes weak ones. This turns the task from building from scratch into refinement.
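The warm start amounts to a one-time seeding step. In the sketch below, `seed_curated` is a hypothetical helper and the curated set is modeled as a plain dictionary from document ID to importance tag; only the seed size of eight and the `fair` tag come from the article.

```python
def seed_curated(curated: dict[str, str], reranked_ids: list[str],
                 seed_size: int = 8, importance: str = "fair") -> bool:
    """Seed an empty curated set from the first successful search.

    Returns True if seeding happened; later searches leave the set untouched.
    """
    if curated or not reranked_ids:
        return False
    for doc_id in reranked_ids[:seed_size]:
        curated[doc_id] = importance
    return True


curated: dict[str, str] = {}
first = seed_curated(curated, [f"doc-{i}" for i in range(10)])
second = seed_curated(curated, ["doc-99"])  # no-op: the set is already seeded

# From here the policy refines instead of building from scratch.
curated["doc-0"] = "very_high"  # promote a strong document
del curated["doc-7"]            # remove a weak one
```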

The research team names three requirements for a trainable harness. These are warm-started curation, compact derived-state rendering, and diversity-preserving incentives. Harness-1 implements all three.

How It Is Trained

Training splits along the same line as the harness. Supervised fine-tuning teaches the model to operate the interface. Reinforcement learning improves search decisions over the maintained state.

A single teacher, GPT-5.4, runs live inside the full harness to generate trajectories. After filtering, 899 trajectories remain for SFT. Fine-tuning uses LoRA at rank 32 for three epochs, and the step-550 checkpoint initializes RL.

RL uses on-policy CISPO with a 40-turn cap and terminal-only reward. It trains only on SEC queries. Groups with identical rewards are dropped from the gradient. Training ran on Tinker.
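Dropping groups with identical rewards is easy to picture: if every rollout for a query earns the same terminal reward, there is nothing to compare and the group carries no learning signal. The sketch below assumes a group-relative advantage with mean and standard-deviation normalization, which is a common choice but is not confirmed by the article; `group_advantages` is a hypothetical name, and the CISPO loss itself is not shown.

```python
from statistics import mean, pstdev


def group_advantages(rewards_by_query: dict[str, list[float]],
                     eps: float = 1e-8) -> dict[str, list[float]]:
    """Return normalized advantages per query, skipping zero-variance groups."""
    kept = {}
    for query_id, rewards in rewards_by_query.items():
        spread = pstdev(rewards)
        if spread <= eps:
            continue  # identical rewards: the group is dropped from the gradient
        center = mean(rewards)
        kept[query_id] = [(r - center) / spread for r in rewards]
    return kept


advantages = group_advantages({
    "q1": [0.5, 0.5, 0.5],  # identical rewards, dropped
    "q2": [0.0, 1.0],       # informative group, kept
})
```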

The reward separates discovery from selection. It also adds a tool-diversity bonus. Without that bonus, the agent collapses to repeated search. Curated recall then plateaus near 0.53. With the bonus, diversity stabilizes and recall reaches about 0.60.
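The article does not give the reward formula, so the following is only a guess at its general shape. It assumes that "discovery" means relevant documents encountered anywhere in the episode, that "selection" means relevant documents in the final curated set, and that the diversity bonus is the fraction of the eight tools used at least once. The weights and the function name are hypothetical.

```python
def terminal_reward(relevant: set[str], seen: set[str], curated: set[str],
                    tools_used: list[str], w_discovery: float = 0.3,
                    w_selection: float = 0.6, w_diversity: float = 0.1,
                    n_tools: int = 8) -> float:
    """Illustrative terminal-only reward; weights are made up for the sketch."""
    if not relevant:
        return 0.0
    discovery = len(relevant & seen) / len(relevant)     # found during the episode
    selection = len(relevant & curated) / len(relevant)  # kept in the final set
    diversity = len(set(tools_used)) / n_tools           # distinct tools exercised
    return w_discovery * discovery + w_selection * selection + w_diversity * diversity


relevant = {"a", "b", "c", "d"}
seen = {"a", "b", "c"}
final_set = {"a", "b"}

repetitive = terminal_reward(relevant, seen, final_set, ["search_corpus"] * 3)
varied = terminal_reward(
    relevant, seen, final_set,
    ["search_corpus", "grep_corpus", "read_document", "verify"],
)
```

With identical retrieval outcomes, the varied episode scores higher than the one that only repeats `search_corpus`, which is the kind of pressure the article credits with preventing collapse.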

The Benchmark Case

Harness-1 was evaluated on eight benchmarks spanning web, finance, patents, and multi-hop QA. The main metric is curated recall: coverage of relevant documents in the final set. Trajectory recall counts evidence encountered anywhere in the episode.
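The two metrics differ only in which set of documents is scored. A minimal sketch, with made-up document IDs and a hypothetical `recall` helper:

```python
def recall(relevant: set[str], retrieved: set[str]) -> float:
    """Fraction of relevant documents that appear in `retrieved`."""
    if not relevant:
        return 0.0
    return len(relevant & retrieved) / len(relevant)


relevant_docs = {"d1", "d2", "d3", "d4"}
encountered = {"d1", "d2", "d3", "d9"}  # everything seen during the episode
final_curated = {"d1", "d2", "d7"}      # what the agent actually hands over

trajectory_recall = recall(relevant_docs, encountered)  # 0.75
curated_recall = recall(relevant_docs, final_curated)   # 0.5
```

The gap between the two numbers measures evidence the agent found but failed to keep.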

| Model | Type | Avg Curated Recall | Avg Trajectory Recall |
| --- | --- | --- | --- |
| Harness-1 (20B) | Open small | 0.730 | 0.807 |
| Tongyi DeepResearch 30B | Open small | 0.616 | 0.673 |
| Context-1 (20B) | Open small | 0.603 | 0.756 |
| Search-R1 (32B) | Open small | 0.289 | 0.289 |
| GPT-OSS-20B | Open small | 0.262 | 0.590 |
| Qwen3 (32B) | Open small | 0.216 | 0.446 |
| Opus-4.6 | Frontier | 0.764 | 0.794 |
| GPT-5.4 | Frontier | 0.709 | 0.752 |
| Sonnet-4.6 | Frontier | 0.688 | 0.725 |
| Kimi-K2.5 | Frontier | 0.647 | 0.794 |
| GPT-OSS-120B | Frontier | 0.496 | 0.769 |

_Averages across eight benchmarks, from Figure 1 of the paper. Frontier models run as zero-shot retrievers under the Context-1 harness._

Harness-1 reaches 0.730 average curated recall. That beats the next open subagent, Tongyi DeepResearch 30B, by 11.4 points. Among the frontier searchers tested, only Opus-4.6 scores higher on average.

The transfer pattern is the clearest signal of the mechanism. SFT used four benchmark families; RL used only SEC. On those source-family tasks, Harness-1 gained 7.9 points over the closest open baseline. On four held-out benchmarks, it gained 17.0 points. That is a 2.2x larger gain on tasks furthest from training data.
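The headline figures follow directly from the reported numbers. A quick arithmetic check, using only values quoted in the article:

```python
harness_1 = 0.730  # average curated recall, Harness-1
tongyi = 0.616     # average curated recall, Tongyi DeepResearch 30B

gap_points = round((harness_1 - tongyi) * 100, 1)  # 11.4 points

held_out_gain = 17.0      # points over the closest open baseline, held-out benchmarks
source_family_gain = 7.9  # points over the closest open baseline, source-family tasks
transfer_ratio = round(held_out_gain / source_family_gain, 1)  # 2.2
```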

Ablations support the harness claim. Disabling all harness mechanisms lowers recall by a relative 12.2 percent on BrowseComp+. The trained policy keeps searching but cannot rank what it sees.


Use Cases

The method targets evidence-seeking retrieval where documents support an answer. Several workflows fit this shape.

One is literature and patent review. The evidence graph and curated set help organize many sources. Another is financial-filing analysis. The SEC case study recovers an exact executive-transition date across multiple 8-Ks.

A third is multi-hop fact-checking. The fan_out_search and verify tools resolve ambiguous entities before committing. A fourth is modular RAG. The curated set feeds a frozen generator, and better sets yield higher answer accuracy.

Strengths and Weaknesses

#### Strengths

- Highest average curated recall among the open models tested, and behind only Opus-4.6 overall.
- Gains hold on held-out benchmarks, suggesting domain-general search operations.
- Trained on 4,352 unique items, far fewer than several baselines.
- Open checkpoint and harness code, servable with common runtimes.

#### Weaknesses

- The evidence graph uses regex extraction, not full entity linking.
- The verify tool is an LLM proxy that can err on ambiguous claims.
- Sentence-BM25 compression may drop context tied to discourse structure.
- The research team reports point estimates without full confidence intervals.

Key Takeaways

- Harness-1 is a 20B search agent that moves search bookkeeping into the environment, leaving semantic decisions to the policy.
- It hits 0.730 average curated recall across eight benchmarks, beating the next open subagent by 11.4 points.
- Among the searchers tested, only Opus-4.6 scores higher on average curated recall.
- Gains are largest on held-out benchmarks (+17.0 vs +7.9 points), suggesting the learned search operations transfer.
- Weights and harness code are public, servable via vLLM, SGLang, or Transformers.

Marktechpost’s Visual Explainer


Harness-1: a 20B search agent with a stateful harness

A retrieval subagent trained with reinforcement learning inside a search harness that holds the bookkeeping.

20B · gpt-oss-20b base · UIUC · UC Berkeley · Chroma · arXiv:2606.02373 · Open weights & code

The Core Idea

Split the work between policy and harness

Most search agents pack search decisions and routine bookkeeping into one growing transcript. Harness-1 separates the two. The paper calls this stateful cognitive offloading.

Policy decides

What to search
Which documents to keep
What claims to verify
When to stop

Harness maintains

Candidate pool
Curated evidence
Verification records
Context budget

Inside the Harness

Environment-side working memory

Candidate pool: compressed, deduplicated documents
Curated set: importance-tagged, capped at 30 (very_high / high / fair / low)
Evidence graph: entities, bridges, and singletons via regex extraction
Verification cache: claim to document to yes/no verdict
Full-text store: every retrieved chunk kept outside the prompt
Compression: sentence-BM25 keeps the top four sentences

Policy Actions

Eight tools edit the state

fan_out_search
search_corpus
grep_corpus
read_document
review_docs
curate
verify
end_search

The first successful search auto-seeds the curated set with eight reranked documents at fair importance. The policy then promotes strong documents and removes weak ones.

Training

SFT to operate the interface, RL to search

SFT: GPT-5.4 teacher inside the harness · 899 trajectories · LoRA rank 32 · step-550 checkpoint

RL: on-policy CISPO · SEC queries only · 40-turn cap · terminal reward · trained on Tinker

Data scale: 4,352 unique training items (899 SFT + 3,453 RL)

Three trainability requirements: warm-started curation, compact derived-state rendering, and diversity-preserving incentives.

Results

What the numbers show

0.730 average curated recall across eight benchmarks

+11.4 pts over the next open subagent, Tongyi DeepResearch 30B

Among the searchers tested, only Opus-4.6 scores higher on average

Transfer: +17.0 on held-out vs +7.9 on source-family (2.2x gap)

Ablation: removing all harness mechanisms drops Recall 12.2% relative

Get Started

Run it yourself

Serve: vLLM, SGLang, or Transformers

Checkpoint: pat-jj/harness-1 (Hugging Face, 21B params, BF16)

Code: github.com/pat-jj/harness-1

Paper: arXiv:2606.02373

Harness-1 returns a curated set of documents for a downstream answering model. It does not answer questions itself.


Check out the [Paper](https://arxiv.org/pdf/2606.02373), **Model weights**, and [GitHub Repo](https://github.com/pat-jj/harness-1).

Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.?[Connect with us](https://forms.gle/wbash1wF6efRj8G58)

Previous articleNVIDIA garak Tutorial: Build a Complete Defensive LLM Red-Teaming Workflow with Custom Probes and Detectors

Asif Razzaq

