Harness-1: A 20B Retrieval Subagent for Stateful Search, Trained with Reinforcement Learning
Meet Harness-1: A 20B Retrieval Subagent Trained With Reinforcement Learning Inside a Stateful Search Harness on gpt-oss-20b - MarkTechPost

Harness-1 reaches 0.730 average curated recall across eight benchmarks, trailing only Opus-4.6 among the searchers tested.
June 6, 2026
Most search agents are trained as policies over a growing transcript: the model decides how to search, and it must also remember what it saw, which evidence matters, and which claims it has already checked. A team of researchers from the University of Illinois Urbana-Champaign, UC Berkeley, and Chroma argues that this asks too much of the policy, because reinforcement learning ends up optimizing search decisions and routine bookkeeping at the same time.
Their answer is Harness-1, a 20B retrieval subagent built on gpt-oss-20b. It was trained with reinforcement learning inside a stateful search harness. The harness holds the bookkeeping. The policy keeps the semantic decisions. The weights and harness code are publicly released.

_Image source: https://arxiv.org/pdf/2606.02373_
**What Harness-1 Actually Is**
Harness-1 produces a ranked set of documents for a downstream answering model. It does not answer questions itself. It runs inside a state-machine harness centered on a per-episode WORKINGMEMORY.
Each turn works as a loop. The harness renders compact search state along with recent actions. The model emits one structured action. The harness executes it, updates state, and renders the next observation.
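The per-turn loop can be sketched as a small driver. This is a hypothetical illustration, not the released harness: `EpisodeState`, `render`, `execute`, `run_episode`, the `{"tool": ..., "args": ...}` action format, and the scripted policy are all assumptions. It shows the division of labor: the policy only maps an observation to one action, while the harness owns the state and decides what the next observation contains. The default of 40 turns mirrors the turn cap the article reports for RL.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Action = Dict[str, Any]  # hypothetical format: {"tool": name, "args": {...}}

@dataclass
class EpisodeState:
    """Toy stand-in for the harness's per-episode working memory."""
    query: str
    history: List[Action] = field(default_factory=list)
    curated: List[str] = field(default_factory=list)
    done: bool = False

def render(state: EpisodeState, window: int = 3) -> str:
    # Compact observation: query, curated-set size, and only the last few actions.
    recent = [a["tool"] for a in state.history[-window:]]
    return f"query={state.query!r} curated={len(state.curated)} recent={recent}"

def execute(state: EpisodeState, action: Action) -> None:
    # The harness, not the policy, applies each action to the state.
    state.history.append(action)
    if action["tool"] == "curate":
        state.curated.extend(action["args"]["doc_ids"])
    elif action["tool"] == "end_search":
        state.done = True

def run_episode(query: str, policy: Callable[[str], Action], max_turns: int = 40) -> List[str]:
    state = EpisodeState(query=query)
    for _ in range(max_turns):
        observation = render(state)   # 1. harness renders compact state
        action = policy(observation)  # 2. policy emits one structured action
        execute(state, action)        # 3. harness executes it and updates state
        if state.done:
            break
    return state.curated              # document IDs handed to the answering model

def scripted_policy() -> Callable[[str], Action]:
    # Hypothetical toy policy: ignores the observation and replays a fixed plan.
    plan = iter([
        {"tool": "search_corpus", "args": {"q": "executive transition"}},
        {"tool": "curate", "args": {"doc_ids": ["d1", "d2"]}},
        {"tool": "end_search", "args": {}},
    ])
    return lambda observation: next(plan)

print(run_episode("Who replaced the CEO?", scripted_policy()))  # ['d1', 'd2']
```

A trained model would replace `scripted_policy`; because the observation is re-rendered from state every turn, the prompt does not grow with the full transcript.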
**The Stateful Harness: What Moves Out of the Policy**
The research team calls its principle stateful cognitive offloading. The policy decides what to search, curate, and verify, and when to stop. The harness maintains the recoverable state around those decisions.
That state includes several pieces. A candidate pool holds compressed, deduplicated documents. An importance-tagged curated set is the final output, capped at 30 documents. Tags take four values: very_high, high, fair, or low. A full-text store keeps every retrieved chunk outside the prompt.
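A minimal data-structure sketch of that state follows. The class and method names (`WorkingMemory`, `Importance`, `curate`, `ranked_output`) are hypothetical; the four tags and the cap of 30 come from the article, while the numeric ordering of the tags and the choice to reject new documents once the cap is reached (rather than evicting one) are assumptions.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List

class Importance(IntEnum):
    # The four tags from the article; the numeric ordering is an assumption.
    LOW = 0
    FAIR = 1
    HIGH = 2
    VERY_HIGH = 3

CURATED_CAP = 30  # cap on the curated set, as reported

@dataclass
class WorkingMemory:
    candidate_pool: Dict[str, str] = field(default_factory=dict)  # doc_id -> compressed snippet
    full_text: Dict[str, str] = field(default_factory=dict)       # doc_id -> full chunk (never rendered)
    curated: Dict[str, Importance] = field(default_factory=dict)  # doc_id -> importance tag

    def add_candidate(self, doc_id: str, snippet: str, text: str) -> None:
        # First write wins, so re-retrieved documents do not overwrite earlier entries.
        self.candidate_pool.setdefault(doc_id, snippet)
        self.full_text.setdefault(doc_id, text)

    def curate(self, doc_id: str, importance: Importance) -> bool:
        """Add or re-tag a candidate; reject new entries once the cap is reached."""
        if doc_id not in self.candidate_pool:
            return False
        if doc_id not in self.curated and len(self.curated) >= CURATED_CAP:
            return False
        self.curated[doc_id] = importance
        return True

    def remove(self, doc_id: str) -> None:
        self.curated.pop(doc_id, None)

    def ranked_output(self) -> List[str]:
        # Highest importance first; sorted() is stable, so ties keep insertion order.
        return sorted(self.curated, key=lambda d: self.curated[d], reverse=True)

wm = WorkingMemory()
for i in range(35):
    wm.add_candidate(f"d{i}", f"snippet {i}", f"full text {i}")
    wm.curate(f"d{i}", Importance.FAIR)
wm.curate("d3", Importance.VERY_HIGH)
print(len(wm.curated), wm.ranked_output()[:2])  # 30 ['d3', 'd0']
```

Keeping `full_text` separate from `candidate_pool` reflects the article's point that full chunks stay outside the prompt; only snippets and tags would be rendered.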
An evidence graph adds structure. A regex extractor scans each chunk for proper nouns, years, and dates. The harness then renders frequent entities, bridge documents, and singletons. Bridge documents contain two or more frequent entities. Singletons are entities that appear in only one document; they suggest follow-up leads.
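A rough sketch of regex-based entity extraction and the bridge/singleton split is shown below. The specific patterns, the leading-stopword filter, the function names, and the rule that an entity is "frequent" once it appears in at least two documents (`min_docs=2`) are all assumptions; the article does not give the harness's exact patterns or thresholds.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

MONTHS = r"(?:January|February|March|April|May|June|July|August|September|October|November|December)"
DATE_RE = re.compile(rf"\b{MONTHS}\s+\d{{1,2}},\s+\d{{4}}\b")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
PROPER_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")
LEADING_STOPWORDS = {"The", "A", "An", "In", "On", "This", "He", "She", "It"}  # naive filter

def extract_entities(text: str) -> Set[str]:
    dates = set(DATE_RE.findall(text))
    rest = DATE_RE.sub(" ", text)  # drop dates so their month/year parts are not re-counted
    years = set(YEAR_RE.findall(rest))
    nouns = set()
    for phrase in PROPER_RE.findall(rest):
        words = phrase.split()
        while words and words[0] in LEADING_STOPWORDS:  # "The Acme Corp" -> "Acme Corp"
            words = words[1:]
        if words:
            nouns.add(" ".join(words))
    return dates | years | nouns

def build_evidence_graph(docs: Dict[str, str], min_docs: int = 2) -> Dict[str, List[str]]:
    doc_entities = {doc_id: extract_entities(text) for doc_id, text in docs.items()}
    entity_docs = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for entity in entities:
            entity_docs[entity].add(doc_id)
    frequent = {e for e, ds in entity_docs.items() if len(ds) >= min_docs}
    return {
        "frequent": sorted(frequent),
        # Bridge documents: contain two or more frequent entities.
        "bridges": sorted(d for d, ents in doc_entities.items() if len(ents & frequent) >= 2),
        # Singletons: entities seen in exactly one document (follow-up leads).
        "singletons": sorted(e for e, ds in entity_docs.items() if len(ds) == 1),
    }

docs = {
    "d1": "Jane Doe joined Acme Corp in 2019.",
    "d2": "Acme Corp named Jane Doe chief executive on June 6, 2026.",
    "d3": "Acme Corp opened an office in Berlin.",
}
graph = build_evidence_graph(docs)
print(graph["bridges"])  # ['d1', 'd2']
```

The weakness the article lists applies here too: a capitalized-word regex is not entity linking, so "Acme Corp" and "Acme Corporation" would count as different entities.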
The policy works through eight tools. These are fan_out_search, search_corpus, grep_corpus, read_document, review_docs, curate, verify, and end_search. Search outputs are compressed with sentence-BM25, keeping the top four sentences. Two-level deduplication removes repeats by chunk ID and content fingerprint.
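The compression and deduplication steps can be sketched as follows. The article specifies only "sentence-BM25, top four sentences" and deduplication "by chunk ID and content fingerprint"; the sentence-splitting regex, computing IDF over the sentences of the chunk itself, the common defaults `k1=1.5` and `b=0.75`, keeping selected sentences in their original order, and using SHA-1 of normalized tokens as the fingerprint are all assumptions. `bm25_compress` and `Deduplicator` are hypothetical names.

```python
import hashlib
import math
import re
from collections import Counter
from typing import List, Set

def _tokens(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_compress(chunk: str, query: str, top_k: int = 4, k1: float = 1.5, b: float = 0.75) -> str:
    """Keep the top_k sentences of a chunk by BM25 score against the query."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", chunk.strip()) if s]
    if len(sentences) <= top_k:
        return " ".join(sentences)
    docs = [_tokens(s) for s in sentences]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))  # sentence frequency of each term
    q_terms = set(_tokens(query))

    def score(d: List[str]) -> float:
        tf = Counter(d)
        s = 0.0
        for t in q_terms:
            if tf[t] == 0:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        return s

    best = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)[:top_k]
    return " ".join(sentences[i] for i in sorted(best))  # restore original order

class Deduplicator:
    """Two-level dedup: exact chunk ID first, then a fingerprint of normalized content."""

    def __init__(self) -> None:
        self.seen_ids: Set[str] = set()
        self.seen_prints: Set[str] = set()

    @staticmethod
    def fingerprint(text: str) -> str:
        return hashlib.sha1(" ".join(_tokens(text)).encode("utf-8")).hexdigest()

    def is_new(self, chunk_id: str, text: str) -> bool:
        if chunk_id in self.seen_ids:
            return False
        fp = self.fingerprint(text)
        if fp in self.seen_prints:  # same content under a different chunk ID
            return False
        self.seen_ids.add(chunk_id)
        self.seen_prints.add(fp)
        return True

chunk = ("Acme named a new CEO. The weather was mild. Revenue rose in 2025. "
         "The CEO starts in June. Offices were renovated. Analysts expect growth.")
print(bm25_compress(chunk, "new CEO starts"))
```

The ID check is cheap and catches exact re-retrievals; the fingerprint catches the same text surfacing under a different chunk ID, for example from overlapping searches.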
One design choice addresses cold starts. The first successful search auto-seeds the curated set with eight reranked results at fair importance. The policy then promotes strong documents and removes weak ones. This turns the task from building from scratch into refinement.
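The warm start can be sketched in a few lines. The seed size of eight and the `fair` tag come from the article; treating an empty result list as an unsuccessful search, the `seeded` flag, and the helper names `maybe_auto_seed` and `refine` are assumptions for illustration.

```python
from typing import Dict, List

SEED_SIZE = 8             # reranked results used to seed (from the article)
SEED_IMPORTANCE = "fair"  # importance assigned to seeded documents (from the article)

def maybe_auto_seed(curated: Dict[str, str], reranked_ids: List[str], seeded: bool) -> bool:
    """Seed the curated set on the first successful search; return the updated flag."""
    if seeded or not reranked_ids:  # already seeded, or the search returned nothing
        return seeded
    for doc_id in reranked_ids[:SEED_SIZE]:
        curated.setdefault(doc_id, SEED_IMPORTANCE)
    return True

def refine(curated: Dict[str, str], promote: Dict[str, str], drop: List[str]) -> None:
    """Policy-side edits: re-tag strong documents and remove weak ones."""
    for doc_id, tag in promote.items():
        if doc_id in curated:
            curated[doc_id] = tag
    for doc_id in drop:
        curated.pop(doc_id, None)

curated: Dict[str, str] = {}
seeded = maybe_auto_seed(curated, [], False)  # empty search: nothing is seeded
seeded = maybe_auto_seed(curated, [f"d{i}" for i in range(12)], seeded)
refine(curated, promote={"d0": "very_high"}, drop=["d7"])
print(len(curated), curated["d0"], curated["d1"])  # 7 very_high fair
```

After seeding, the policy's job becomes editing an existing list rather than deciding from scratch what belongs in it.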
The research team names three requirements for a trainable harness. These are warm-started curation, compact derived-state rendering, and diversity-preserving incentives. Harness-1 implements all three.
**How It Is Trained**
Training splits along the same line as the harness. Supervised fine-tuning teaches the model to operate the interface. Reinforcement learning improves search decisions over the maintained state.
A single teacher, GPT-5.4, runs live inside the full harness. After filtering, 899 of its trajectories remain for SFT, which applies LoRA at rank 32 for three epochs. The step-550 checkpoint initializes RL.
RL uses on-policy CISPO with a 40-turn cap and terminal-only reward. It trains only on SEC queries. Groups with identical rewards are dropped from the gradient. Training ran on Tinker.
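The group filtering and the shape of a CISPO-style update can be illustrated as below. This is a sketch under stated assumptions, not the authors' training code: z-scored group-relative advantages, the clip bounds `eps_low=1.0` and `eps_high=0.2`, and the function names are hypothetical. What it shows is that CISPO clips the importance-sampling ratio and treats it as a constant weight on each token's log-probability gradient, and that a group with identical terminal rewards yields all-zero advantages, so dropping it loses no signal.

```python
import math
from statistics import mean, pstdev
from typing import Dict, List

def group_advantages(rewards_by_group: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Group-relative advantages from terminal rewards; zero-variance groups are dropped."""
    advantages = {}
    for group_id, rewards in rewards_by_group.items():
        sigma = pstdev(rewards)
        if sigma == 0.0:
            continue  # identical rewards: every advantage would be 0, so no gradient signal
        mu = mean(rewards)
        advantages[group_id] = [(r - mu) / sigma for r in rewards]
    return advantages

def cispo_token_coefficients(logp_new: List[float], logp_old: List[float], advantage: float,
                             eps_low: float = 1.0, eps_high: float = 0.2) -> List[float]:
    """Per-token coefficients on grad log pi in a CISPO-style objective.

    The importance-sampling ratio is clipped and then treated as a constant
    (stop-gradient), so every token keeps contributing a gradient, whereas
    PPO's clipped surrogate can give zero gradient to clipped tokens.
    """
    coeffs = []
    for new, old in zip(logp_new, logp_old):
        ratio = math.exp(new - old)
        clipped = min(max(ratio, 1.0 - eps_low), 1.0 + eps_high)
        coeffs.append(clipped * advantage)  # terminal-only reward: same advantage for every token
    return coeffs

adv = group_advantages({"q1": [1.0, 0.0, 1.0, 0.0], "q2": [0.4, 0.4, 0.4, 0.4]})
print(sorted(adv))  # ['q1']
print(cispo_token_coefficients([0.0, math.log(2.0)], [0.0, 0.0], advantage=adv["q1"][0]))
```

With a terminal-only reward, every token in a trajectory shares one advantage; the clipped ratio is the only per-token difference.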
The reward separates discovery from selection. It also adds a tool-diversity bonus. Without that bonus, the agent collapses to repeated search. Curated recall then plateaus near 0.53. With the bonus, diversity stabilizes and recall reaches about 0.60.
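The article does not give the reward formula, so the following is only a hypothetical shape consistent with its description: a discovery term (relevant documents encountered anywhere), a selection term (relevant documents kept in the curated set), and a tool-diversity bonus. The weights `w_disc`, `w_sel`, `w_div` and the use of normalized Shannon entropy over the eight tools as the bonus are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, List, Set

TOOLS = ("fan_out_search", "search_corpus", "grep_corpus", "read_document",
         "review_docs", "curate", "verify", "end_search")

def tool_diversity(tool_calls: List[str]) -> float:
    """Normalized Shannon entropy of tool usage in [0, 1] (hypothetical bonus form)."""
    if not tool_calls:
        return 0.0
    n = len(tool_calls)
    h = sum((c / n) * math.log(n / c) for c in Counter(tool_calls).values())
    return h / math.log(len(TOOLS))

def coverage(relevant: Set[str], docs: Iterable[str]) -> float:
    return len(relevant & set(docs)) / len(relevant) if relevant else 0.0

def terminal_reward(relevant: Set[str], seen: Iterable[str], curated: Iterable[str],
                    tool_calls: List[str], w_disc: float = 0.3, w_sel: float = 0.7,
                    w_div: float = 0.1) -> float:
    discovery = coverage(relevant, seen)     # was the evidence ever encountered?
    selection = coverage(relevant, curated)  # was it kept in the final set?
    return w_disc * discovery + w_sel * selection + w_div * tool_diversity(tool_calls)

collapsed = ["search_corpus"] * 10  # the repeated-search collapse the article describes
print(terminal_reward({"a", "b"}, ["a", "b", "c"], ["a"], collapsed))
print(terminal_reward({"a", "b"}, ["a", "b", "c"], ["a"], list(TOOLS)))
```

Under this shape, a trajectory that only ever calls one tool earns no bonus, so the policy has a direct incentive to use curation and verification rather than searching repeatedly.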
**The Benchmark Case**
Harness-1 was evaluated on eight benchmarks spanning web, finance, patents, and multi-hop QA. The main metric is curated recall: coverage of relevant documents in the final set. Trajectory recall counts evidence encountered anywhere in the episode.
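The two metrics differ only in which document set is compared against the relevant set. A minimal sketch follows; the function names and the macro-averaging across benchmarks (mean of per-benchmark means) are assumptions, since the article does not state how the eight benchmark scores are combined. As long as every curated document was seen during the episode, curated recall cannot exceed trajectory recall, and the gap between the two measures evidence that was found but not kept.

```python
from statistics import mean
from typing import Dict, Iterable, List, Set

def recall(relevant: Set[str], retrieved: Iterable[str]) -> float:
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant documents")
    return len(relevant & set(retrieved)) / len(relevant)

def episode_metrics(relevant: Set[str], seen_anywhere: Iterable[str],
                    curated: Iterable[str]) -> Dict[str, float]:
    return {
        "curated_recall": recall(relevant, curated),           # what made the final set
        "trajectory_recall": recall(relevant, seen_anywhere),  # what was ever encountered
    }

def benchmark_average(scores_by_benchmark: Dict[str, List[float]]) -> float:
    """Macro-average: mean over benchmarks of each benchmark's per-query mean."""
    return mean(mean(scores) for scores in scores_by_benchmark.values())

m = episode_metrics({"a", "b", "c", "d"}, seen_anywhere=["a", "b", "c", "x"], curated=["a", "b"])
print(m)  # {'curated_recall': 0.5, 'trajectory_recall': 0.75}
```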

| Model | Type | Avg Curated Recall | Avg Trajectory Recall |
| --- | --- | --- | --- |
| Harness-1 (20B) | Open small | 0.730 | 0.807 |
| Tongyi DeepResearch 30B | Open small | 0.616 | 0.673 |
| Context-1 (20B) | Open small | 0.603 | 0.756 |
| Search-R1 (32B) | Open small | 0.289 | 0.289 |
| GPT-OSS-20B | Open small | 0.262 | 0.590 |
| Qwen3 (32B) | Open small | 0.216 | 0.446 |
| Opus-4.6 | Frontier | 0.764 | 0.794 |
| GPT-5.4 | Frontier | 0.709 | 0.752 |
| Sonnet-4.6 | Frontier | 0.688 | 0.725 |
| Kimi-K2.5 | Frontier | 0.647 | 0.794 |
| GPT-OSS-120B | Frontier | 0.496 | 0.769 |
_Averages across eight benchmarks, from Figure 1 of the paper. Frontier models run as zero-shot retrievers under the Context-1 harness._
Harness-1 reaches 0.730 average curated recall. That beats the next open subagent, Tongyi DeepResearch 30B, by 11.4 points. Among the frontier searchers tested, only Opus-4.6 scores higher on average.
The transfer pattern is the clearest signal of the mechanism. SFT used four benchmark families; RL used only SEC. On those source-family tasks, Harness-1 gained 7.9 points over the closest open baseline. On four held-out benchmarks, it gained 17.0 points. That is a 2.2x larger gain on tasks furthest from training data.
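The headline gaps can be checked against the reported figures: the 11.4-point margin follows from the table averages, and the 2.2x factor is the ratio of the two transfer gains.

```latex
\begin{aligned}
0.730 - 0.616 &= 0.114 \quad (\text{11.4 points over Tongyi DeepResearch 30B}) \\
\frac{17.0}{7.9} &\approx 2.15 \approx 2.2\times
\end{aligned}
```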
Ablations support the harness claim. Disabling all harness mechanisms lowers recall on BrowseComp+ by 12.2% in relative terms. The trained policy keeps searching but can no longer rank what it sees.

_Image source: https://arxiv.org/pdf/2606.02373_
**Use Cases**
The method targets evidence-seeking retrieval where documents support an answer. Several workflows fit this shape.
One is literature and patent review, where the evidence graph and curated set help organize many sources. Another is financial-filing analysis: the SEC case study recovers an exact executive-transition date across multiple 8-K filings.
A third is multi-hop fact-checking. The fan_out_search and verify tools resolve ambiguous entities before committing. A fourth is modular RAG. The curated set feeds a frozen generator, and better sets yield higher answer accuracy.
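For the modular-RAG case, the hand-off to a frozen generator might look like the sketch below. Everything here is hypothetical: the prompt template, ordering documents by importance tag, the character budget, and the name `build_generator_context`. The article says only that the curated set feeds a frozen generator.

```python
from typing import Dict

IMPORTANCE_ORDER = {"very_high": 0, "high": 1, "fair": 2, "low": 3}

def build_generator_context(question: str, curated: Dict[str, str],
                            full_text: Dict[str, str], max_chars: int = 2000) -> str:
    """Pack curated documents (most important first) into a prompt for a frozen answerer."""
    ordered = sorted(curated.items(), key=lambda kv: IMPORTANCE_ORDER[kv[1]])
    parts, used = [], 0
    for doc_id, tag in ordered:
        block = f"[{doc_id} | {tag}]\n{full_text[doc_id]}\n"
        if used + len(block) > max_chars:
            break  # stop before exceeding the generator's context budget
        parts.append(block)
        used += len(block)
    docs = "".join(parts)
    return f"Answer using only the documents below.\n\n{docs}\nQuestion: {question}\nAnswer:"

ctx = build_generator_context(
    "When did the transition take effect?",
    curated={"d1": "low", "d2": "very_high"},
    full_text={"d1": "Background filing.", "d2": "8-K announcing the new CEO."},
)
print(ctx)
```

Ordering by tag means that, if the budget cuts the list short, the documents the subagent rated highest are the ones that survive.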
**Strengths and Weaknesses**
#### Strengths
- Highest average curated recall among the open models tested, and behind only Opus-4.6 overall.
- Gains hold on held-out benchmarks, suggesting domain-general search operations.
- Trained on 4,352 unique items, far fewer than several baselines.
- Open checkpoint and harness code, servable with common runtimes.
#### Weaknesses
- The evidence graph uses regex extraction, not full entity linking.
- The verify tool is an LLM proxy that can err on ambiguous claims.
- Sentence-BM25 compression may drop context tied to discourse structure.
- The research team reports point estimates without full confidence intervals.
**Key Takeaways**
- Harness-1 is a 20B search agent that moves search bookkeeping into the environment, leaving semantic decisions to the policy.
- It hits 0.730 average curated recall across eight benchmarks, beating the next open subagent by 11.4 points.
- Among the searchers tested, only Opus-4.6 scores higher on average curated recall.
- Gains are largest on held-out benchmarks (+17.0 vs +7.9 points), suggesting the learned search operations transfer.
- Weights and harness code are public, servable via vLLM, SGLang, or Transformers.
**Marktechpost’s Visual Explainer**
Stateful Search Agents
Research Guide
Harness-1: a 20B search agent with a stateful harness
A retrieval subagent trained with reinforcement learning inside a search harness that holds the bookkeeping.
20B · gpt-oss-20b base · UIUC · UC Berkeley · Chroma · arXiv:2606.02373 · Open weights & code
The Core Idea
Split the work between policy and harness
Most search agents pack search decisions and routine bookkeeping into one growing transcript. Harness-1 separates the two. The paper calls this stateful cognitive offloading.
Policy decides
- What to search
- Which documents to keep
- What claims to verify
- When to stop
Harness maintains
- Candidate pool
- Curated evidence
- Verification records
- Context budget
Inside the Harness
Environment-side working memory
- Candidate pool: compressed, deduplicated documents
- Curated set: importance-tagged, capped at 30 (very_high / high / fair / low)
- Evidence graph: entities, bridges, and singletons via regex extraction
- Verification cache: maps each claim and document pair to a yes/no verdict
- Full-text store: every retrieved chunk kept outside the prompt
- Compression: sentence-BM25 keeps the top four sentences
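The verification cache can be sketched as a dictionary keyed by claim and document. The class name, the claim normalization (strip and lowercase), and the pluggable `judge` callable are assumptions; in the real harness the verdict comes from an LLM proxy, which the toy substring judge below merely stands in for.

```python
from typing import Callable, Dict, Tuple

class VerificationCache:
    """Maps (claim, doc_id) to a yes/no verdict so repeated checks cost nothing."""

    def __init__(self, judge: Callable[[str, str], bool]) -> None:
        self._judge = judge  # hypothetical stand-in for the LLM-proxy verifier
        self._verdicts: Dict[Tuple[str, str], bool] = {}
        self.judge_calls = 0

    def verify(self, claim: str, doc_id: str, doc_text: str) -> bool:
        key = (claim.strip().lower(), doc_id)
        if key not in self._verdicts:
            self.judge_calls += 1
            self._verdicts[key] = self._judge(claim, doc_text)
        return self._verdicts[key]

    def records(self) -> Dict[Tuple[str, str], bool]:
        return dict(self._verdicts)  # copy, so callers cannot mutate the cache

toy_judge = lambda claim, text: claim.lower() in text.lower()  # toy verifier for illustration
cache = VerificationCache(toy_judge)
cache.verify("Jane Doe", "d1", "Jane Doe became CEO.")
cache.verify("jane doe ", "d1", "text is not re-read on a cache hit")
print(cache.judge_calls)  # 1
```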
Policy Actions
Eight tools edit the state
- fan_out_search
- search_corpus
- grep_corpus
- read_document
- review_docs
- curate
- verify
- end_search
The first successful search auto-seeds the curated set with eight reranked documents at fair importance. The policy then promotes strong documents and removes weak ones.
Training
SFT to operate the interface, RL to search
SFT: GPT-5.4 teacher inside the harness · 899 trajectories · LoRA rank 32 · step-550 checkpoint
RL: on-policy CISPO · SEC queries only · 40-turn cap · terminal reward · trained on Tinker
Data scale: 4,352 unique training items (899 SFT + 3,453 RL)
Three trainability requirements: warm-started curation, compact derived-state rendering, and diversity-preserving incentives.
Results
What the numbers show
0.730 average curated recall across eight benchmarks
+11.4 pts over the next open subagent, Tongyi DeepResearch 30B
Among the searchers tested, only Opus-4.6 scores higher on average
Transfer: +17.0 on held-out vs +7.9 on source-family (2.2x gap)
Ablation: removing all harness mechanisms lowers recall by 12.2% (relative) on BrowseComp+
Get Started
Run it yourself
Serve: vLLM, SGLang, or Transformers
Checkpoint: pat-jj/harness-1 (Hugging Face, 21B params, BF16)
Code: github.com/pat-jj/harness-1
Paper: arXiv:2606.02373
Harness-1 returns a curated set of documents for a downstream answering model. It does not answer questions itself.
Curated by Marktechpost — practitioner-first AI/ML research, news, and dev tooling for engineers.
Check out the [Paper](https://arxiv.org/pdf/2606.02373), **Model weights** and [GitHub Repo](https://github.com/pat-jj/harness-1).
Need to partner with us for promoting your GitHub Repo OR Hugging Face Page OR Product Release OR Webinar etc.?[Connect with us](https://forms.gle/wbash1wF6efRj8G58)

© Copyright Reserved @2025 Marktechpost AI Media Inc
