Sarang Kulkarni on Lessons from Building Deep Research Agents in Production

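TL;DR · AI Summary
Sarang Kulkarni shared lessons learned from building deep research agents in production, emphasizing the importance of data management and model integration.
Key points
- Data management and cleaning are the foundation for building deep research agents.
- Model integration requires considering how different models work together.
- Monitoring and feedback mechanisms are essential for continuous improvement.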
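Highlights
Key sentences worth saving and sharing.
The quality of the data directly affects the performance and reliability of research agents.
Model integration has to solve interface compatibility and performance optimization across different models.
Continuous monitoring and user feedback are key to keeping the system stable over the long term.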
May 27, 2026 · 3 min read
by Srini Penchikala, Senior Software Architect
Deep research agentic systems, such as those from OpenAI and the Gemini Deep Research Agent, are AI agents designed to conduct multi-step research on the internet for complex tasks. They use dynamic reasoning and multi-hop information retrieval to generate comprehensive, structured analytical reports at the level of a research analyst.
Sarang Kulkarni of Thoughtworks spoke at the Arc of AI Conference 2026 on how to design and deploy multi-agent research systems for deep reasoning and synthesis, and on the lessons learned from real-world healthcare and pharmaceutical R&D projects developing deep research agents. He also discussed how the team leveraged techniques like agentic loops and harness engineering to get the best out of the solution.
In critical industries like healthcare and clinical trials, researchers need more than traditional AI models that perform simple Q&A tasks. They need systems that can discover, connect, and reason across both internal and internet data while maintaining reliability, transparency, and compliance.
Kulkarni started the presentation by highlighting that it typically costs $2.6B to bring a new drug to market. About half of research studies are also conducted without prior evidence: the knowledge exists, but access to it is broken. In the overall drug discovery and development pipeline, getting the right data at the right time is a major challenge. With the goal of inventing a new drug using AI technologies, his team built a Retrieval Augmented Generation (RAG) based chatbot two years ago to search through the unstructured data. For simple queries in the study, the RAG solution worked fine, but for complex questions they had to enhance it into an agentic RAG application. For deep research use cases, the team developed a solution they call Agentic RAG++.
Kulkarni shared the details of the deep research system, which consists of a clarification loop, a research loop (think and plan, execute, reflect, adjust the plan), and a writing loop that focuses on the write and reflect tasks. The initial version of the researcher agent was based on two tools: a RAG tool and a text2sql tool. The RAG tool's design is based on weighted hybrid search that retrieves 20 context chunks, followed by a re-ranker that refines them to seven context chunks. The text2sql tool feeds SQL query errors back to the LLM to improve the accuracy of query execution. He mentioned that factors like higher token cost, poor performance, and high latency can result in poor retrieval from AI agents. Context anxiety is another problem that teams need to be cautious about. Incomplete data can also lead to poor self-evaluation, but techniques like the reflection loop can help with data completeness.
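The two tools can be sketched in a few lines of Python. This is a minimal illustration, not the team's implementation: the chunk shape, the `vector_weight` value, and the `rerank_score` and `generate_sql` callables (stand-ins for a re-ranker model and an LLM call) are all assumptions, and reading the RAG design as "retrieve 20, re-rank, keep 7" is an interpretation of the description above.

```python
import sqlite3
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical chunk shape: {"id": int, "keyword": float, "vector": float},
# with both scores already normalized to the same range.
Chunk = Dict[str, float]


def hybrid_retrieve(
    chunks: List[Chunk],
    rerank_score: Callable[[Chunk], float],  # stand-in for a re-ranker model
    vector_weight: float = 0.5,              # assumed weighting, not from the talk
    k_initial: int = 20,
    k_final: int = 7,
) -> List[Chunk]:
    """Weighted hybrid search, then re-rank the candidates down to k_final."""

    def hybrid_score(chunk: Chunk) -> float:
        return vector_weight * chunk["vector"] + (1 - vector_weight) * chunk["keyword"]

    candidates = sorted(chunks, key=hybrid_score, reverse=True)[:k_initial]
    return sorted(candidates, key=rerank_score, reverse=True)[:k_final]


def run_text2sql(
    question: str,
    generate_sql: Callable[[str, Optional[str]], str],  # stand-in for an LLM call
    conn: sqlite3.Connection,
    max_attempts: int = 3,
) -> List[Tuple]:
    """Execute generated SQL; on failure, pass the error to the next attempt."""
    error: Optional[str] = None
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # The failed query and the database error become feedback.
            error = f"{sql!r} failed with: {exc}"
    raise RuntimeError(f"no valid query after {max_attempts} attempts ({error})")
```

Capping the number of attempts keeps a persistently wrong query from turning into unbounded token spend, which ties in with the cost and latency concerns mentioned in the talk.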
The speaker discussed the different failure modes they had to address when developing the custom deep research agent solution. Long-horizon tasks require an explicit think-act loop. The team addressed this by incorporating multiple steps: think, plan (which runs before the research), inspect (which runs after the research is complete and validates the output), and finally update, which creates the final report. Anthropic's "think" tool and similar solutions can help formalize the reasoning pause.
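As a rough sketch of how the think, plan, inspect, and update steps might fit together, the loop below wires them around a research step. The `ResearchState` fields, the callable signatures, and the `max_rounds` cap are hypothetical choices made for illustration; in a real system each callable would wrap an LLM or tool call.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResearchState:
    question: str
    notes: List[str] = field(default_factory=list)     # output of each "think" pause
    plan: List[str] = field(default_factory=list)      # research steps for this round
    findings: List[str] = field(default_factory=list)  # accumulated research results


def research_loop(
    question: str,
    think: Callable[[ResearchState], str],
    plan: Callable[[ResearchState], List[str]],
    research: Callable[[str], str],
    inspect: Callable[[ResearchState], bool],
    update: Callable[[ResearchState], str],
    max_rounds: int = 5,
) -> str:
    state = ResearchState(question)
    for _ in range(max_rounds):
        state.notes.append(think(state))   # explicit reasoning pause
        state.plan = plan(state)           # planning happens before research
        for step in state.plan:
            state.findings.append(research(step))
        if inspect(state):                 # validate the output after research
            break
    return update(state)                   # the update step writes the final report
```

Keeping the state in one object means every step sees the same notes, plan, and findings, so each decision is made against the full record of earlier rounds.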
Long-horizon tasks also tend to break decisions between steps in the overall process. The reflection step in their solution includes not only data reflection but also a process reflection that assesses whether the process is complete. This phase includes a third reflection step, called the Draft Writing Loop, that helps with synthesis gaps: if information found during research was not captured by the write task, the re-draft step takes care of it.
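The Draft Writing Loop can be pictured as a check of the draft against the research findings, with a re-draft whenever something is missing. The sketch below assumes hypothetical `write` and `covers` callables (in practice both would likely be LLM calls) and an arbitrary cap of three drafts.

```python
from typing import Callable, List


def draft_writing_loop(
    findings: List[str],
    write: Callable[[List[str], List[str]], str],  # (findings, gaps) -> draft
    covers: Callable[[str, str], bool],            # does the draft capture a finding?
    max_drafts: int = 3,
) -> str:
    draft = write(findings, [])
    for _ in range(max_drafts - 1):
        # Synthesis gaps: findings from research that the draft did not capture.
        gaps = [f for f in findings if not covers(draft, f)]
        if not gaps:
            break
        draft = write(findings, gaps)  # re-draft with the gaps called out
    return draft
```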
Kulkarni concluded the talk with a discussion of emerging harness engineering techniques, where designing the tools, memory systems, validation checks, constraints, and feedback loops makes autonomous AI agents more reliable and accountable. The goal of harness engineering is to help AI solutions shift from prompt engineering alone to the automated execution of tasks by AI agents. Since AI agents are basically the combination of a model and a harness, the better the models are, the thinner the harness needs to be.
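To make the "model plus harness" idea concrete, here is a deliberately thin harness sketched in Python. The `Harness` class, its fields, and the prompt wording are hypothetical. It shows only three of the ingredients named above: validation checks, a call-budget constraint, and a feedback loop, plus a log that serves as a simple audit trail.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Harness:
    model: Callable[[str], str]                    # stand-in for an LLM call
    checks: List[Callable[[str], Optional[str]]]   # each returns an error message or None
    max_calls: int = 3                             # constraint: model call budget
    log: List[str] = field(default_factory=list)   # audit trail for accountability

    def run(self, task: str) -> str:
        prompt = task
        for attempt in range(1, self.max_calls + 1):
            output = self.model(prompt)
            failures = [m for m in (check(output) for check in self.checks) if m]
            self.log.append(f"attempt {attempt}: {len(failures)} failed check(s)")
            if not failures:
                return output
            # Feedback loop: tell the model which checks failed.
            prompt = f"{task}\nFix these issues: {'; '.join(failures)}"
        raise RuntimeError("call budget exhausted without passing all checks")
```

With a stronger model, more outputs pass on the first attempt and fewer checks are needed, which is one way to read the remark that better models need a thinner harness.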
About the Author

#### Srini Penchikala
Srini Penchikala currently works as Senior Software Architect in Austin, Texas. He is also the Lead Editor for the AI/ML/Data Engineering community at InfoQ (http://www.infoq.com/author/Srini-Penchikala). Srini has over 22 years of experience in software architecture, design and development. He is the author of "Big Data Processing with Apache Spark" and the co-author of the "Spring Roo in Action" book (http://www.manning.com/SpringRooinAction) from Manning Publications. Srini has presented at conferences like Big Data Conference, Enterprise Data World, JavaOne, SEI Architecture Technology Conference (SATURN), IT Architect Conference (ITARC), No Fluff Just Stuff, NoSQL Now and Project World Conference. He has also published several articles on software architecture, security and risk management, and NoSQL databases on websites like InfoQ, The ServerSide, O'Reilly Network (ONJava), DevX Java, java.net and JavaWorld.