Clinical operations intelligence belongs on the Lakehouse
Industries · May 13, 2026
How Databricks Apps, Lakebase, and AI/BI Genie eliminate the integration stack between clinical data and decision-support applications — and why that architecture change is what clinical operations have been missing.
Summary
- What it is: The Site Feasibility Workbench is an open-source Databricks App that runs clinical trial site selection entirely within the Databricks workspace — combining ML-driven site scoring, Lakebase for operational state, and AI/BI Genie for natural language data access, with no external API calls or synchronization pipelines.
- The challenge it solves: 37% of investigator sites miss enrollment targets, and the root cause is architectural — clinical operations data and the applications that use it live in disconnected systems, forcing decisions into spreadsheets and creating integration overhead, credential sprawl, and synchronization lag that erodes trust in the data.
- Results and outcomes: Therapeutic-area (TA) segmented LightGBM models trained on your own CTMS, EDC, and IRT history, not industry averages, produce scores that improve as your portfolio grows. Every prediction carries a SHAP-driven attribution stored as a governed, versioned Delta table, making model rationale as auditable as the score itself.
The clinical data problem is not a storage problem. Most organizations already have a data warehouse, a CTMS, an EDC, and somewhere downstream, a BI layer. The problem is that none of these systems talk to each other in a way that supports the actual decisions clinical teams need to make — and so the decisions get made in spreadsheets instead.
Today we are releasing the Site Feasibility Workbench as a fully open-source Databricks App to show what clinical operations intelligence looks like when the application, the models, and the data live on the same platform. The Tufts Center for the Study of Drug Development has documented that 37% of activated investigator sites enrolled fewer patients than their targets, and an additional 11% enrolled no patients at all. The combined effect is that 53% of trials exceeded their planned enrollment timelines, with one in six taking more than twice as long as planned (Lamberti et al.; subsequent CSDD impact reports continue to track underperformance at similar levels). At up to $500,000 per day in unrealized drug sales and $40,000 per day in direct trial costs, chronic site underperformance is one of the most consequential cost drivers in drug development. That combined underperformance rate has remained essentially flat for at least two decades. The tools are not the problem. The architecture is.
Clinical operations teams do not need more dashboards connected to existing systems. They need their decision-support applications to live where their data and models live — so that the feedback loop between a prediction and the operational outcome that validates it actually closes.
The Architecture Argument
The conventional approach to clinical decision-support looks like this: analytical data lives in a data warehouse or Lakehouse. A separate application database holds operational state. A pipeline keeps them loosely synchronized. A web application sits in front of both. Every layer introduces integration overhead, credential surface area, and a synchronization lag that erodes trust in the data the application shows.
Databricks Apps, Lakebase, and AI/BI Genie eliminate each of those layers — not by abstracting them away but by making them unnecessary.
Databricks Apps run the web application inside the workspace. The app authenticates as a first-class workspace service principal, queries Unity Catalog via the SQL Statement Execution API, and calls AI/BI Genie over the workspace REST API, all on internal connections. Clinical operations data never crosses a workspace boundary. Once its service principal is granted access, the app is governed by the same Unity Catalog access controls as every other workload, with no separate authorization layer to configure.
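As a rough illustration of the query path described above, the sketch below builds (but does not send) a parameterized request to the SQL Statement Execution API. The host, token, warehouse ID, and the `clinical.gold.site_scores` table and its columns are placeholders invented for this example, not names taken from the workbench repository.

```python
import json
import urllib.request


def build_statement_request(host, token, warehouse_id, statement, params):
    """Build (but do not send) a POST to the SQL Statement Execution API."""
    payload = {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",
        # Named parameters keep user-supplied values out of the SQL text.
        "parameters": [{"name": k, "value": str(v)} for k, v in params.items()],
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/sql/statements",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical table and column names, for illustration only.
SITE_QUERY = (
    "SELECT site_id, feasibility_score "
    "FROM clinical.gold.site_scores "
    "WHERE protocol_id = :protocol_id "
    "ORDER BY feasibility_score DESC LIMIT 20"
)

request = build_statement_request(
    host="https://example.cloud.databricks.com",  # placeholder workspace URL
    token="dummy-token",                          # placeholder credential
    warehouse_id="abc123",                        # placeholder warehouse ID
    statement=SITE_QUERY,
    params={"protocol_id": "ONC-001"},
)
# Sending is left out so the sketch runs offline, e.g.:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```

In a deployed app the credential would come from the app's service principal rather than a literal token; how the repository obtains it is not shown here.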
Lakebase is the operational database layer — managed PostgreSQL that scales to zero when idle, provisioned and credentialed entirely within the workspace identity system. Where a traditional application would require a separately managed RDS instance with its own schema drift, sync jobs, and credential rotation, Lakebase is in the same platform where the data and models live.
AI/BI Genie closes the last gap: natural language access to governed data, embedded directly in the application workflow. Study managers ask questions in plain English against the same Unity Catalog tables the ML models trained on, with the same access controls applied.
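A minimal sketch of how an application backend might pass a study manager's question to Genie, assuming the Genie Conversation API's start-conversation endpoint. The workspace URL, token, and space ID are placeholders, and the endpoint path should be checked against the current Databricks REST reference before use.

```python
import json
import urllib.request

# Example question; any plain-English question about governed tables would do.
QUESTION = "Which sites missed enrollment targets on oncology protocols last year?"


def build_genie_question(host, token, space_id, question):
    """Build (but do not send) a request that starts a Genie conversation."""
    url = f"{host.rstrip('/')}/api/2.0/genie/spaces/{space_id}/start-conversation"
    return urllib.request.Request(
        url=url,
        data=json.dumps({"content": question}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


genie_request = build_genie_question(
    host="https://example.cloud.databricks.com",  # placeholder workspace URL
    token="dummy-token",                          # placeholder credential
    space_id="space-123",                         # placeholder Genie space ID
    question=QUESTION,
)
```

Because the question runs under the caller's workspace identity, the answer is limited to the tables that identity is permitted to read.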
The result is a clinical operations application that makes no external API calls, maintains no separate operational database infrastructure, and requires no synchronization pipeline between the analytical and operational layers.

Figure 1 — The Databricks Lakehouse Platform as a unified clinical intelligence stack. External sources ingest via Lakeflow (Bronze → Silver → Gold). Mosaic AI trains AI/ML models and writes versioned predictions back to Unity Catalog. SQL Warehouse, Lakebase, and AI/BI Genie serve the Databricks App — which runs inside the platform boundary with all connections internal.
The Auditability Argument
The standard industry approach to site feasibility relies on commercial scoring products from vendors or CRO-provided analytics platforms. Those tools are built on aggregated industry data — useful as a baseline, but blind to the specifics of your portfolio. A sponsor with a decade of CTMS, EDC, and IRT history carries significant signals about how their sites perform on their protocols.
When the ML stack lives on Databricks, that institutional knowledge becomes the training data. The models in this workbench are trained on your historical enrollment rates, your site qualification history, your screen failure patterns, and your protocol execution record, not industry averages. CMS Open Payments adds a freely available public signal layer that, when used appropriately, correlates with research engagement and infrastructure. As the trial portfolio grows, the models improve on the same infrastructure. That is the compounding return that a single-platform architecture enables and that a licensed scoring product cannot: every new study makes the prediction better, and every new site relationship is reflected in the next training run. MLflow tracks every model training run with its parameters, metrics, and artifacts, enabling comparison across model versions, reproducibility on demand, and a complete audit trail from raw CTMS and EDC records to deployed prediction.
The regulatory dimension matters here too. 21 CFR Part 11, ICH E6(R3), and FDA's Good Machine Learning Practice (GMLP) guiding principles, along with increasing FDA emphasis on transparency in algorithmic decision support, make model explainability and data governance material considerations, not optional features. Because every prediction carries a SHAP attribution stored as a governed Unity Catalog Delta table (versioned in MLflow, lineaged through Unity Catalog, and queryable), the rationale behind a site selection is as auditable as the score itself. A clinical affairs team can answer a question from a data monitoring committee with a SQL query, not a black-box vendor report.
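To make "answer with a SQL query" concrete, here is a small sketch of what such an audit query could look like. It uses Python's built-in `sqlite3` as a stand-in for a SQL warehouse so it runs anywhere; the table names, columns, and values are invented for illustration. It also assumes the stored score is on the scale where SHAP values are additive (for tree models, typically the raw model output), so that the base value plus the attributions reconstructs the score.

```python
import sqlite3

# In-memory stand-in for governed prediction and attribution tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE site_predictions (
        site_id TEXT, model_version TEXT, base_value REAL, score REAL
    );
    CREATE TABLE site_shap (
        site_id TEXT, model_version TEXT, feature TEXT, shap_value REAL
    );
    """
)
# Illustrative values only, not real model output.
conn.execute("INSERT INTO site_predictions VALUES ('S-101', 'v3', 0.50, 0.72)")
conn.executemany(
    "INSERT INTO site_shap VALUES ('S-101', 'v3', ?, ?)",
    [
        ("historical_enrollment_rate", 0.18),
        ("screen_failure_rate", -0.06),
        ("open_payments_signal", 0.10),
    ],
)

# Audit question 1: which features drove this site's score, largest first?
top_drivers = conn.execute(
    """
    SELECT feature, shap_value
    FROM site_shap
    WHERE site_id = ? AND model_version = ?
    ORDER BY ABS(shap_value) DESC
    """,
    ("S-101", "v3"),
).fetchall()

# Audit question 2: do the stored attributions add up to the stored score?
score, reconstructed = conn.execute(
    """
    SELECT p.score, p.base_value + SUM(s.shap_value)
    FROM site_predictions AS p
    JOIN site_shap AS s
      ON s.site_id = p.site_id AND s.model_version = p.model_version
    WHERE p.site_id = ? AND p.model_version = ?
    GROUP BY p.site_id, p.model_version, p.score, p.base_value
    """,
    ("S-101", "v3"),
).fetchone()

is_consistent = abs(score - reconstructed) < 1e-9
```

The second query is a cheap integrity check: if the attributions for a prediction no longer sum to its score, the explanation and the score came from different model versions or were altered after the fact.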
What We Built
The Site Feasibility Workbench is a six-step guided workflow for clinical trial site selection: protocol selection, score constraints, geographic overview, site ranking, SHAP-driven site deep dive, and final shortlist. Diversity considerations are a first-class scoring dimension, aligned with FDA's Diversity Action Plan expectations under FDORA 2022.
Composite feasibility scores combine real-world evidence, patient access data, historical site performance, site qualification history, Open Payments KOL signal, and protocol execution factors — all driven by TA-segmented LightGBM models trained on the organization's own CTMS, EDC, and IRT history.
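The idea behind TA segmentation can be sketched without any ML library: fit one model per therapeutic area and route each prediction to the model for its area, falling back to broader data when a segment has no history. In the sketch below a simple mean stands in for a trained LightGBM model, and the field names, site IDs, and fallback order are assumptions made for illustration, not the workbench's actual logic.

```python
from collections import defaultdict
from statistics import mean

# Illustrative history: (therapeutic_area, site_id, patients_per_month).
HISTORY = [
    ("oncology", "S-101", 1.0),
    ("oncology", "S-101", 2.0),
    ("oncology", "S-202", 4.0),
    ("cardiology", "S-101", 6.0),
    ("cardiology", "S-303", 3.0),
]


def train_segmented(rows):
    """Fit one stand-in 'model' per therapeutic area, plus a pooled fallback."""
    per_site = defaultdict(lambda: defaultdict(list))
    per_ta = defaultdict(list)
    for ta, site_id, rate in rows:
        per_site[ta][site_id].append(rate)
        per_ta[ta].append(rate)
    return {
        "site": {
            ta: {site: mean(rates) for site, rates in sites.items()}
            for ta, sites in per_site.items()
        },
        "ta": {ta: mean(rates) for ta, rates in per_ta.items()},
        "pooled": mean(rate for _, _, rate in rows),
    }


def predict(models, therapeutic_area, site_id):
    """Route to the TA-specific model; fall back to TA mean, then pooled mean."""
    site_models = models["site"].get(therapeutic_area, {})
    if site_id in site_models:
        return site_models[site_id]
    if therapeutic_area in models["ta"]:
        return models["ta"][therapeutic_area]
    return models["pooled"]


models = train_segmented(HISTORY)
onc_score = predict(models, "oncology", "S-101")
cardio_score = predict(models, "cardiology", "S-101")
new_site_score = predict(models, "oncology", "S-303")
new_ta_score = predict(models, "neurology", "S-101")
```

The same site receives different predictions in oncology and cardiology, which is the behavior a single pooled model would blur.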
The part worth emphasizing is not the workflow steps or the model features. It is what the architecture makes possible: every prediction carries a SHAP explanation stored as a governed Delta table alongside the prediction itself, making the model rationale as auditable and versioned as the score it explains. Because every prediction is decomposed into governed SHAP attributions, sponsors can audit recommendations for systematic under-weighting of community sites, minority-serving institutions, or first-time investigators, turning explainability into a fairness control.
Patient-level data inherits Unity Catalog access controls, and PHI handling follows the sponsor's HIPAA Safe Harbor / Expert Determination posture configured at the catalog or schema level.
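One way to picture the fairness audit over stored SHAP attributions: compare the average attribution a feature receives across site groups and flag large gaps for review. The sketch below is a simplified illustration under assumed data; the site types, the `prior_trial_count` feature, the values, and the 0.15 threshold are all hypothetical, and a real audit would need a statistically sound comparison rather than a fixed cutoff.

```python
from collections import defaultdict
from statistics import mean

# Illustrative rows: (site_id, site_type, feature, shap_value).
ATTRIBUTIONS = [
    ("S-101", "academic", "prior_trial_count", 0.12),
    ("S-102", "academic", "prior_trial_count", 0.08),
    ("S-201", "community", "prior_trial_count", -0.10),
    ("S-202", "community", "prior_trial_count", -0.14),
]


def mean_attribution_by_group(rows, feature):
    """Average SHAP value of one feature within each site group."""
    grouped = defaultdict(list)
    for _site_id, group, feat, value in rows:
        if feat == feature:
            grouped[group].append(value)
    return {group: mean(values) for group, values in grouped.items()}


def flag_gaps(group_means, reference, threshold):
    """Return groups whose mean attribution trails the reference group
    by more than the threshold, with the size of each gap."""
    ref = group_means[reference]
    return {
        group: ref - value
        for group, value in group_means.items()
        if group != reference and ref - value > threshold
    }


group_means = mean_attribution_by_group(ATTRIBUTIONS, "prior_trial_count")
flagged = flag_gaps(group_means, reference="academic", threshold=0.15)
```

A flagged gap is a prompt for human review, not proof of bias: here it would suggest checking whether prior trial count is penalizing community sites for lack of opportunity rather than lack of capability.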
Saved shortlists persist to Lakebase for team sharing. The AI/BI Genie assistant answers cross-domain questions against the same Unity Catalog tables in natural language. None of this requires infrastructure outside the workspace.
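A sketch of what shortlist persistence could look like. Python's built-in `sqlite3` stands in for the PostgreSQL database that Lakebase provides so the example runs without a server; the upsert syntax used here is accepted by both engines. The table layout, column names, and IDs are assumptions for illustration and are not taken from the repository.

```python
import json
import sqlite3

# In-memory stand-in for the operational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE shortlists (
        shortlist_id TEXT PRIMARY KEY,
        protocol_id  TEXT NOT NULL,
        owner        TEXT NOT NULL,
        site_ids     TEXT NOT NULL,  -- JSON array of site IDs
        updated_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def save_shortlist(conn, shortlist_id, protocol_id, owner, site_ids):
    """Insert a shortlist, or overwrite its sites if the ID already exists."""
    conn.execute(
        """
        INSERT INTO shortlists (shortlist_id, protocol_id, owner, site_ids)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (shortlist_id) DO UPDATE SET
            site_ids = excluded.site_ids,
            updated_at = CURRENT_TIMESTAMP
        """,
        (shortlist_id, protocol_id, owner, json.dumps(site_ids)),
    )
    conn.commit()


def load_shortlist(conn, shortlist_id):
    """Return the saved site IDs, or None if the shortlist does not exist."""
    row = conn.execute(
        "SELECT site_ids FROM shortlists WHERE shortlist_id = ?",
        (shortlist_id,),
    ).fetchone()
    return json.loads(row[0]) if row else None


save_shortlist(conn, "sl-1", "ONC-001", "feasibility.lead", ["S-101", "S-202"])
save_shortlist(conn, "sl-1", "ONC-001", "feasibility.lead", ["S-202", "S-303"])
saved = load_shortlist(conn, "sl-1")
row_count = conn.execute("SELECT COUNT(*) FROM shortlists").fetchone()[0]
```

Keeping shortlists in an operational store while scores stay in Unity Catalog separates mutable team state from versioned analytical output.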
This is a decision-support layer, not a source-of-record system. The CTMS/EDC/IRT remain authoritative. The workbench produces predictions whose lineage is governed in Unity Catalog and MLflow.

Figure 2 — Site Feasibility Workbench: a stateful workflow application in which site feasibility leads create and share data-driven site selection shortlists using real-world data (RWD) and AI.
The full application — FastAPI backend, React frontend, seed notebooks, and deploy scripts — is published as an open-source repository. Deploying into an existing Databricks workspace with Unity Catalog takes approximately 30 minutes of technical deployment time, before sponsor-specific security review and validation.
One Module of a Larger Platform
The Site Feasibility Workbench is the first public release of a broader architecture — the Databricks Clinical Operations Intelligence Hub — covering the full trial lifecycle:
- Site Feasibility and Selection — what this repository covers
- Patient Cohort and Recruitment — protocol-aligned cohort building from EHR and real-world evidence at Lakehouse scale
- Enrollment Velocity Optimizer — ML stall prediction per site per month with a 1–3 month forward horizon
- Risk-Based Monitoring and Compliance — continuous monitoring for enrollment anomalies, data lags, and protocol deviations
All four deploy as Databricks Apps. All four query Unity Catalog directly. None make external API calls. When clinical applications live where your data and models live, the feedback loop closes. Site selection models learn from enrollment outcomes. Risk scores update as amendment history grows. Every AI-driven recommendation carries a lineage trail back to the CTMS, EDC, and IRT records that produced it.
Get Started
Clone the public repository. Deploy. Tell us what you change.
For the full Clinical Operations Intelligence Hub — watch the BrickTalk recording: Scaling BioPharma Intelligence + Databricks Agentic Clinical Ops.
The related posts on Lakebase and on Databricks Apps in production cover the platform primitives in depth.
This post is part of the Databricks Clinical Operations Intelligence Hub series — a set of open-source Databricks Apps covering the full trial lifecycle. Start with the GitHub repository for the Site Feasibility Workbench. For the full platform overview, watch the BrickTalk: Scaling BioPharma Intelligence + Databricks Agentic Clinical Ops. Explore the related platform posts on Lakebase and Databricks Apps below.