Clinical operations intelligence belongs on the Lakehouse

Databricks, May 13, 2026




Industries · May 13, 2026


How Databricks Apps, Lakebase, and AI/BI Genie eliminate the integration stack between clinical data and decision-support applications — and why that architecture change is what clinical operations have been missing.

by Nicholas Siebenlist

Summary

What it is: The Site Feasibility Workbench is an open-source Databricks App that runs clinical trial site selection entirely within the Databricks workspace — combining ML-driven site scoring, Lakebase for operational state, and AI/BI Genie for natural language data access, with no external API calls or synchronization pipelines.
The challenge it solves: 37% of investigator sites miss enrollment targets, and the root cause is architectural — clinical operations data and the applications that use it live in disconnected systems, forcing decisions into spreadsheets and creating integration overhead, credential sprawl, and synchronization lag that erodes trust in the data.
Results and outcomes: Therapeutic-area (TA)-segmented LightGBM models trained on your own CTMS, EDC, and IRT history, not industry averages, produce scores that improve as your portfolio grows. Every prediction carries a SHAP-driven attribution stored as a governed, versioned Delta table, making model rationale as auditable as the score itself.

The clinical data problem is not a storage problem. Most organizations already have a data warehouse, a CTMS, an EDC, and somewhere downstream, a BI layer. The problem is that none of these systems talk to each other in a way that supports the actual decisions clinical teams need to make — and so the decisions get made in spreadsheets instead.

Today we are releasing the Site Feasibility Workbench as a fully open-source Databricks App to show what clinical operations intelligence looks like when the application, the models, and the data live on the same platform. The Tufts Center for the Study of Drug Development has documented that 37% of activated investigator sites enrolled fewer patients than their targets and an additional 11% enrolled no patients at all. The combined effect was that 53% of trials exceeded their planned enrollment timelines, with one in six taking more than twice as long as planned (Lamberti et al.; subsequent CSDD impact reports continue to track underperformance at similar levels). At up to $500,000 per day in unrealized drug sales and $40,000 per day in direct trial costs, chronic site underperformance is one of the most consequential cost drivers in drug development. That combined underperformance rate has remained essentially flat for at least two decades. The tools are not the problem. The architecture is.

Clinical operations teams do not need more dashboards connected to existing systems. They need their decision-support applications to live where their data and models live — so that the feedback loop between a prediction and the operational outcome that validates it actually closes.

The Architecture Argument

The conventional approach to clinical decision-support looks like this: analytical data lives in a data warehouse or Lakehouse. A separate application database holds operational state. A pipeline keeps them loosely synchronized. A web application sits in front of both, adding semantic harmonization in the Silver layer. Every layer introduces integration overhead, credential surface area, and a synchronization lag that erodes trust in the data the application shows.

Databricks Apps, Lakebase, and AI/BI Genie eliminate each of those layers — not by abstracting them away but by making them unnecessary.

Databricks Apps run the web application inside the workspace. The app authenticates as a first-class workspace service principal, queries Unity Catalog via the SQL Statement API, and calls AI/BI Genie over the workspace REST API — all on internal connections. Clinical operations data never crosses a workspace boundary. The app inherits Unity Catalog access controls without any additional configuration.
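As a rough illustration of the query path described above, the sketch below assembles the JSON body that an app backend might send to the SQL Statement Execution API (`POST /api/2.0/sql/statements`). The warehouse ID, the table `clinical.gold.site_scores`, and its columns are hypothetical placeholders rather than names taken from the workbench repository, and the request is only built here, never sent.

```python
import json

# Hypothetical names for illustration only; the real workbench may differ.
WAREHOUSE_ID = "example-warehouse-id"
SCORES_TABLE = "clinical.gold.site_scores"


def build_statement_request(protocol_id: str, min_score: float, limit: int = 25) -> dict:
    """Build a request body for POST /api/2.0/sql/statements.

    User-supplied values travel as named parameters (:name) instead of
    being interpolated into the SQL text.
    """
    statement = (
        f"SELECT site_id, feasibility_score FROM {SCORES_TABLE} "
        "WHERE protocol_id = :protocol_id AND feasibility_score >= :min_score "
        "ORDER BY feasibility_score DESC"
    )
    return {
        "warehouse_id": WAREHOUSE_ID,
        "statement": statement,
        "wait_timeout": "30s",   # wait synchronously for up to 30 seconds
        "row_limit": limit,      # cap the result set on the server side
        "parameters": [
            {"name": "protocol_id", "value": protocol_id, "type": "STRING"},
            {"name": "min_score", "value": str(min_score), "type": "DOUBLE"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_statement_request("PROTO-001", 0.7), indent=2))
```

Because the app runs as a workspace service principal, a query like this would return only the rows and columns that Unity Catalog grants to that principal, which is how the app inherits access controls without extra configuration.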

Lakebase is the operational database layer — managed PostgreSQL that scales to zero when idle, provisioned and credentialed entirely within the workspace identity system. Where a traditional application would require a separately managed RDS instance with its own schema drift, sync jobs, and credential rotation, Lakebase is in the same platform where the data and models live.

AI/BI Genie closes the last gap: natural language access to governed data, embedded directly in the application workflow. Study managers ask questions in plain English against the same Unity Catalog tables the ML models trained on, with the same access controls applied.
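To make the Genie integration a little more concrete, here is a minimal sketch of how a backend could prepare a question for a Genie space over the workspace REST API. It assumes the Genie Conversation API's `start-conversation` endpoint; the host, space ID, and token are placeholders, and the path should be checked against the current Databricks documentation before use. The request object is constructed but not sent.

```python
import json
import urllib.request


def build_genie_request(host: str, space_id: str, token: str, question: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that starts a Genie conversation.

    Assumes POST /api/2.0/genie/spaces/{space_id}/start-conversation with a
    JSON body of the form {"content": "<question>"}.
    """
    url = f"{host.rstrip('/')}/api/2.0/genie/spaces/{space_id}/start-conversation"
    body = json.dumps({"content": question}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_genie_request(
        "https://example.cloud.databricks.com",  # placeholder workspace URL
        "example-space-id",                      # placeholder Genie space
        "example-token",                         # placeholder credential
        "Which oncology sites missed enrollment targets last year?",
    )
    print(req.get_method(), req.full_url)
```

In a deployed app the credential would come from the app's service principal rather than a hard-coded token, so the answer is computed under the same Unity Catalog permissions as every other query.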

The result is a clinical operations application that makes no external API calls, maintains no separate operational database infrastructure, and requires no synchronization pipeline between the analytical and operational layers.


Figure 1 — The Databricks Lakehouse Platform as a unified clinical intelligence stack. External sources ingest via Lakeflow (Bronze → Silver → Gold). Mosaic AI trains AI/ML models and writes versioned predictions back to Unity Catalog. SQL Warehouse, Lakebase, and AI/BI Genie serve the Databricks App — which runs inside the platform boundary with all connections internal.

The Auditability Argument

The standard industry approach to site feasibility relies on commercial scoring products from vendors or CRO-provided analytics platforms. Those tools are built on aggregated industry data — useful as a baseline, but blind to the specifics of your portfolio. A sponsor with a decade of CTMS, EDC, and IRT history carries significant signals about how their sites perform on their protocols.

When the ML stack lives on Databricks, that institutional knowledge becomes the training data. The models in this workbench are trained on your historical enrollment rates, your site qualification history, your screen failure patterns, and your protocol execution record, not industry averages. CMS Open Payments adds a freely available public signal layer that, when used appropriately, correlates with research engagement and infrastructure. As the trial portfolio grows, the models improve on the same infrastructure. That is the compounding return that a single-platform architecture enables and that a licensed scoring product cannot: every new study makes the prediction better, and every new site relationship is reflected in the next training run. MLflow tracks every model training run along with its parameters, metrics, and artifacts, enabling comparison across model versions, reproducibility on demand, and a complete audit trail from raw CTMS and EDC records to deployed prediction.
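The idea of segmenting models by therapeutic area and training them on a sponsor's own history can be sketched without any ML dependencies. In the example below a per-site mean enrollment rate stands in for the LightGBM model, so this is a deliberately simplified baseline and not the workbench's method; the history rows and field names are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical history rows: (therapeutic_area, site_id, patients_per_month)
HISTORY = [
    ("oncology", "site-a", 1.2),
    ("oncology", "site-a", 0.8),
    ("oncology", "site-b", 0.4),
    ("cardiology", "site-a", 3.0),
    ("cardiology", "site-c", 2.0),
]


def fit_segmented_baselines(rows):
    """Fit one baseline per therapeutic area (TA).

    Each segment keeps a per-site mean rate plus a TA-wide mean that serves
    as a fallback for sites with no history in that TA.
    """
    by_ta = defaultdict(lambda: defaultdict(list))
    for ta, site, rate in rows:
        by_ta[ta][site].append(rate)

    models = {}
    for ta, sites in by_ta.items():
        site_means = {site: mean(rates) for site, rates in sites.items()}
        ta_mean = mean(r for rates in sites.values() for r in rates)
        models[ta] = {"site_means": site_means, "ta_mean": ta_mean}
    return models


def predict_rate(models, ta, site):
    """Predict monthly enrollment for a site using only its TA's segment."""
    segment = models[ta]
    return segment["site_means"].get(site, segment["ta_mean"])


if __name__ == "__main__":
    models = fit_segmented_baselines(HISTORY)
    print(predict_rate(models, "oncology", "site-a"))
    print(predict_rate(models, "oncology", "site-c"))  # falls back to the TA mean
```

The same site can score very differently across segments (here `site-a` in oncology versus cardiology), which is the signal an industry-wide average would blur.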

The regulatory dimension matters here too. 21 CFR Part 11, ICH E6(R3), and FDA's Good Machine Learning Practice (GMLP) guidance, along with increasing FDA emphasis on transparency in algorithmic decision support, make model explainability and data governance material considerations, not optional features. Because every prediction carries a SHAP attribution stored as a governed, queryable Unity Catalog Delta table, versioned in MLflow and with lineage tracked in Unity Catalog, the rationale behind a site selection is as auditable as the score itself. A clinical affairs team can answer a question from a data monitoring committee with a SQL query, not a black-box vendor report.
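What "answering with a SQL query" could look like is sketched below. An in-memory `sqlite3` database stands in for the governed Delta tables so the example runs anywhere; the table names, columns, and values are hypothetical. The second query checks the SHAP additivity property: in the model's output space, the base value plus the sum of a prediction's attributions should reproduce the prediction.

```python
import sqlite3

# sqlite3 stands in for governed Delta tables; all names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site_predictions (
    site_id TEXT, model_version TEXT, base_value REAL, prediction REAL
);
CREATE TABLE site_shap (
    site_id TEXT, model_version TEXT, feature TEXT, shap_value REAL
);
""")
conn.execute("INSERT INTO site_predictions VALUES ('site-a', 'v3', 0.50, 0.65)")
conn.executemany(
    "INSERT INTO site_shap VALUES ('site-a', 'v3', ?, ?)",
    [
        ("historical_enrollment_rate", 0.20),
        ("screen_failure_rate", -0.10),
        ("open_payments_signal", 0.05),
    ],
)


def top_drivers(conn, site_id, k=3):
    """Return the k features with the largest absolute attribution for a site."""
    rows = conn.execute(
        "SELECT feature, shap_value FROM site_shap "
        "WHERE site_id = ? ORDER BY ABS(shap_value) DESC LIMIT ?",
        (site_id, k),
    )
    return rows.fetchall()


def additivity_gap(conn, site_id):
    """Return prediction - (base_value + sum of attributions); ideally near 0."""
    row = conn.execute(
        """
        SELECT p.prediction - (p.base_value + SUM(s.shap_value))
        FROM site_predictions AS p
        JOIN site_shap AS s
          ON s.site_id = p.site_id AND s.model_version = p.model_version
        WHERE p.site_id = ?
        GROUP BY p.site_id, p.model_version, p.base_value, p.prediction
        """,
        (site_id,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    print(top_drivers(conn, "site-a", k=2))
    print(abs(additivity_gap(conn, "site-a")) < 1e-9)
```

Keeping attributions in a long format (one row per site, model version, and feature) is one reasonable layout for this kind of audit, since both queries stay simple and new features do not require schema changes.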

What We Built

The Site Feasibility Workbench is a six-step guided workflow for clinical trial site selection: protocol selection, score constraints, geographic overview, site ranking, SHAP-driven site deep dive, and final shortlist. Diversity considerations are a first-class scoring dimension, aligned with FDA's Diversity Action Plan expectations under FDORA 2022.
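The score-constraint and site-ranking steps of that workflow can be sketched as a filter followed by a sort. The `SiteScore` fields, thresholds, and tie-breaking rule below are assumptions made for illustration and are not taken from the workbench source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteScore:
    site_id: str
    feasibility: float  # composite feasibility score in [0, 1]
    diversity: float    # diversity dimension in [0, 1]


def rank_sites(sites, min_feasibility=0.0, min_diversity=0.0, top_n=None):
    """Apply score constraints, then rank the remaining sites.

    Sites are ordered by feasibility (highest first); ties are broken by
    site_id so the ranking is deterministic.
    """
    eligible = [
        s for s in sites
        if s.feasibility >= min_feasibility and s.diversity >= min_diversity
    ]
    ranked = sorted(eligible, key=lambda s: (-s.feasibility, s.site_id))
    return ranked if top_n is None else ranked[:top_n]


if __name__ == "__main__":
    candidates = [
        SiteScore("site-a", 0.9, 0.2),
        SiteScore("site-b", 0.8, 0.6),
        SiteScore("site-c", 0.8, 0.7),
        SiteScore("site-d", 0.4, 0.9),
    ]
    shortlist = rank_sites(candidates, min_feasibility=0.5, min_diversity=0.5)
    print([s.site_id for s in shortlist])
```

Treating diversity as its own constraint, rather than folding it into the composite score, keeps it visible as a separate dimension: a high-scoring site can still drop out if it does not meet the diversity threshold.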

Composite feasibility scores combine real-world evidence, patient access data, historical site performance, site qualification history, Open Payments KOL signal, and protocol execution factors — all driven by TA-segmented LightGBM models trained on the organization's own CTMS, EDC, and IRT history.

Patient-level data inherits Unity Catalog access controls, and PHI handling follows the sponsor's HIPAA Safe Harbor or Expert Determination posture, configured at the catalog or schema level.

The part worth emphasizing is not the workflow steps or the model features. It is what the architecture makes possible: every prediction carries a SHAP explanation stored as a governed Delta table alongside the prediction itself, making the model rationale as auditable and versioned as the score it explains. Because every prediction is decomposed into governed SHAP attributions, sponsors can audit recommendations for systematic under-weighting of community sites, minority-serving institutions, or first-time investigators, turning explainability into a fairness control.
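One simple form such a fairness audit could take is comparing the average attribution each feature receives across site groups. The sketch below flags features whose mean SHAP value for one group sits well below that of a reference group; the rows, group labels, and threshold are invented for illustration, and a real audit would need larger samples and a proper statistical test.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical attribution rows: (site_id, site_type, feature, shap_value)
ROWS = [
    ("s1", "academic", "prior_trial_count", 0.12),
    ("s2", "academic", "prior_trial_count", 0.08),
    ("s3", "community", "prior_trial_count", -0.06),
    ("s4", "community", "prior_trial_count", -0.10),
    ("s1", "academic", "patient_access", 0.02),
    ("s2", "academic", "patient_access", 0.04),
    ("s3", "community", "patient_access", 0.05),
    ("s4", "community", "patient_access", 0.03),
]


def mean_attribution_by_group(rows):
    """Average SHAP value for every (site_type, feature) pair."""
    buckets = defaultdict(list)
    for _site_id, group, feature, value in rows:
        buckets[(group, feature)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def flag_gaps(rows, group, reference, threshold):
    """Return features where `group` trails `reference` by at least `threshold`.

    Assumes every feature has at least one row for both groups.
    """
    means = mean_attribution_by_group(rows)
    features = sorted({feature for (_group, feature) in means})
    flagged = {}
    for feature in features:
        gap = means[(group, feature)] - means[(reference, feature)]
        if gap <= -threshold:
            flagged[feature] = gap
    return flagged


if __name__ == "__main__":
    print(flag_gaps(ROWS, group="community", reference="academic", threshold=0.1))
```

In this toy data the model penalizes community sites through `prior_trial_count`, which is the kind of pattern a reviewer might investigate as a proxy that disadvantages first-time investigators.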

Saved shortlists persist to Lakebase for team sharing. The AI/BI Genie assistant answers cross-domain questions against the same Unity Catalog tables in natural language. None of this requires infrastructure outside the workspace.
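Shortlist persistence is ordinary relational state, which is why a PostgreSQL layer fits it well. The sketch below uses `sqlite3` purely as a local stand-in for Lakebase so that it runs without a database server; the schema and function names are hypothetical and not taken from the workbench repository.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for Lakebase (managed PostgreSQL).
SCHEMA = """
CREATE TABLE shortlists (
    shortlist_id INTEGER PRIMARY KEY,
    protocol_id  TEXT NOT NULL,
    name         TEXT NOT NULL,
    created_by   TEXT NOT NULL
);
CREATE TABLE shortlist_sites (
    shortlist_id INTEGER NOT NULL REFERENCES shortlists(shortlist_id),
    site_id      TEXT NOT NULL,
    list_rank    INTEGER NOT NULL,
    PRIMARY KEY (shortlist_id, site_id)
);
"""


def save_shortlist(conn, protocol_id, name, created_by, site_ids):
    """Store a shortlist and its ordered sites in a single transaction."""
    with conn:  # commits on success, rolls back if any insert fails
        cur = conn.execute(
            "INSERT INTO shortlists (protocol_id, name, created_by) VALUES (?, ?, ?)",
            (protocol_id, name, created_by),
        )
        shortlist_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO shortlist_sites (shortlist_id, site_id, list_rank) VALUES (?, ?, ?)",
            [(shortlist_id, site_id, rank) for rank, site_id in enumerate(site_ids, start=1)],
        )
    return shortlist_id


def load_shortlist(conn, shortlist_id):
    """Return the site IDs of a shortlist in their saved order."""
    rows = conn.execute(
        "SELECT site_id FROM shortlist_sites WHERE shortlist_id = ? ORDER BY list_rank",
        (shortlist_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```

Writing the header row and the site rows in one transaction means a teammate never sees a half-saved shortlist, which matters once shortlists are shared across a study team.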

This is a decision-support layer, not a source-of-record system. The CTMS/EDC/IRT remain authoritative. The workbench produces predictions whose lineage is governed in Unity Catalog and MLflow.


Figure 2: Site Feasibility Workbench. A stateful workflow application that lets site feasibility leads create and share data-driven site selection shortlists using real-world data (RWD) and AI.

The full application — FastAPI backend, React frontend, seed notebooks, and deploy scripts — is published as an open-source repository. Deploying into an existing Databricks workspace with Unity Catalog takes approximately 30 minutes of technical deployment time, before sponsor-specific security review and validation.

One Module of a Larger Platform

The Site Feasibility Workbench is the first public release of a broader architecture — the Databricks Clinical Operations Intelligence Hub — covering the full trial lifecycle:

Site Feasibility and Selection — what this repository covers
Patient Cohort and Recruitment — protocol-aligned cohort building from EHR and real-world evidence at Lakehouse scale
Enrollment Velocity Optimizer — ML stall prediction per site per month with a 1–3 month forward horizon
Risk-Based Monitoring and Compliance — continuous monitoring for enrollment anomalies, data lags, and protocol deviations

All four deploy as Databricks Apps. All four query Unity Catalog directly. None make external API calls. When clinical applications live where your data and models live, the feedback loop closes. Site selection models learn from enrollment outcomes. Risk scores update as amendment history grows. Every AI-driven recommendation carries a lineage trail back to the CTMS, EDC, and IRT records that produced it.

Get Started

Clone the public repository. Deploy. Tell us what you change.

GitHub Repository link

For the full Clinical Operations Intelligence Hub, watch the BrickTalk recording: Scaling BioPharma Intelligence + Databricks Agentic Clinical Ops.

The related posts on Lakebase and Databricks Apps in production cover the platform primitives in depth.

This post is part of the Databricks Clinical Operations Intelligence Hub series — a set of open-source Databricks Apps covering the full trial lifecycle. Start with the GitHub repository for the Site Feasibility Workbench. For the full platform overview, watch the BrickTalk: Scaling BioPharma Intelligence + Databricks Agentic Clinical Ops. Explore the related platform posts on Lakebase and Databricks Apps below.
