Databricks

Clinical operations intelligence belongs on the Lakehouse

TL;DR · AI Summary

Clinical operations intelligence should be based on the Lakehouse architecture to improve data processing efficiency and analytical capabilities.

Key Takeaways

  • The Lakehouse architecture can integrate and optimize medical data processing.
  • Clinical operations intelligence requires real-time data processing and large-scale data analysis capabilities.
  • Databricks provides powerful tools to support clinical operations intelligence in the Lakehouse architecture.

Outline

  1. Introduce the importance of clinical operations intelligence and its challenges.

  2. The Lakehouse architecture can integrate multiple data sources and provide a unified data management platform.

  3. Emphasize the need for real-time data processing and large-scale data analysis.

  4. Databricks' solutions: Databricks offers various tools and services to support clinical operations intelligence in the Lakehouse architecture.

  5. Present actual case studies demonstrating the application effects of the Lakehouse architecture in clinical operations intelligence.

Mindmap

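  • Clinical operations intelligence on the Lakehouse
    • Lakehouse architecture advantages
      • Integrates multiple data sources
      • Provides a unified data management platform
    • Clinical operations intelligence requirements
      • Real-time data processing
      • Large-scale data analysis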

Highlights

  • The Lakehouse architecture can integrate and optimize medical data processing, improving analytical efficiency.

  • Clinical operations intelligence requires real-time data processing and large-scale data analysis capabilities.

  • Databricks provides powerful tools to support clinical operations intelligence in the Lakehouse architecture.

Tags: #Lakehouse · #Clinical operations intelligence · #Databricks

Clinical operations intelligence belongs on the Lakehouse | Databricks Blog

Industries · May 13, 2026

How Databricks Apps, Lakebase, and AI/BI Genie eliminate the integration stack between clinical data and decision-support applications — and why that architecture change is what clinical operations have been missing.

by Nicholas Siebenlist

Summary

  • What it is: The Site Feasibility Workbench is an open-source Databricks App that runs clinical trial site selection entirely within the Databricks workspace — combining ML-driven site scoring, Lakebase for operational state, and AI/BI Genie for natural language data access, with no external API calls or synchronization pipelines.
  • The challenge it solves: 37% of investigator sites miss enrollment targets, and the root cause is architectural — clinical operations data and the applications that use it live in disconnected systems, forcing decisions into spreadsheets and creating integration overhead, credential sprawl, and synchronization lag that erodes trust in the data.
  • Results and outcomes: Therapeutic-area (TA)-segmented LightGBM models trained on your own CTMS, EDC, and IRT history, not industry averages, produce scores that improve as your portfolio grows. Every prediction carries a SHAP-driven attribution stored as a governed, versioned Delta table, making model rationale as auditable as the score itself.

The clinical data problem is not a storage problem. Most organizations already have a data warehouse, a CTMS, an EDC, and somewhere downstream, a BI layer. The problem is that none of these systems talk to each other in a way that supports the actual decisions clinical teams need to make — and so the decisions get made in spreadsheets instead.

Today we are releasing the Site Feasibility Workbench as a fully open-source Databricks App, to show what clinical operations intelligence looks like when the application, the models, and the data live on the same platform. The Tufts Center for the Study of Drug Development has documented that 37% of activated investigator sites enrolled fewer patients than their targets and an additional 11% enrolled no patients at all. As a result, 53% of trials exceeded their planned enrollment timelines, and one in six took more than twice as long as planned (Lamberti et al.; subsequent CSDD impact reports continue to track underperformance at similar levels). With each day of delay costing up to $500,000 in unrealized drug sales and $40,000 in direct trial costs, chronic site underperformance is one of the most consequential cost drivers in drug development. That combined underperformance rate, 48% of activated sites, has remained essentially flat for at least two decades. The tools are not the problem. The architecture is.

Clinical operations teams do not need more dashboards connected to existing systems. They need their decision-support applications to live where their data and models live — so that the feedback loop between a prediction and the operational outcome that validates it actually closes.

The Architecture Argument

The conventional approach to clinical decision-support looks like this: analytical data lives in a data warehouse or Lakehouse. A separate application database holds operational state. A pipeline keeps them loosely synchronized. A web application sits in front of both, relying on whatever semantic harmonization was done upstream in the Silver layer. Every layer introduces integration overhead, credential surface area, and a synchronization lag that erodes trust in the data the application shows.

Databricks Apps, Lakebase, and AI/BI Genie eliminate each of those layers — not by abstracting them away but by making them unnecessary.

Databricks Apps run the web application inside the workspace. The app authenticates as a first-class workspace service principal, queries Unity Catalog via the SQL Statement API, and calls AI/BI Genie over the workspace REST API — all on internal connections. Clinical operations data never crosses a workspace boundary. The app inherits Unity Catalog access controls without any additional configuration.
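As a rough illustration of the SQL Statement API call described above, the sketch below builds (without sending) a request to the Databricks SQL Statement Execution REST endpoint, `POST /api/2.0/sql/statements`, with a named parameter bound server-side. The host, token, warehouse ID, and the `main.clinops.site_scores` table are placeholders invented for this example. A real Databricks App would more likely use the Databricks SDK for Python, which handles the app's service-principal OAuth for you.

```python
from __future__ import annotations

import json
import urllib.request


def build_statement_request(host: str, token: str, warehouse_id: str,
                            sql: str, params: dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a SQL Statement Execution API request."""
    body = {
        "warehouse_id": warehouse_id,
        "statement": sql,
        # Named :markers are bound server-side instead of being string-formatted into SQL.
        "parameters": [{"name": k, "value": v} for k, v in params.items()],
        # Wait up to 30 s inline; if still running, the response carries a statement_id to poll.
        "wait_timeout": "30s",
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/sql/statements",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; the table name is hypothetical.
req = build_statement_request(
    host="example.cloud.databricks.com",
    token="<token>",
    warehouse_id="<warehouse-id>",
    sql=("SELECT site_id, feasibility_score FROM main.clinops.site_scores "
         "WHERE protocol_id = :protocol_id ORDER BY feasibility_score DESC LIMIT 20"),
    params={"protocol_id": "PROT-001"},
)
# urllib.request.urlopen(req) would execute it; left out so the sketch runs offline.
```

Binding `:protocol_id` as a parameter rather than formatting it into the SQL string keeps user input from the app's UI out of the statement text.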

Lakebase is the operational database layer: managed PostgreSQL that scales to zero when idle, provisioned and credentialed entirely within the workspace identity system. Where a traditional application would require a separately managed Amazon RDS instance, with its own schema drift, sync jobs, and credential rotation, Lakebase runs on the same platform where the data and models live.

AI/BI Genie closes the last gap: natural language access to governed data, embedded directly in the application workflow. Study managers ask questions in plain English against the same Unity Catalog tables the ML models trained on, with the same access controls applied.
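Genie is reachable over the workspace REST API as well. A minimal sketch, assuming the Genie Conversation API endpoints as documented at the time of writing (the API has been in preview, so check current paths before relying on them): a question starts a conversation with `POST /api/2.0/genie/spaces/{space_id}/start-conversation`, and the answer is then polled using the returned conversation and message IDs. The host, token, and space ID are placeholders.

```python
import json
import urllib.request


def genie_base(host: str, space_id: str) -> str:
    """Base URL for one Genie space (path layout assumed from the Genie Conversation API docs)."""
    return f"https://{host}/api/2.0/genie/spaces/{space_id}"


def build_genie_question(host: str, token: str, space_id: str,
                         question: str) -> urllib.request.Request:
    """Build (but do not send) a request asking a Genie space a natural-language question."""
    return urllib.request.Request(
        url=f"{genie_base(host, space_id)}/start-conversation",
        data=json.dumps({"content": question}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def genie_message_url(host: str, space_id: str, conversation_id: str, message_id: str) -> str:
    """URL to poll (GET) for the status and result of one Genie message."""
    return f"{genie_base(host, space_id)}/conversations/{conversation_id}/messages/{message_id}"


req = build_genie_question(
    "example.cloud.databricks.com", "<token>", "<space-id>",
    "Which sites missed enrollment targets on oncology studies since 2022?",
)
```

Because the Genie space is defined over the same Unity Catalog tables, the question is answered under the same access controls as the app's SQL queries.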

The result is a clinical operations application that makes no external API calls, maintains no separate operational database infrastructure, and requires no synchronization pipeline between the analytical and operational layers.

Figure 1 — The Databricks Lakehouse Platform as a unified clinical intelligence stack. External sources ingest via Lakeflow (Bronze → Silver → Gold). Mosaic AI trains AI/ML models and writes versioned predictions back to Unity Catalog. SQL Warehouse, Lakebase, and AI/BI Genie serve the Databricks App — which runs inside the platform boundary with all connections internal.

The Auditability Argument

The standard industry approach to site feasibility relies on commercial scoring products or CRO-provided analytics platforms. Those tools are built on aggregated industry data: useful as a baseline, but blind to the specifics of your portfolio. A sponsor with a decade of CTMS, EDC, and IRT history holds a strong signal about how its sites perform on its protocols.

When the ML stack lives on Databricks, that institutional knowledge becomes the training data. The models in this workbench are trained on your historical enrollment rates, your site qualification history, your screen failure patterns, and your protocol execution record, not on industry averages. CMS Open Payments adds a freely available public signal layer that, used appropriately, correlates with research engagement and infrastructure. As the trial portfolio grows, the models improve on the same infrastructure. That is the compounding return a single-platform architecture enables and a licensed scoring product cannot: every new study makes the predictions better, and every new site relationship is reflected in the next training run. MLflow tracks every training run with its parameters, metrics, and artifacts, enabling comparison across model versions, reproducibility on demand, and a complete audit trail from raw CTMS and EDC records to the deployed prediction.
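The workbench itself uses TA-segmented LightGBM models; the dependency-free sketch below swaps in a much simpler estimator purely to show what segmenting on your own history means. It fits one baseline enrollment rate per therapeutic area and shrinks sparsely observed TAs toward the portfolio-wide mean, so each TA's estimate leans more on its own data as its study count grows. The input shape (TA, patients per site-month) and the prior strength `k` are assumptions for illustration, not the workbench's features.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def fit_ta_baselines(history: list[tuple[str, float]], k: float = 5.0) -> dict[str, float]:
    """Shrunk per-TA mean enrollment rate (a stand-in for a per-TA LightGBM model).

    history: (therapeutic_area, patients_per_site_month) rows from past studies.
    Each TA gets (n * ta_mean + k * portfolio_mean) / (n + k): with few rows it
    stays near the portfolio mean, and with many rows it approaches its own mean.
    k is a hypothetical prior strength.
    """
    portfolio_mean = mean(rate for _, rate in history)
    by_ta: dict[str, list[float]] = defaultdict(list)
    for ta, rate in history:
        by_ta[ta].append(rate)
    return {
        ta: (len(rates) * mean(rates) + k * portfolio_mean) / (len(rates) + k)
        for ta, rates in by_ta.items()
    }


history = [("oncology", 0.5)] * 5 + [("cardiology", 2.0)] * 5  # hypothetical rows
print(fit_ta_baselines(history))  # {'oncology': 0.875, 'cardiology': 1.625}
```

Adding more rows for a TA reduces its shrinkage, which is the "models improve as the portfolio grows" effect in miniature.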

The regulatory dimension matters here too. 21 CFR Part 11, ICH E6(R3), and FDA's Good Machine Learning Practice (GMLP) guiding principles, along with increasing FDA emphasis on transparency in algorithmic decision support, make model explainability and data governance material considerations, not optional features. Because every prediction carries a SHAP attribution stored as a governed Unity Catalog Delta table (versioned in MLflow, with lineage tracked in Unity Catalog, and queryable), the rationale behind a site selection is as auditable as the score itself. A clinical affairs team can answer a question from a data monitoring committee with a SQL query rather than a black-box vendor report.
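To make the "answer it with a SQL query" point concrete, here is a sketch of a long-format attribution table. `sqlite3` stands in for governed Delta tables so the example runs offline, and the table names, columns, and values are hypothetical. It also uses the additivity property of SHAP: a prediction's base (expected) value plus the sum of its feature attributions reconstructs the model output in the model's output space, which gives a cheap integrity check next to the explanatory query.

```python
import sqlite3

# sqlite3 stands in for Unity Catalog Delta tables; schema and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE site_predictions (
    site_id TEXT PRIMARY KEY, model_version TEXT, base_value REAL, score REAL);
CREATE TABLE site_shap (site_id TEXT, feature TEXT, shap_value REAL);
""")
conn.execute("INSERT INTO site_predictions VALUES ('S001', 'v3', 0.40, 0.72)")
conn.executemany(
    "INSERT INTO site_shap VALUES ('S001', ?, ?)",
    [("historical_enrollment_rate", 0.25),
     ("screen_failure_rate", -0.05),
     ("open_payments_kol_signal", 0.08),
     ("startup_cycle_days", 0.04)],
)

# Integrity check: base value + sum of SHAP values should reproduce the stored score.
(gap,) = conn.execute("""
    SELECT ABS(p.base_value + SUM(s.shap_value) - p.score)
    FROM site_predictions AS p JOIN site_shap AS s ON s.site_id = p.site_id
    WHERE p.site_id = 'S001'
    GROUP BY p.site_id, p.base_value, p.score
""").fetchone()

# "Why is this site ranked where it is?" answered as a query.
top_drivers = conn.execute("""
    SELECT feature, shap_value FROM site_shap
    WHERE site_id = 'S001'
    ORDER BY ABS(shap_value) DESC
    LIMIT 3
""").fetchall()
print(top_drivers)
# [('historical_enrollment_rate', 0.25), ('open_payments_kol_signal', 0.08), ('screen_failure_rate', -0.05)]
```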

What We Built

The Site Feasibility Workbench is a six-step guided workflow for clinical trial site selection: protocol selection, score constraints, geographic overview, site ranking, SHAP-driven site deep dive, and final shortlist. Diversity considerations are a first-class scoring dimension, aligned with FDA's Diversity Action Plan expectations under FDORA 2022.
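One way to model a guided workflow like this is as an ordered set of steps with a gate on advancing. The sketch is hypothetical: the enum names simply mirror the six steps listed above, and the "complete before advancing" rule is an assumption, not something the post states about the repository's implementation.

```python
from __future__ import annotations

from enum import IntEnum


class Step(IntEnum):
    """The six workbench steps, in the order the post lists them (names are illustrative)."""
    PROTOCOL_SELECTION = 1
    SCORE_CONSTRAINTS = 2
    GEOGRAPHIC_OVERVIEW = 3
    SITE_RANKING = 4
    SITE_DEEP_DIVE = 5
    FINAL_SHORTLIST = 6


def advance(current: Step, completed: set[Step]) -> Step:
    """Return the next step, refusing to skip ahead of unfinished work."""
    if current not in completed:
        raise ValueError(f"complete {current.name} before advancing")
    if current is Step.FINAL_SHORTLIST:
        return current  # last step: nothing further to unlock
    return Step(current + 1)


done = {Step.PROTOCOL_SELECTION}
print(advance(Step.PROTOCOL_SELECTION, done).name)  # SCORE_CONSTRAINTS
```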

Composite feasibility scores combine real-world evidence, patient access data, historical site performance, site qualification history, Open Payments KOL signal, and protocol execution factors — all driven by TA-segmented LightGBM models trained on the organization's own CTMS, EDC, and IRT history.
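The post does not publish how these components are combined, so the following is only a plausible sketch: a normalized weighted average of component scores that are assumed to be already computed (for example by the per-TA models) and scaled to [0, 1]. The component keys and weights are hypothetical.

```python
from __future__ import annotations

# Hypothetical weights over the six components named in the post.
WEIGHTS: dict[str, float] = {
    "real_world_evidence": 0.20,
    "patient_access": 0.20,
    "historical_performance": 0.25,
    "qualification_history": 0.10,
    "open_payments_kol": 0.10,
    "protocol_execution": 0.15,
}


def composite_score(components: dict[str, float],
                    weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of component scores, each expected in [0, 1]."""
    missing = sorted(weights.keys() - components.keys())
    if missing:
        raise KeyError(f"missing components: {missing}")
    for name in weights:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {components[name]}")
    # Divide by the weight total so the result stays in [0, 1] even if weights don't sum to 1.
    return sum(w * components[name] for name, w in weights.items()) / sum(weights.values())


site = {"real_world_evidence": 0.8, "patient_access": 0.6, "historical_performance": 0.9,
        "qualification_history": 1.0, "open_payments_kol": 0.3, "protocol_execution": 0.7}
print(round(composite_score(site), 3))  # 0.74
```

Keeping the weights as data rather than code makes them easy to version alongside the model, so a change in weighting shows up in the same audit trail as the scores.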

Patient-level data inherits Unity Catalog access controls, and PHI handling follows the sponsor's HIPAA Safe Harbor or Expert Determination de-identification posture, configured at the catalog or schema level.

The part worth emphasizing, though, is not the workflow steps or the model features. It is what the architecture makes possible: every prediction carries a SHAP explanation stored as a governed Delta table alongside the prediction itself, making the model rationale as auditable and versioned as the score it explains. Because every prediction is decomposed into governed SHAP attributions, sponsors can audit recommendations for systematic under-weighting of community sites, minority-serving institutions, or first-time investigators, turning explainability into a fairness control.
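A minimal sketch of that kind of audit, assuming per-site SHAP rows have been joined to a site classification: for each feature, compute the mean attribution for one site group minus a reference group. Strongly negative gaps point to the features that push, say, community sites down relative to academic centers and deserve review. The group labels, feature names, and values are hypothetical, and what counts as an unacceptable gap is a sponsor policy decision, not something this code decides.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import mean


def shap_gap_by_feature(rows: list[tuple[str, str, float]],
                        group: str, reference: str) -> list[tuple[str, float]]:
    """Mean SHAP value of `group` minus `reference`, per feature, most negative first.

    rows: (site_group, feature, shap_value), one row per site-feature attribution.
    Features missing from either group are skipped.
    """
    values: dict[tuple[str, str], list[float]] = defaultdict(list)
    for site_group, feature, shap_value in rows:
        values[(site_group, feature)].append(shap_value)
    features = {feature for _, feature in values}
    gaps = {
        f: mean(values[(group, f)]) - mean(values[(reference, f)])
        for f in features
        if (group, f) in values and (reference, f) in values
    }
    return sorted(gaps.items(), key=lambda item: item[1])


rows = [  # hypothetical attributions
    ("community", "open_payments_kol_signal", -0.12),
    ("community", "open_payments_kol_signal", -0.08),
    ("academic", "open_payments_kol_signal", 0.10),
    ("community", "historical_enrollment_rate", 0.06),
    ("academic", "historical_enrollment_rate", 0.05),
]
for feature, gap in shap_gap_by_feature(rows, "community", "academic"):
    print(f"{feature}: {gap:+.2f}")
# open_payments_kol_signal: -0.20
# historical_enrollment_rate: +0.01
```

In this made-up data, the gap comes almost entirely from the Open Payments signal rather than from historical enrollment, which is the kind of pattern a reviewer would want to question.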

Saved shortlists persist to Lakebase for team sharing. The AI/BI Genie assistant answers cross-domain questions against the same Unity Catalog tables in natural language. None of this requires infrastructure outside the workspace.
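A sketch of how a saved shortlist might be persisted. Lakebase is PostgreSQL, but to keep the example runnable offline `sqlite3` stands in for it. The `INSERT ... ON CONFLICT ... DO UPDATE` upsert works the same way in both, so re-saving a shortlist updates it rather than duplicating it. The `shortlists` table and its columns are hypothetical. With a PostgreSQL driver such as psycopg you would switch the `?` placeholders to `%s` and would likely use `JSONB` and `TIMESTAMPTZ` column types.

```python
import json
import sqlite3

# Hypothetical schema for team-shared shortlists.
DDL = """
CREATE TABLE IF NOT EXISTS shortlists (
    protocol_id TEXT NOT NULL,
    name        TEXT NOT NULL,
    owner       TEXT NOT NULL,
    site_ids    TEXT NOT NULL,  -- JSON array of site IDs
    updated_at  TEXT NOT NULL,  -- ISO 8601 timestamp
    PRIMARY KEY (protocol_id, name)
)
"""

# Upsert: a second save of the same (protocol_id, name) overwrites instead of duplicating.
UPSERT = """
INSERT INTO shortlists (protocol_id, name, owner, site_ids, updated_at)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (protocol_id, name) DO UPDATE SET
    owner = excluded.owner,
    site_ids = excluded.site_ids,
    updated_at = excluded.updated_at
"""


def save_shortlist(conn: sqlite3.Connection, protocol_id: str, name: str,
                   owner: str, site_ids: list, updated_at: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(UPSERT, (protocol_id, name, owner, json.dumps(site_ids), updated_at))


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
save_shortlist(conn, "PROT-001", "draft-a", "alice", ["S001", "S007"], "2026-05-13T10:00:00Z")
save_shortlist(conn, "PROT-001", "draft-a", "bob", ["S001", "S007", "S012"], "2026-05-13T11:00:00Z")
print(conn.execute("SELECT owner, site_ids FROM shortlists").fetchall())
# [('bob', '["S001", "S007", "S012"]')]
```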

This is a decision-support layer, not a source-of-record system. The CTMS/EDC/IRT remain authoritative. The workbench produces predictions whose lineage is governed in Unity Catalog and MLflow.

Figure 2: Site Feasibility Workbench, a stateful workflow application that lets site feasibility leads create and share data-driven site selection shortlists using real-world data (RWD) and AI.

The full application — FastAPI backend, React frontend, seed notebooks, and deploy scripts — is published as an open-source repository. Deploying into an existing Databricks workspace with Unity Catalog takes approximately 30 minutes of technical deployment time, before sponsor-specific security review and validation.

One Module of a Larger Platform

The Site Feasibility Workbench is the first public release of a broader architecture — the Databricks Clinical Operations Intelligence Hub — covering the full trial lifecycle:

  • Site Feasibility and Selection — what this repository covers
  • Patient Cohort and Recruitment — protocol-aligned cohort building from EHR and real-world evidence at Lakehouse scale
  • Enrollment Velocity Optimizer — ML stall prediction per site per month with a 1–3 month forward horizon
  • Risk-Based Monitoring and Compliance — continuous monitoring for enrollment anomalies, data lags, and protocol deviations
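The post describes the Enrollment Velocity Optimizer only as stall prediction per site per month with a 1–3 month forward horizon. A common way to frame that as supervised learning is to label each site-month by what happens in the following months. In the sketch below, "stall" means zero enrollments across the whole forward window; that definition and the function name are assumptions for illustration.

```python
from __future__ import annotations

from typing import Optional


def stall_labels(monthly_enrollment: list[int], horizon: int = 3) -> list[Optional[int]]:
    """Label month t as 1 if the site enrolls no one in months t+1..t+horizon, else 0.

    Months near the end of the series lack a full forward window and get None,
    so they can be dropped from training instead of being mislabeled as non-stalls.
    """
    if horizon < 1:
        raise ValueError("horizon must be at least 1 month")
    labels: list[Optional[int]] = []
    for t in range(len(monthly_enrollment)):
        window = monthly_enrollment[t + 1 : t + 1 + horizon]
        labels.append(None if len(window) < horizon else int(sum(window) == 0))
    return labels


print(stall_labels([2, 1, 0, 0, 0, 3], horizon=2))  # [0, 1, 1, 0, None, None]
```

Features for month t should be computed only from data available at the end of month t; otherwise the forward-looking label leaks into the model's inputs.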

All four deploy as Databricks Apps. All four query Unity Catalog directly. None make external API calls. When clinical applications live where your data and models live, the feedback loop closes. Site selection models learn from enrollment outcomes. Risk scores update as amendment history grows. Every AI-driven recommendation carries a lineage trail back to the CTMS, EDC, and IRT records that produced it.

Get Started

Clone the public repository. Deploy. Tell us what you change.

GitHub Repository link

For the full Clinical Operations Intelligence Hub — watch the BrickTalk recording: Scaling BioPharma Intelligence + Databricks Agentic Clinical Ops.

The related posts on Lakebase and on Databricks Apps in production cover the platform primitives in depth.

This post is part of the Databricks Clinical Operations Intelligence Hub series — a set of open-source Databricks Apps covering the full trial lifecycle. Start with the GitHub repository for the Site Feasibility Workbench. For the full platform overview, watch the BrickTalk: Scaling BioPharma Intelligence + Databricks Agentic Clinical Ops. Explore the related platform posts on Lakebase and Databricks Apps below.
