AI Success Starts with Clean Data, Not Just Better Models
TL;DR · AI Summary
Databricks emphasizes that AI success depends on high-quality data rather than just model improvements, arguing that data cleaning, governance, and unified platforms are foundational to reliable AI — not merely chasing larger models.
Key Takeaways
- The best model cannot compensate for bad data — clean, consistent data is the re
- Data governance (e.g., Unity Catalog) and unified platforms (Lakehouse) deliver
- Enterprises should prioritize investment in data engineering and reliability ove
Outline
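Points out that the industry currently over-focuses on model scale while overlooking the decisive effect of data cleaning and governance on AI outcomes.
Introduces how Databricks Unity Catalog provides unified governance and metadata management across data assets.
Explains how the Lakehouse combines the strengths of data lakes and data warehouses to give AI a trustworthy, traceable data foundation.
Urges enterprises to shift resources from the model-parameter race toward data engineering, data quality monitoring, and pipeline automation.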
Mindmap
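- AI success depends on data quality, not model size
  - Root causes of data problems
    - Data noise and bias
    - Lack of metadata governance
  - Solutions
    - Unified governance with Unity Catalog
    - Lakehouse architecture
  - Strategic shift
    - Invest in data engineering
    - Scale back the model race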
Highlights
The best model in the world cannot compensate for bad data — clean, well-governed data is the real differentiator.
We’ve seen companies spend millions on LLMs while their training data contained duplicates, biases, and missing timestamps.
Investing in data reliability pays off 10x faster than chasing the next model release.
Energy · May 5, 2026
AI success starts with clean data, not just better models
Why building an AI-ready foundation doesn't stop at the technology
by Aly McGue
Summary
- Platforms like Kraken and Databricks solve the foundational data challenge, giving organizations unified, well-documented data that makes everything from self-service analytics to AI viable.
- Once you solve the unified data challenge, the rest is a business problem.
- Data is a business asset, not an IT platform. The organizations pulling ahead pair unified data with deep business context and a data-literate culture.
Kraken, the AI-powered operating system behind some of the world's largest utilities, manages over 90 million customer accounts across 27 countries for clients including EDF, E.ON, National Grid, and Tokyo Gas. Kraken uses Databricks as its internal data platform and partners with Databricks to help clients maximize the value of the data they receive via secure, scalable data distribution.
Kristy Mayer-Mejia is the Global Head of Data Transformation at Kraken, where her team helps utility clients understand, adopt, and extract value from the data Kraken provides. Her mandate is twofold: speed up the time it takes clients to use the data, and increase the value they get from it.
I sat down with Kristy to understand how data functions as a business asset and is the foundation of a successful AI strategy. A key point from our conversation is that becoming data-driven is as much about clean, unified data as it is about deep business context and ownership. Platforms like Kraken and Databricks solve what Kristy calls the foundational unification problem, the prerequisite that makes everything else viable. But once that foundation is in place, the part most leaders underestimate is the business context that makes unified data usable.
Why data unification is table stakes
Aly McGue: In your experience, why do siloed data and fragmented systems remain a big hurdle for organizations trying to extract value from their investments?
Kristy Mayer-Mejia: What we see repeatedly with our clients is that low-quality, siloed data is the single biggest blocker to getting value from any other investment. Until the data is in one place, nothing else works at scale, and solving that is exactly the problem Kraken’s platform is designed to address. And I've lived this as a data leader in all my prior roles, too. Your team spends 80% of their time cleaning data, and that's just not valuable work. It's not necessary.
The real unlock is self-service, but it’s only possible once the underlying data is clean, unified, and accessible. Especially in the age of AI, self-service is possible at scale. You're never going to move quickly as a business, innovate or make day-to-day data-driven decisions if every question has to be answered by the data team. But when the data is scattered across systems with no documentation and no clear way to join it, self-service is impossible. Unification is this foundational unlock that makes everything else viable: the analytics, the AI, the speed of decision-making. It's table stakes.
The number no one trusts
Aly: We’ve all been in meetings where leadership spends more time debating ‘which number is right’ than actually making a decision. What is the hidden cost of that lack of trust in the data?
Kristy: I give this example all the time, and it's been true at every company I've ever worked at. Before you have unified data, the classic question is: how many customers do we have? And no one totally knows. You know the rough magnitude. But when I give that example, every time people laugh because they know it's true.
What it leads to is a lack of trust in the data. And one of the primary early values that unified data provides is the speed of decision-making, the ability to embed data-driven thinking in the company's DNA. You can't move quickly if every time you pull a number, you're thinking, ‘Am I sure this is right?’ Let me check five other places. Let me ask someone. And then it's different. And then you have to run down why it's different. Suddenly, it's two weeks later or a month later, and you might as well have just picked a random direction and kept moving.
AI is the forcing function enterprise analytics needed
Aly: We often talk about data fueling AI, but you’ve suggested that AI might actually be a ‘forcing function’ for better data. How is the push for AI changing the way organizations approach documentation and context?
Kristy: AI has actually been a forcing function. The inputs AI needs are the same inputs humans need: clear data, documentation, context on what columns mean, and how things join together. When data is hard to use, self-service analytics feels like a nice-to-have because the value is hard to pin down. It's a few hours saved here and there on individual decisions, which doesn't feel compelling in isolation. But accumulated across the organization, it's huge. It's just hard to see.
AI has made that value visible and has made clean data and documentation table stakes. It takes what everyone always knew was needed and makes it non-negotiable. And then on the other side, AI itself provides the tools to unlock analytics, things like conversational interfaces that let people query data without writing SQL. So it's both the forcing function that drives unification and the payoff that comes out of it.
Metadata as the missing ingredient
Aly: You've talked about the need to unify and document data. But when it comes to AI specifically, is documentation in a knowledge base or a PDF enough?
Kristy: It used to be. We shared our data documentation the way most companies do: a PDF, or a page on a website that a data analyst could reference when they needed context. That works well enough for humans. It does not work for AI.
Every client I talk to now is asking the same question: can you share the metadata in context, alongside the data itself, so we can actually feed it into models and have them understand what they're working with? That shift, from documentation as a reference artifact to documentation as a live input, is one of the more underappreciated changes AI is forcing. With Unity Catalog and Delta Sharing, we can share that context with the data rather than separately from it. For our clients, that is often the difference between AI that can reason about the data and AI that cannot.
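To make that shift concrete, here is a minimal Python sketch of documentation travelling with the data as a live input. It is an illustration only, not how Kraken, Unity Catalog, or Delta Sharing implement it: the `ColumnDoc` class, the `build_model_context` function, and the `accounts` table with its columns and values are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnDoc:
    """Hypothetical record describing one column of a shared table."""
    name: str
    dtype: str
    description: str


def build_model_context(table_name, columns, sample_rows):
    """Render column documentation and sample rows as one text block.

    The point is only that the descriptions travel with the data,
    instead of living in a separate PDF or web page.
    """
    lines = [f"Table: {table_name}", "Columns:"]
    for col in columns:
        lines.append(f"- {col.name} ({col.dtype}): {col.description}")
    lines.append("Sample rows:")
    for row in sample_rows:
        # Emit values in documented column order so rows and docs line up.
        lines.append(", ".join(f"{col.name}={row[col.name]}" for col in columns))
    return "\n".join(lines)


# Illustrative, made-up table and values (assumptions, not real Kraken data).
columns = [
    ColumnDoc("account_id", "string", "Unique identifier for a customer account"),
    ColumnDoc("tariff_code", "string", "Code of the tariff the account is currently on"),
]
sample_rows = [{"account_id": "A-001", "tariff_code": "FLEX"}]

context = build_model_context("accounts", columns, sample_rows)
print(context)
```

The resulting text could be passed to a model alongside a question, so the model sees what each column means rather than only its name. In a real deployment the descriptions would presumably come from the catalog's own metadata rather than being typed by hand.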
From monthly reports to hourly decisions
Aly: What does ‘data unification’ look like in practice? How does near-real-time visibility change day-to-day operations?
Kristy: A few examples from our clients stand out. One is call center operations, which is a massive function for utilities. We had a client go from monthly reporting on call volume, which was so painful to put together, to dashboards that update every couple of hours, with a predictive model layered on top of what calls they're likely to see going forward. That ability to fine-tune operations in near real time, rather than looking backward once a month, is a completely different way of running the business.
Another area is product innovation. In the utility space, clients are determining which products and tariffs to offer to attract and retain customers. That's a decision that can be deeply optimized with data. Clean, clear data gives clients easy insight and rapid test-and-learn cycles to optimize their product offers – and then Kraken's platform lets them quickly launch those new tariffs.
Getting people into the data
Aly: The ‘analyst bottleneck’ is a classic pain point for leadership. How do natural language interfaces, like Databricks Genie, shift the culture from waiting weeks for a report to getting answers in minutes?
Kristy: Most of our Genie clients are still in the early stages. But what we're seeing is that it's accelerating their time to get started by weeks or more. They don't need to deeply model the data the way you would to feed it into a traditional BI tool. They need clear documentation, they need the context, they need the data in one place, but they don't have to structure it so precisely that a user can explore it through a rigid interface.
But beyond the speed, there's a really clear cultural knock-on effect. One of the bigger barriers to data value is the cultural shift of making data part of your DNA. And I firmly believe one of the keys to that is making it incredibly easy and intuitive. When the barrier is low, and people can get in quickly, the culture, and the compounding value, follows.
The advice most C-suite leaders get wrong
Aly: What is the biggest misconception C-level leaders have when they task their IT departments with ‘getting the data ready’ for AI?
Kristy: Data is a business asset. And the biggest mistake I see leaders make is treating it like an IT platform. They disconnect it from the business and say, “Okay, IT, go prepare our data.” But the key to building a solid data foundation is the deep business context. How is the data generated? How is it used? How do people interpret it? What does this field actually mean? Once the technical foundation is in place, the hardest part becomes that deep business context. And the vast majority of that work sits with the business, not the data team.
So my advice is to embed data within the business. The roadmap to getting your data ready for AI is a shared roadmap. It's a business roadmap as much as it is a technical one.
What good looks like from here
Aly: Kraken sits across a large share of the utility industry's data. Where do you see AI and data taking your clients over the next three to five years?
Kristy: What I find most interesting is how quickly AI is raising the ceiling on what clients can do once they have a solid data foundation. For a long time, the question was: _how do we get our data into a usable state?_ That work is still real, and it still takes time. But the question is shifting toward: _now that the foundation is there, what becomes possible?_ And the answer to that keeps expanding. AI is changing where clients start from and what good looks like. Clients who would have considered a monthly report a success two years ago are now running hourly dashboards with predictive models layered on top and looking quickly toward broad use of agentic AI.
The ones who invested early in their data capabilities – and not just their tech but their skills and culture – are the ones moving fastest now, and the gap between them and everyone else is only going to widen.
Closing Thoughts
Kristy's perspective adds an often-missing layer to the data infrastructure conversation. The platform and the unification it enables are the foundational unlock. But where she sees most organizations stall is in the work that comes after: the business knowledge that makes data usable, the documentation that makes AI possible, and the cultural shift that makes self-service real.
As you develop your roadmap to embed AI across your organization and products, download the Databricks State of AI Agents to help you benchmark your investments.