Future Inference Will Consume 70% of Compute Power, 30% Left for Training | Silicon Valley Investor Zhang Lu @ AIGC2026

TL;DR · AI Summary

Future AI compute power will be dominated by inference at 70%, with communication energy consumption being the main bottleneck in data centers, and physical AI development depends on high-quality real-world data.

Key Takeaways

  • Inference compute demand will account for 70% of total compute, surpassing training as the core optimization focus
  • Data center communication energy consumption may be hundreds of times higher than that of computation
  • The bottleneck for physical AI is lack of high-quality real-world data

Outline

  1. Inference compute demand will rise from current 50% to 70%, becoming the core optimization target for AI infrastructure.

  2. Communication in data centers may consume hundreds of times more energy than computation, making optical communication technologies critical.

  3. Physical AI already has mature architecture and compute, but it is bottlenecked by a lack of sufficient high-quality real-world data.

  4. Industries like healthcare are prioritized for AI investment due to high-quality data density, not just market size.

  5. Corporate AI budgets have surged from millions to billions, with procurement cycles compressed to one or two months.

2026-05-24 14:14:05 Source: QbitAI

Technological innovation is just the starting point; the speed of industry integration is the true competitiveness of AI implementation

Compiled by the Editorial Team from AIGC2026

QbitAI | Official Account

From the perspective of a Silicon Valley investor, the narrative around AI is quietly shifting.

At this critical juncture where old and new cycles are transitioning, Zhang Lu, Founding Partner at Fusion Fund, shared her frontline insights:

Over the past two years, all eyes have been focused on models and compute power. However, the real battlefield is now shifting to the "communication layer" of infrastructure and the "data layer" of the physical world.

At the 2026 China AIGC Industry Summit, she articulated the shift in this round of AI narratives very directly—

Inference will surpass training and become the new protagonist of compute demand; the often-overlooked communication segment in data centers may consume hundreds of times more energy than computation itself.

As for the next truly promising direction, in her view, it's not about larger models, but rather more realistic and high-quality data, as well as three AI application areas: healthcare, space, and nanorobots.

Image 1

To fully capture Zhang Lu's thinking, QbitAI has translated and edited her speech without altering its original meaning, in the hope that it offers you some inspiration.

The 2026 China AIGC Industry Summit was hosted by QbitAI, with nearly 20 industry representatives taking part in the discussions. Over a thousand people attended in person and nearly 4 million watched the livestream, and the event received widespread coverage from mainstream media.

Core Points Summary

  • The focus of compute demand is shifting from training to inference: Training is a one-time compute investment, while inference represents sustainable long-term demand. As intelligent agent interaction replaces chat interaction, the proportion of inference compute will continue to rise from the current 50%, becoming the core optimization direction of AI infrastructure;
  • The real energy hog in data centers is communication: In AI data centers, the energy consumed by communication could be hundreds of times higher than that used for computation, indicating that next-generation communication technologies like optical communication hold far greater value than commonly perceived;
  • Physical AI is currently stuck at the data layer: Architecture and compute power are already available, but the real bottleneck lies in the lack of sufficient high-quality real-world data; synthetic data can serve as a supplement, but cannot replace real-world data collection in edge scenarios;
  • Data quality matters more than quantity, and healthcare happens to be one of the industries with the highest density of high-quality data: This is the underlying logic behind why many AI tech companies concentrated on the healthcare sector in 2025—not merely because the market is large;
  • Technological innovation is just the starting point; the speed of industrial integration is the true competitiveness of AI deployment: When Fortune 500 companies’ AI budgets leap from millions to billions, and procurement cycles compress from six months to one or two months, this acceleration itself fuels continuous model and application iterations.

Below is the original transcript of Zhang Lu's speech:

The New Turning Point in AI Narratives

Hello everyone, I'm Zhang Lu, founding and managing partner of Fusion Fund in Silicon Valley.

Over the past ten to eleven years, we've focused on investing in early-stage technology companies in North America, particularly in three fields—enterprise-level artificial intelligence, medical AI, and industrial automation.

In the past two years, many of you might have noticed that Silicon Valley has experienced a rapid cycle of innovation, especially in fast-paced industrial innovation and AI integration.

So over the past two years, we've worked hard and have been excited to watch many outstanding entrepreneurs and startups grow rapidly, and to see multiple innovations across AI, infrastructure, and the AI application layer.

This year, I believe we’ve entered a new era, where there are some new changes in the overall narrative and focus of AI innovation.

Today, I’m very happy to share with you the shifts in emerging AI innovation trends in Silicon Valley over the past year and some latest developments.

In recent years, when discussing AI innovation, several keywords kept recurring—such as large language models, generative AI, training, and compute demands.

But recently, we've observed the context shifting. One shift is from discussing large language models in general to focusing on industry-specific applications, where small language models can be deployed efficiently and cost-effectively in vertical AI sectors. Another shift is in the object of discussion.

From language models, we're now talking more about physical AI and adjustments to world models.

Meanwhile, in terms of computing, we often talk about the massive compute needs of AI. Previously, most compute consumption happened during training.

But lately, people are increasingly discussing how inference will require even more compute power, potentially even exceeding training and becoming long-term sustainable compute demand.

At a fundamental level, we’re seeing more discussions around data—from initially emphasizing scaling laws (the idea that more data leads to better AI models), to now focusing more on data quality—how to obtain high-quality industry data? How to build better databases using high-quality industry data? We call this data curation and data libraries.

Building on this high-quality data, we optimize AI, whether in model capabilities or application performance.

Redefining AI Infrastructure

Today, I'd like to quickly share a few AI areas we're very bullish on and which are rapidly developing.

First, let's talk about AI infrastructure.

If you followed NVIDIA's GTC conference in March, you'll have seen that NVIDIA's narrative has changed. It used to be a GPU chip company, but now CEO Jensen Huang has clearly stated:

NVIDIA is an AI infrastructure company, an AI factory.

From a token economics perspective, future demand for AI infrastructure may be as common as electricity demand—a massive foundational industrial revolution.

Thus, we see high demand for AI infrastructure innovation. AI has now entered the deployment phase, and large-scale deployments require strong AI infrastructure support.

There are now many newly built AI data centers facing challenges such as power consumption, energy usage in communication layers, and various technical issues.

Innovating at this level therefore brings many opportunities, and one we often discuss is compute optimization within infrastructure.

I mentioned earlier that the core change in compute is—previously, compute focused more on training itself. But now we see a clear shift: training is a one-time compute investment, whereas inference is a sustainable compute need.

A few years ago, compute demand for training accounted for over 70%, while inference only had 20% to 30%. Now, inference has reached half, and in the future, it may become 30:70 (training:inference).
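To make the shift in these shares concrete, here is a small arithmetic sketch. It takes the rough splits quoted above (70:30, 50:50, 30:70) at face value and uses 70:30 as a stand-in for "over 70%" training; the function name is hypothetical and the figures are the speaker's estimates, not measurements.

```python
# Illustrative arithmetic only: the shares are the rough figures quoted in the talk.
SHARES = {
    "a few years ago": (70, 30),  # (training %, inference %), 70:30 assumed for "over 70%"
    "now": (50, 50),
    "projected": (30, 70),
}


def inference_per_training_unit(training_pct: float, inference_pct: float) -> float:
    """Units of inference compute spent for every unit of training compute."""
    return inference_pct / training_pct


for label, (training_pct, inference_pct) in SHARES.items():
    ratio = inference_per_training_unit(training_pct, inference_pct)
    print(f"{label}: training {training_pct}% / inference {inference_pct}% "
          f"-> {ratio:.2f} units of inference per unit of training")
```

Read this way, moving from 70:30 to 30:70 raises the inference-to-training ratio from about 0.43 to about 2.33, which is roughly a 5.4-fold change in relative weight even though each share moves by only 40 percentage points.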

Especially now, we're entering a new stage of transition—from chat-based interactions to intelligent agent interactions. If you have an intelligent agent, do you want it to stay online and respond continuously?

This process increases the sustainability and volume of inference demand, making inference compute consumption more central.
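The contrast between a one-time training cost and a recurring inference cost can be sketched as a simple break-even calculation. Everything below is hypothetical: the compute figures are arbitrary units chosen for illustration, and the assumption that an always-on agent uses ten times the daily compute of a chat workload is ours, not the speaker's.

```python
import math


def days_until_inference_exceeds_training(training_compute: float,
                                          daily_inference_compute: float) -> int:
    """First whole day on which cumulative inference compute exceeds
    the one-time training compute (both in the same arbitrary unit)."""
    if training_compute < 0 or daily_inference_compute <= 0:
        raise ValueError("training_compute must be >= 0 and daily_inference_compute > 0")
    return math.floor(training_compute / daily_inference_compute) + 1


# Hypothetical numbers in arbitrary compute units, for illustration only.
TRAINING_COMPUTE = 1000.0  # spent once
CHAT_DAILY = 2.0           # occasional chat-style requests
AGENT_DAILY = 20.0         # an always-on agent, assumed 10x the chat workload

print("chat :", days_until_inference_exceeds_training(TRAINING_COMPUTE, CHAT_DAILY), "days")
print("agent:", days_until_inference_exceeds_training(TRAINING_COMPUTE, AGENT_DAILY), "days")
```

The point of the sketch is only the shape of the curve: training is a fixed amount, while inference accumulates every day the system stays online, so a heavier continuous workload crosses the training cost sooner.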

Therefore, how to optimize inference, and inference compute in particular, is one of the core issues AI infrastructure must solve in the future.

We’ve mainly focused on compute in AI infrastructure. In discussions, people often talk about how much energy is consumed during computation.

Now, the whole world is discussing that one of the key bottlenecks in AI development is energy consumption. But after computation, what comes next? Communication.

Communication brings its own demands: communication capacity, internal communication within the data center, and switching. In AI data centers, the total energy consumption of the communication layer may be tens to hundreds of times higher than that of computation.

Last year at Stanford, I had the privilege of speaking with John Hennessy, Stanford's former president and the chairman of Alphabet (Google's parent company). In that conversation, he specifically pointed out:

The energy consumed during communication could be over a hundred times higher than computation itself.

Image 2

The design philosophy of CPUs and GPUs tries to perform as much computation as possible locally on the chip, rather than transferring data more frequently. You can deploy computations everywhere or move data around, but moving data consumes more energy than moving computation itself.
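A first-order way to reason about this trade-off is to model total energy as operations times energy per operation, plus bytes moved times energy per byte. The sketch below does exactly that. The class, the function, and both per-unit energy values are hypothetical placeholders, picked so that the baseline ratio lands near the "hundred times" figure quoted above; they are not measured hardware numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyModel:
    """First-order energy model; both constants are placeholder assumptions."""
    joules_per_op: float          # energy for one arithmetic operation
    joules_per_byte_moved: float  # energy to move one byte off-chip or between nodes

    def compute_energy(self, ops: int) -> float:
        return ops * self.joules_per_op

    def communication_energy(self, bytes_moved: int) -> float:
        return bytes_moved * self.joules_per_byte_moved


def communication_to_compute_ratio(model: EnergyModel, ops: int, bytes_moved: int) -> float:
    """How many times more energy goes to moving data than to computing on it."""
    return model.communication_energy(bytes_moved) / model.compute_energy(ops)


# Placeholder values, not measurements: moving a byte is assumed to cost
# 100x the energy of one operation.
model = EnergyModel(joules_per_op=1e-12, joules_per_byte_moved=1e-10)
ops = 1_000_000_000

baseline = communication_to_compute_ratio(model, ops, bytes_moved=1_000_000_000)
more_local = communication_to_compute_ratio(model, ops, bytes_moved=100_000_000)
print(f"one byte moved per op : {baseline:.0f}x")
print(f"10x less data movement: {more_local:.0f}x")
```

Under this model there are two levers: lower the energy per byte moved, which is where technologies such as optical communication aim, or move fewer bytes by keeping computation close to the data.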

This has spurred many new technologies focusing precisely on communication—how to develop next-generation communication technologies? We often mention innovations in optical communication, aiming to significantly reduce energy consumption in communication processes, which is very critical.

We also mentioned a shift—from language models → world models → physical AI.

With physical AI, the data we use isn’t just text—it includes a lot of 3D and real-world data, and the scale of this data is enormous.

In such cases, transferring data everywhere would consume even more energy.

That’s why we’re seeing more innovations in this field.

Breaking Through Physical AI: Edge Computing and New Sensors

Earlier, I mentioned “physical AI” many times.

Physical AI is also an emerging direction, not just about humanoid robots.

Physical AI involves simulation, data layers, world models, and other fields, covering interactions between the physical world and AI. Whether it's autonomous driving, high-precision production in factories, or applications in healthcare, logistics supply chains, and aerospace—all are widely adopting physical AI.

How can we better interact between the physical world and AI? This is a key area of innovation now.

We see many innovations focusing on simulation and data layers.

For physical AI today, our architecture and compute power are ready, so the biggest bottleneck is data.

We don’t have enough high-quality real-world data to support physical AI model training. Of course, there are many discussions about synthetic data as a related direction.

Synthetic data is also a rapidly developing area, including using synthetic data to support simulations. But during this process, we find that synthetic data has many drawbacks or blind spots.

Therefore, collecting edge data from real-world scenarios remains crucial.

This means we shouldn’t only focus on technical development at the model layer, but also invest more effort into new data collection platforms and data optimization platform innovations, so we can build better databases to support further development of physical AI.

Since the core pain point of physical AI is its data layer, how can we obtain high-quality data from the real world and industry?

You'll find that traditional manufacturing industries generate a lot of high-quality 3D real-world data. Their bottleneck is the lack of a good data collection platform that can collect, optimize, and curate that data in a standardized way, so that it reaches a state suitable for AI model training.

Because these platforms run in real-world settings, there are natural constraints on how much power they can consume.

Thus, how to better deploy AI at the edge is also a very important direction.

Here, I'd like to introduce a new technology: artificial skin, which is based on flexible electronics.

Image 3

△ From Stanford University official website

This year, many such companies have emerged. One of the best research efforts is from Professor Zhenan Bao’s lab at Stanford University. Their artificial skin sensor is a high-precision, low-power sensor that can be as thin as a glove. Whether worn on a robotic hand or human hand, it offers highly accurate tactile sensing points. This tactile data becomes a valuable data source supporting the physical world.

Image 4

△ Quoted from Chinese Academy of Sciences

In this process, we don’t only see startups building new data collection platforms.

When we talk with Fortune 500 companies, especially leading manufacturers, we find they are also exploring similar technologies internally. So everyone realizes that the core bottleneck is in the data layer, and more innovations are focusing on this layer.

I’d like to emphasize one more point—edge computing.

Edge computing will also develop very rapidly. For us, this is not a new direction—we started investing in edge computing back in 2018 and 2019.

In the past two years, the industry has formed a consensus—the future direction of AI development is AI deployment at the edge.

How to achieve edge AI deployment? It goes back to the issue we discussed earlier—it requires a small model.

For example, one company we invested in this year was recently acquired by Qualcomm. Their model is small enough to come in under 1 billion parameters. You can run such a model on a Raspberry Pi (a single-board computer) and still get AI capabilities comparable to GPT-4.
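A quick back-of-the-envelope check shows why a model of this size is plausible on a small board. The sketch below reads the "under 1 billion" figure as a parameter count, which is an assumption on our part, and estimates weight memory at common numeric precisions. The 4 GiB device RAM and the 30% runtime overhead are hypothetical values, and the function names are illustrative; nothing here says anything about model quality.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Memory for the weights alone, in GiB (activations and caches excluded)."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


def fits_on_device(num_params: int, precision: str, device_ram_gib: float,
                   overhead_fraction: float = 0.3) -> bool:
    """Rough fit check; overhead_fraction is an assumed allowance for runtime buffers."""
    needed = weight_memory_gib(num_params, precision) * (1 + overhead_fraction)
    return needed <= device_ram_gib


NUM_PARAMS = 1_000_000_000  # assumed: "under 1 billion" read as parameters
DEVICE_RAM_GIB = 4.0        # hypothetical single-board computer configuration

for precision in BYTES_PER_PARAM:
    gib = weight_memory_gib(NUM_PARAMS, precision)
    verdict = "fits" if fits_on_device(NUM_PARAMS, precision, DEVICE_RAM_GIB) else "does not fit"
    print(f"{precision}: {gib:.2f} GiB of weights, {verdict} in {DEVICE_RAM_GIB} GiB")
```

Under these assumptions the full-precision weights exceed the budget, while half-precision and quantized weights fit with room to spare, which is why quantization is a common companion to small edge models.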

Some of the open-source models Google has released recently are also very small edge models. Deploying AI at the edge is therefore extremely important.

Integrating edge AI deployment with new data collection platforms allows us to collect data, process it locally, and apply AI locally at the edge. This is a great direction for highly regulated industries and those sensitive to data privacy.

Three Application Areas to Watch: Healthcare, Space, and Nanorobots Smaller Than Cells

Finally, let me share a few specific AI application directions I’m very bullish on.

This year is a very important year for AI in healthcare in Silicon Valley. At the beginning of the year, Eli Lilly (a U.S. pharmaceutical company) and NVIDIA announced a $1 billion collaboration.

...

Image 5

△ From NVIDIA's official website

Their collaboration is not just about the combination of artificial intelligence and healthcare. They also hope to build an ecosystem of AI + healthcare + data technology, helping more startups form strategic partnerships with them. We have several companies currently working with them.

In January, you may have noticed that both ChatGPT and Claude released dedicated products for healthcare applications. In particular, Claude for Health focuses on underlying infrastructure, providing support in areas such as data, privacy, security, and compliance for healthcare service providers and hospitals to better integrate AI into medical practices.

A few weeks ago, Merck (one of the world’s largest pharmaceutical companies based in the U.S.) announced a major strategic partnership with Google's Gemini. So we can see that many AI tech companies are entering the healthcare space.

Healthcare is not only one of the largest markets in the U.S. (around 20% of GDP is spent on healthcare), but more importantly, there has been a significant shift or consensus — people now realize that data quality matters more than data volume.

Which industry has massive amounts of high-quality data? One of the most important industries is healthcare.

We started publishing reports on AI in healthcare back in 2017, and last year we released a new updated version. You can see how much evolution has taken place.

Today, we’re seeing many new AI healthcare companies focusing on vertical small models.

For example, some focus specifically on building vertical AI models for cell therapy, others for MRA (magnetic resonance angiography) sequencing data, and even some targeting specific diseases like Parkinson’s or Alzheimer’s, combining various types of data and bioinformatics for personalized diagnosis and treatment.

During this process, it’s not just AI models — robots and physical AI are also being widely deployed in the medical field.

Here I want to mention a company called Medra, which we invested in last year.

They are a team from Stanford who have developed a complete physical AI system. At the AI level, they can understand how to design biological and medical experiments, while also controlling robotic arms and automated robots for experimental processes, ultimately automating entire life science research and medical research workflows.

Image 6

A few weeks ago, they opened the world's largest fully autonomous physical AI robot laboratory in San Francisco, and it is now running non-stop, setting up various kinds of experiments day and night.

This company collaborated early on with many pharmaceutical firms, so when we talk about AI in healthcare today, it's no longer just about basic clinical consultations or doctor-assisted functions — it has evolved into very core areas like personalized treatments.

Personalized treatment isn’t limited to cancers or cardiovascular diseases anymore; now especially neurological conditions like Parkinson’s, Alzheimer’s, and depression are being deeply integrated with AI, and even physical AI.

Another direction I’m particularly excited about is the combination of physical AI and space technology. Especially over the next 3–5 years, the development of the space sector will accelerate rapidly — including the rise of the space economy, space ecosystems, and space infrastructure.

Everyone is paying attention to the upcoming SpaceX IPO, which is a strong signal showing that the space economy will grow quickly over the next few years. Due to the unique nature of the space ecosystem, it naturally possesses AI-native and robotics-native characteristics.

For instance, when talking about space infrastructure deployment, physical AI and robotic innovations play a key role. Another major future direction is space factories.

You could send humans to space factories, but in the short term, sending robots might be a better option. For example, during current lunar exploration missions, before any human landing attempts, many robots and mechanical devices are already deployed.

We’ve also invested in companies focused on space infrastructure, especially in robotic applications aimed at improving efficiency.

Take space refueling stations as an example — all equipment is ready, and within the past year alone, they've already secured over $100 million in orders. This fast-paced innovation cycle makes this industry a very promising and rapidly evolving area for AI advancements.

The final direction I’d like to highlight is also related to healthcare, but requires time to mature — yet it’s one that will excite everyone:

Looking at robot development from a smaller scale — we call these microrobots or nanorobots.

Smaller robots can enter human blood vessels to clear clots, or even shrink down to DNA levels to deliver targeted drugs, achieving immune stealth inside the body.

More and more technologies are emerging in this direction. For example, microrobots designed for clot removal are entering the initial stages of commercial application. Also, developments in DNA engines, Nanoswimmers (a subtype of nanorobots that swim), and targeted drug delivery nanorobots are expected to show great potential and rapid growth in the coming years.

So today, I wanted to quickly share some exciting directions in AI that we’ve observed over the past year or so.

The entire ecosystem is currently in a phase of iteration, with many new model architectures emerging.

Of course, during this process, the AI ecosystem faces many challenges. But for entrepreneurs, challenges mean opportunities, so we're seeing more outstanding founders exploring new directions in industries.

As investors, we are truly fortunate. I believe this is a happy time for early-stage AI investors — we can witness not only technological innovation but also an era of rapid industrial iteration.

Finally, what I'd like to share is that beyond technical innovation, we are now at a critical point where the industry's attitude toward technology integration is changing rapidly.

We’re seeing large enterprises increase their AI budgets from millions to hundreds of millions, even billions. Where once sales cycles took months or even longer, now they take just one or two months. This kind of rapid industrial integration is the core competitiveness that allows AI technology to develop quickly.

Because only by reaching real-world application scenarios can we rapidly gain users or receive feedback from use cases, and obtain high-quality industrial data to continuously iterate our model architectures and applications.

So I look forward to more innovations appearing in the coming year, and I encourage everyone to visit Silicon Valley more often for deeper technical exchanges. Thank you.

_© All rights reserved. No part of this content may be reproduced or used in any form without permission. Violators will be prosecuted._
