T
traeai
Sign in
返回首页
量子位

Just Now, a Chinese AI Built an AI for the First Time Globally!

9.0Score
Just Now, a Chinese AI Built an AI for the First Time Globally!

TL;DR · AI Summary

Chinese AI company Mengzi Intelligent successfully independently developed the world's first fully AI-written production-level large model pre-training framework ForgeTrain, which is faster by 10% compared to NVIDIA Megatron and performs excellently in multiple evaluations.

Key Takeaways

  • Mengzi Intelligent independently developed the world's first fully AI-written pr
  • ForgeTrain outperforms existing large model pre-training frameworks in training
  • MiniCPM5-1B demonstrates excellent performance in multiple evaluations, breaking

Outline

Jump quickly between sections.

  1. Introducing Mengzi Intelligent's independently developed global first fully AI-written production-level large model pre-training framework ForgeTrain.

  2. ForgeTrain surpasses existing large model pre-training frameworks in training speed and performance.

  3. MiniCPM5-1B can run on desktops and has customizable personalities.

  4. ForgeTrain provides an opportunity for domestic large models and chips to achieve overtake through a roundabout approach.

Mindmap

See how the topics connect at a glance.

查看大纲文本(无障碍 / 无 JS 友好)
  • AI制造AI
    • 面壁智能
      • ForgeTrain
      • MiniCPM5-1B
    • 发展优势
      • 训练速度快
      • 性能优越
    • 应用场景
      • 桌面端运行
      • 自定义人格
    • 行业变革
      • 从堆资源走向极致提效率
    • 国产机会
      • 弯道超车

Highlights

Key sentences worth saving and sharing.

  • Mengzi Intelligent successfully independently developed the world's first fully AI-written production-level large model pre-training framework ForgeTrain, which is faster by 10% compared to NVIDIA Meg

    Second paragraph

    ⬇︎ 下载 PNG𝕏 分享到 X
  • MiniCPM5-1B demonstrates excellent performance in multiple evaluations, breaking new ground in small model intelligence density.

    Fourth paragraph

    ⬇︎ 下载 PNG𝕏 分享到 X
  • AI manufacturing AI is transitioning from resource stacking to extreme efficiency improvement, driving changes in the large model industry.

    Sixth paragraph

    ⬇︎ 下载 PNG𝕏 分享到 X
#large models#AI#Mengzi Intelligent#ForgeTrain#MiniCPM5-1B
Open original article

< img id="wx_img" src="https://www.qbitai.com/wp-content/uploads/imgs/qbitai-logo-1.png" width="400" height="400">

2026-05-26 16:46:15 Source: Quantum Bit

Training speed is 10% faster than NVIDIA Megatron

Jin Lei from凹非寺

Quantum Bit | Official Account QbitAI

Building AI has become the main character of this story.

Because just now, a domestic AI wrote its own large model pre-training framework and then used that framework to train a new small-sized model!

Image 1

This is the big news brought out by Wallace Intelligence.

The pre-training framework written by AI is called ForgeTrain, which is the first global production-level large model pre-training framework entirely written by AI, with performance even outperforming NVIDIA's Megatron.

And ForgeTrain pre-trained MiniCPM5-1B on Huawei Ascend, achieving a 10% acceleration compared to the Ascend framework.

Surrounding it, Wallace Intelligence proposed a new software programming paradigm called Forge Engineering.

In simpler terms, as the cost of AI writing code continues to decrease, future software may not necessarily have to be built into one universal large framework. Instead, it can be tailored for different models, hardware, and tasks on-site.

And the new model trained by ForgeTrain is MiniCPM5-1B.

As for their relationship, we explain it with a diagram:

Image 2

Although there have been voices in the industry about "AI making AI" before, they were always limited to specific stages, such as writing a function segment, modifying a script, adjusting a parameter set, etc.

However, this time, China's large language model company made "AI making AI" progress from a concept to a demonstrable, evaluatable, and reproducible engineering sample for the first time.

What Can AI-Made AI Do?

Since MiniCPM5-1B is a model trained by ForgeTrain, the most direct question is:

What can AI-made AI do?

Let's look at the most intuitive scenario—pet companions.

This 1B-parameter-scale small model can reside on your computer desktop, becoming a responsive AI companion that responds to you instantly. You can chat with it, let it continue the conversation based on context, or give it different personalities.

Image 3

Video address:

https://mp.weixin.qq.com/s/Ci0BXKMJHy086MycdqH77w

(This project is based on the secondary development of the clawd-on-desk project:

https://github.com/OpenBMB/MiniCPM-Desk-Pet)

The key point of this pet companion is that it doesn't need to run on cloud-based large model services. With a scale of 1B parameters, it is small enough and has low deployment thresholds.

According to the official statement from Wallace Intelligence, MiniCPM5-1B weighs approximately 2GB under FP16 precision, suitable for GPUs, high-end laptops, and servers; INT4/Q4 precision is about 0.5GB, suitable for mobile devices, tablets, car computers, etc.

MiniCPM5-1B aims to prove that 1B models can also perform well.

In areas like comprehensive knowledge, mathematical reasoning, code reasoning, tool invocation, etc., MiniCPM5-1B presented comparative results against the same-size edge-side models.

In public evaluations, MiniCPM5-1B/think averaged 42.57 points; it also achieved corresponding scores in projects like MMLU-Pro, MMLU-Redux, AIME-2025, AIME-2026, BFCL-v4, AA rankings, etc.

Image 4

Especially noteworthy is that MiniCPM5-1B once again pushed the upper limit of intelligent density for small models.

With only a 1B parameter scale, it surpassed all models below 2B parameters on the internationally renowned AA-Index. Compared to Qwen3.5-2B released three months ago, MiniCPM5-1B not only performed better but had half the number of parameters.

Behind this lies a clearer trend emerging: As model capabilities improve, they are no longer solely dependent on increasing parameter scales. Smaller models are also carrying higher intelligent densities. Observing this trend, the intelligent density of large models is continuously improving at a rate of roughly doubling every 3.5 months.

Image 5

This makes the value of MiniCPM5-1B more clear—it is not just a small-sized model but an end-to-end model that finds a balance between parameter scale, deployment costs, and actual capability.

Besides, it can customize personalities:

Image 6

Video address:

https://mp.weixin.qq.com/s/Ci0BXKMJHy086MycdqH77w

While this might sound like basic functionality in chat products, it is more significant in end-side models because end-side models are closer to users and can easily become lightweight intelligent entry points on local devices.

It can remember user preferred interaction methods and switch styles according to different scenarios.

If large models want to move from the cloud to everyone's devices, the models must be small, affordable, easy to use, and have a complete toolchain.

That's why it emphasizes being developer-friendly.

MiniCPM5-1B provides a toolchain for models, inference, fine-tuning, including SGLang, vLLM, llama.cpp, Ollama, Hugging Face, ArcLight on the inference side; and LLaMA-Factory, ms-swift on the fine-tuning side.

For developers, this is more important than simply giving them a model weight.

Because whether a model can be used depends not only on the model itself but also on whether deploying, inferring, quantizing, fine-tuning, and integrating workflows are convenient.

It Also Outperformed NVIDIA Megatron

If we say MiniCPM5-1B is an AI-made AI product, then ForgeTrain is the factory where AI makes AI. And this factory itself was made by AI.

Wallace Intelligence divided AI-making-AI into five stages:

  • L1: AI gives suggestions, humans execute all operations (represented by Github Copilot)
  • L2: AI assists research, completing specific phases (represented by Cursor, Claude Code)
  • L3: AI produces the next-generation model end-to-end (represented by ForgeTrain)
  • L4: AI recursively self-improves, transforming training pipelines and itself
  • L5: AI autonomously sets research agendas, open exploration

ForgeTrain corresponds to the L3-L4 stage. It hasn't reached the level where AI invents the next generation Transformer, but it has already entered the core infrastructure layer of large model research—pre-training frameworks.

Before this, many large model pre-training frameworks worldwide were written line by line by human programmers. NVIDIA's Megatron, Meta's Fairseq, Google's TensorFlow, none of them were exceptions.

But Wallace Intelligence proposed a completely different approach, Forge Engineering.

Past software engineering emphasized generic frameworks, requiring a framework to be compatible with various models, hardware, and training tasks. The benefit was code reuse, but the downside was difficulty in squeezing each specific scenario to its fullest potential. Like a one-size-fits-all garment, anyone can wear it, but no one wears it perfectly.

Forge Engineering takes a more radical approach: Since AI writes code faster and the cost of code production is lower, why should we still pursue generality? We can write specialized code for different models, hardware, and tasks.

This is like returning from industrialized mass production to high-end customization. AI is that tireless top-tier craftsman, capable of creating the best code for each demand.

But having AI write pre-training frameworks isn't just about writing code. The harder part is: How does it know if it wrote correctly? How fast is it? Is there an issue with memory, parallelism, communication, stability?

This requires Harness.

We can think of Harness as a test room. AI is placed in this room, generating code round after round, running tests, receiving feedback, and continuing to modify. This process is fully automated, without human intervention.

Wallace Intelligence adopted a three-stage construction methodology:

  1. Collect critical data from existing pre-training frameworks to form evaluation standards and Harness
  2. Build binary-consistent versions of the pre-training framework using the evaluation Harness
  3. Remove the binary consistency restriction and iteratively optimize until surpassing the reference implementation

The final result is that ForgeTrain not only aligns functionally with NVIDIA Megatron but also trains faster by 10% under the same hardware conditions.

This means that with the same computational power, ForgeTrain can save 10% of training time and costs.

This is a Worthwhile Matter

Seeing here, you might think this is just a cool technical demonstration. But looking beyond the surface, Wallace Intelligence's release reveals a major transformation happening in the large model industry.

Firstly, the competition in the large model industry is shifting from resource accumulation to extreme efficiency.

Over the past few years, all large model vendors have been competing to achieve miracles through massive resources, parameters, corpora, and tens of thousands of GPU clusters. However, this Scaling Law path has its limits.

When material accumulation reaches its ceiling, what will determine the winner next? Efficiency.

At the same computational budget, who can produce more research iterations? Who has a shorter single-generation R&D cycle? Wallace Intelligence's AI-making-AI provided the answer:

Using AI to replace repetitive labor in human R&D pipelines, compressing weeks of human code development into dozens of minutes. This is the only solution that can counteract resource bottlenecks and enable continuous exponential growth in large model capabilities.

Secondly, the role of AI researchers is undergoing irreversible change.

In systems like ForgeTrain, the role of humans is shifting. From Human in the Loop (executing specific code within a loop) to Human on the Loop (supervising and designing outside the loop).

Future AI scientists no longer need to personally write endless CUDA operators and underlying communication logic. They will become designers and guardians of research systems. They only need to define goals and build Harnesses; the rest of the dirty work will be done by tireless AI.

Lastly, for domestic large language models and domestic chips, this is an ideal opportunity for overtake.

In the past, when evaluating domestic large language models, our focus was always on parameter size, benchmark scores, long text capabilities. But what truly determines a company's and ecosystem's long-term core competitiveness is actually the underlying system—the ability to produce models.

Who can train models faster? Who can make fewer mistakes at a lower cost? Who will survive in the brutal battle of hundreds of models.

More profound strategic significance lies in the domestic computing ecosystem. Everyone knows that Huawei Ascend and other domestic chips are rapidly catching up in hardware computing power, but their biggest weakness lies in the software ecosystem. NVIDIA has millions of developers who spent 15 years stepping on pitfalls and optimizing. This gap is difficult for domestic chips to close quickly with manpower alone.

But ForgeTrain offers a way to break the deadlock.

If there aren't enough people, let's use AI! By automatically generating exclusive pre-training frameworks adapted to various new models and hardware, domestic chips will have the chance to catch up with international top ecosystems through the productivity of AI.

When AI learns to make AI, gears are already accelerating. A new era is unfolding right before us.

MiniCPM5-1B is now fully open-source:

Hugging Face link:

https://huggingface.openbmb.com/model/openbmb/MiniCPM5-1B

GitHub link:

https://github.com/OpenBMB/MiniCPM

ModelScope link:

https://modelscope.cn/models/OpenBMB/MiniCPM5-1B

AtomGit:

https://ai.gitcode.com/OpenBMB/MiniCPM5-1B

Magic Music Community:

https://modelers.cn/models/OpenBMB/MiniCPM5-1B

ForgeTrain open-source link:

https://github.com/OpenBMB/ForgeTrain (to be launched after 5:26 PM)

_Copyright reserved. Unauthorized reproduction or use in any form is strictly prohibited._

AI may generate inaccurate information. Please verify important content.