MiniMax Launched Mavis, a New Multi-Agent Framework


TL;DR · AI Summary

MiniMax has launched Mavis, a new multi-agent framework that improves task execution reliability and efficiency through the 'adversarial' relationship between Worker and Verifier.

Key Takeaways

  • Mavis enhances task execution reliability through the adversarial relationship between Worker and Verifier.
  • Mavis supports multitasking, allowing agents to handle multiple tasks independently without context confusion.
  • Mavis achieves end-to-end context isolation, ensuring each agent focuses only on relevant information.


I gave an agent a task, and it started in plan mode, outlining seven steps.

I approved it, and it began to run. After completing three steps, it stopped to report: "I have completed steps 1, 2, and 3. Here are the results. Should I continue with steps 4, 5, 6, and 7?"

I said to proceed. It ran two more steps, then stopped again: "I have completed steps 4 and 5. Here are the results. Should I continue with steps 6 and 7?"

By the end of the night, the long-running task I had handed to the agent had not run for long at all; the whole exchange consisted of me typing "proceed."

For a long time, my experience using various agents for work has been like this.

Image 1

This behavior makes little sense. Pausing to confirm is good practice when working with AI, but I never asked the agent to stop, and it stopped anyway.

MiniMax attributes this behavior in agent products to "contextual anxiety." The core issue is that the model itself is unsure when a long-running task is truly complete. In other words, the model is capable of doing the work but hesitant to do it: afraid of making a mistake at each step, it stops halfway and asks for confirmation.

Today, MiniMax's Agent desktop version underwent a major update. A new mode called Mavis (actually a shorthand for "MiniMax as a Jarvis") was introduced.

It's no longer unusual to have one agent act as the boss while a group of agents work as employees. However, MiniMax points out that earlier mainstream multi-agent frameworks essentially relied on prompt orchestration to make models role-play. That approach does not hold up over time and often runs into problems such as contextual anxiety, degradation on long-running tasks, and agents checking their own work.

A reliable infrastructure for a multi-agent system requires continuous operation and maintenance, and multiple agents should not collude. This is what MiniMax is doing.

MiniMax calls its agent infrastructure Team Engine, which includes three core roles: Leader, Worker, and Verifier. As the names suggest, one role manages, another performs tasks, and the third verifies the results.

The key difference lies in the adversarial relationship between Workers and Verifiers, ensuring neither can cheat.

Image 2

Recently, APPSO was researching a topic: "All model vendors aspiring to be in the Coding/Agent space must develop their own independent Coding/Agent products."

(Indeed, MiniMax was previously a counterexample, but it turned out to prove itself before we even published the article!)

So we ran this research topic through MiniMax's Agent Team once again.

The task was split among five workers, each of which compiled its results and handed them to the leader (shown in the interface as "Mavis to General," "General to Mavis," and so on).

Image 3

One worker took 12 minutes without returning any results. APPSO noticed that the leader couldn't wait and sent a bash command to check the worker's status:

Image 4

After all five workers completed their tasks, the leader generated five verifiers—agents displayed with yellow hats—in the task list:

Image 5

The verifiers quickly found errors. One verifier identified a clear data error in its worker's deliverables and issued a "failure" verdict. The worker then restarted (shown as running, marked with a blue circle).

Image 6

We could see the worker's thought process by entering its workspace: "The verifier rejected my previous deliverables based on these three errors... I need to go back and re-examine the critical facts and check and correct specific numerical issues..."

Indeed, agents working against each other are very thorough, making the process highly reliable.
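To make the worker-versus-verifier loop concrete, here is a minimal sketch of the pattern described above. It is not MiniMax's code: the `Verdict` type, the function signatures, `run_until_verified`, and the toy agents are all hypothetical names chosen for illustration, and real agents would be model calls rather than plain functions.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    passed: bool
    errors: List[str] = field(default_factory=list)


# Hypothetical signatures: a worker takes the task plus the verifier's
# previous feedback and returns a deliverable; a verifier judges it.
Worker = Callable[[str, List[str]], str]
Verifier = Callable[[str, str], Verdict]


def run_until_verified(task: str, worker: Worker, verifier: Verifier,
                       max_rounds: int = 3) -> Tuple[str, int]:
    """Rerun the worker until the verifier passes it or rounds run out."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        deliverable = worker(task, feedback)
        verdict = verifier(task, deliverable)
        if verdict.passed:
            return deliverable, round_no
        feedback = verdict.errors  # the rejection reasons drive the retry
    raise RuntimeError(f"not verified after {max_rounds} rounds: {feedback}")


# Toy stand-ins for model-backed agents; the figures are made up.
def toy_worker(task: str, feedback: List[str]) -> str:
    return "total: 120" if feedback else "total: 210"


def toy_verifier(task: str, deliverable: str) -> Verdict:
    if deliverable.endswith("120"):
        return Verdict(True)
    return Verdict(False, ["total does not match the source"])


result, rounds = run_until_verified("summarize totals", toy_worker, toy_verifier)
print(result, rounds)
```

The point of the sketch is that the pass/fail decision lives outside the worker: the worker cannot declare its own output finished, and a rejection feeds the listed errors into the next attempt.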

Image 7

This back-and-forth occurred numerous times across the five 1v1 agent confrontations. During the process, Mavis also noted that it had "learned something new" and updated its memory accordingly.

Image 8

Next, we started a new deep research task: analyze the tourism market during the May Day holiday based on authoritative data, and deliver a multidimensional analysis report.

This task was more complex than the previous one. Because of the ongoing confrontation between workers and verifiers, the Agent Team also spent significantly more time on it than a single agent would.

But the final report was cleaner and more credible than deep research reports from other AI products.

Image 9

Recently, APPSO has been planning many offline events, and developing event plans and schemes has always been challenging. We also handed this task over to Mavis to see how it would handle it.

I need to plan an AI developer offline salon in Guangzhou. Please provide me with multiple suitable venues for events of 100 to 1000 people, along with approximate quotes. Also, gather information on similar events and help me plan the theme, promotion, and operations of the AI event. Organize all of this into a strict business plan format and a beautifully designed webpage that fits the theme.

Image 10

The time required to develop the plan was longer than the previous deep study task. Mavis replied, "This task is large and will require multiple agents to work in parallel—venue research, competitor analysis, theme planning, business plan preparation, and webpage development."

Mavis's standout feature is that we can continuously add new requirements:

While providing the long report, also draft a preliminary formal contract for venue collaboration and guest invitation collaboration, and include preliminary financial statements. Give me a detailed presentation in the form of a PPT.

The Agent Team received the new requirements and further refined the plan, initiating additional workflows. Ultimately, we launched up to nine parallel tasks.

Image 11

When we opened Mavis's thought process, we saw a lot of messages being exchanged among the agents. These agents worked under the Team Engine, sharing their states, some waiting, some executing, and others verifying.

Image 12

Isn't this Verifier like a meticulous 'client' who demands perfection?

Image 13

In total, the number of files delivered for this task was astonishingly high, including XLS, PPT, HTML webpages, and their corresponding .md versions.

Image 14

▲ The financial budget table generated by the Agent Team, including project budget summary, cash flow forecast, pricing models for tickets and sponsorships, and detailed cost breakdown.

Next, let's discuss another significant feature of Mavis: its ability to connect to chat platforms and support multitasking.

Like OpenClaw and Hermes Agent, which MiniMax already supported, Mavis can also take tasks through two IM channels, WeChat and Feishu. Setup is simple: click the settings button, scan the QR code, and name it, and you can use Mavis within WeChat or Feishu.

Image 15

In most agent products, once you have sent an agent a long-running task, you can no longer consult it about anything else.

Part of the reason is that these agents cannot keep multiple chat windows open at the same time. The other part is a limitation of how the agent works: running multiple tasks in a single session easily mixes up their contexts, which pollutes the context.

MiniMax's solution is to decouple the "quick response" logic from the "execution" logic.

While Mavis was researching recent oil price increases for us over WeChat, I also asked it to research the important products released by Silicon Valley AI giants in the past month.

Mavis did not pause the earlier task. It reported directly that the new task was complete, while the oil price task was still running.
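One way to picture the split between a quick response and long-running execution is the sketch below, written with Python's `asyncio`. It is only an illustration under our own assumptions: the `Dispatcher` class, its methods, and the use of an `asyncio.Event` to stand in for a long agent run are hypothetical and say nothing about how Mavis is actually built.

```python
import asyncio
from typing import Dict, List


class Dispatcher:
    """Acknowledges a request at once; the work runs as a background task."""

    def __init__(self) -> None:
        self.jobs: Dict[str, asyncio.Task] = {}

    def submit(self, name: str, finished: asyncio.Event) -> str:
        # Fast path: schedule the work and return without awaiting it.
        self.jobs[name] = asyncio.create_task(self._execute(name, finished))
        return f"accepted: {name}"

    def status(self, name: str) -> str:
        return "done" if self.jobs[name].done() else "running"

    async def _execute(self, name: str, finished: asyncio.Event) -> str:
        await finished.wait()  # stand-in for a long agent run
        return f"report for {name}"


async def demo() -> List[str]:
    d = Dispatcher()
    oil_done, ai_done = asyncio.Event(), asyncio.Event()
    log = [d.submit("oil prices", oil_done), d.submit("AI releases", ai_done)]
    ai_done.set()           # the second task finishes first
    await asyncio.sleep(0)  # yield once so the background tasks can run
    log.append(f"oil prices: {d.status('oil prices')}")
    log.append(f"AI releases: {d.status('AI releases')}")
    oil_done.set()
    await asyncio.gather(*d.jobs.values())
    log.append(f"oil prices: {d.status('oil prices')}")
    return log


log = asyncio.run(demo())
print(log)
```

Because `submit` never awaits the job, the front end stays free to accept new requests and answer status questions while earlier tasks keep running, which matches the behavior we saw over WeChat.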

Image 16

This is precisely the benefit of Mavis's design philosophy: context isolation.

Each agent, and each Agent Team, sees only summaries of the information relevant to its task. It reads the full text only when it needs the details.

This approach not only controls token costs but also prevents context pollution, as incorrect information encountered during searches won't affect the entire team.
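The summary-first idea can be sketched as a small store that hands out summaries by default and full text only on request, scoped per task. Everything here is an assumption made for illustration (`Artifact`, `ContextStore`, the task and artifact ids, and the placeholder strings); it is not taken from MiniMax's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Artifact:
    owner: str      # agent that produced it
    summary: str    # short digest shared with other agents
    full_text: str  # only loaded on explicit request


class ContextStore:
    def __init__(self) -> None:
        self._items: Dict[str, Artifact] = {}
        self._scope: Dict[str, List[str]] = {}  # task id -> artifact ids

    def publish(self, task_id: str, art_id: str, art: Artifact) -> None:
        self._items[art_id] = art
        self._scope.setdefault(task_id, []).append(art_id)

    def view(self, task_id: str) -> Dict[str, str]:
        """Default context: summaries for this task only."""
        return {a: self._items[a].summary for a in self._scope.get(task_id, [])}

    def read_full(self, task_id: str, art_id: str) -> str:
        """Full text is fetched on demand and only within the task's scope."""
        if art_id not in self._scope.get(task_id, []):
            raise PermissionError(f"{art_id} is outside task {task_id}")
        return self._items[art_id].full_text


store = ContextStore()
store.publish("oil", "a1", Artifact("worker-1", "oil price digest", "full oil notes"))
store.publish("ai", "b1", Artifact("worker-2", "AI release digest", "full AI notes"))
print(store.view("oil"))
```

An agent on the "oil" task gets one short digest in its default context and cannot pull in notes from the "ai" task, so a bad search result in one task stays out of the other.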

In the most extreme scenario, we tried to assign eight tasks to Mavis through Feishu in a very short period, and it handled them without any context confusion.

Overall, the experience feels like working with a colleague who has a high cognitive capacity: able to respond quickly while keeping the task execution smooth. You can ask about progress without worrying about disrupting its 'flow.'

Image 17

Agents handling different sessions only see information relevant to their tasks and do not share a growing conversation history.

In essence, Mavis achieves end-to-end context isolation from the IM channel to the task hub and down to each sub-agent.

Finally, Mavis successfully completed the main thread of the oil price task while also delivering a detailed report on the AI giants' new releases and embodied intelligence products.

Image 18

After testing, you might notice that Mavis's scheduling strategy somewhat resembles the popular "Three Departments and Six Ministries" skill from earlier.

What each role does, when it starts, and when it hands off are decided by a state machine at the engine level rather than left to the arbitrary judgment of a black-box model.

In simple terms, this is about using engineering-level control, rigor, and determinacy to address the unpredictability and randomness of models in multi-agent work scheduling.
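As a rough illustration of engine-level control, the sketch below encodes the hand-offs as an explicit transition table. The states, event names, and `TaskMachine` class are hypothetical; the article does not describe Team Engine's actual states, so treat this as one possible shape rather than the real design.

```python
from enum import Enum, auto


class State(Enum):
    PLANNING = auto()
    WORKING = auto()
    VERIFYING = auto()
    DONE = auto()


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.PLANNING, "plan_ready"): State.WORKING,
    (State.WORKING, "deliverable_submitted"): State.VERIFYING,
    (State.VERIFYING, "verdict_fail"): State.WORKING,
    (State.VERIFYING, "verdict_pass"): State.DONE,
}


class TaskMachine:
    def __init__(self) -> None:
        self.state = State.PLANNING

    def fire(self, event: str) -> State:
        key = (self.state, event)
        if key not in TRANSITIONS:
            # The engine, not the model, rejects an out-of-order hand-off.
            raise ValueError(f"{event!r} not allowed in {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state


m = TaskMachine()
for e in ["plan_ready", "deliverable_submitted", "verdict_fail",
          "deliverable_submitted", "verdict_pass"]:
    m.fire(e)
print(m.state.name)
```

In this sketch the only route to `DONE` passes through `VERIFYING`, so a worker has no event it can fire to mark its own work as finished.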

This approach completely resolves the classic problem of an agent or model acting as both player and referee.

Image 19

After testing Mavis, let's talk about another important thing MiniMax did for all paying users: this time, the Token Plan and Agent Plan were merged.

Image 20

With the merge, whether it's regular user "daily usage" on the website or app, or integrating official APIs to call other tools (such as coding products or OpenClaw/Hermes Agents), users can now utilize a unified package plan. Moreover, whether it's the flagship model M2.7 or subsequent multimodal models, they are all included in this unified plan.

Users can decide how to allocate the shared quota. MiniMax also offers a perk: users who previously subscribed to both plans will receive an extra month of membership.

Why did MiniMax do this? From the user perspective, it makes perfect sense.

In the age of agents, what users pay for is the model's compute. As coding, agent, and multimodal capabilities advance, the scenarios that need this compute keep diversifying, and the demand arises both inside the vendor's own products (website, standalone product, CLI) and outside them (independent deployments that integrate external APIs).

Image 22

This time, MiniMax took the lead by removing the walls within its product matrix. APPSO believes that in today's highly commoditized model market where users flock to the latest and cheapest APIs, a unified package strategy can actually help maintain user loyalty for model vendors.

Back to the product itself.

As mentioned earlier, APPSO is writing an article on "serious model vendors must develop their own coding/agent products." MiniMax can be seen as a latecomer but a worthy one.

In today's market, Mavis is not the first product to bet on a multi-agent architecture. Over the past six months, products like ChatGPT, Manus, and Genspark have all joined the "multi-agent" battle.

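After practical testing, APPSO's impression is that Mavis performs better and has a more stable architecture than its peers when running complex or long-running tasks. While other products' multi-agent modes are still at the stage of prompt orchestration and task splitting, Mavis sets adversarial hard constraints at the engineering level, and the resulting improvement in experience is very noticeable.

However, appealing as this architecture is, it cannot escape a steep real-world cost.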

Image 23

In its technical blog, MiniMax discussed the "Cost of Consensus" in multi-agent systems. Put simply, agents that keep each other in check do make the process and results more reliable, but reaching consensus has a price: it consumes several times more tokens than a single agent. And, as in a heated argument, the agents may drift off-topic and lose accuracy.

According to MiniMax's analysis, the Agent Team architecture specifically incurs three types of costs:

First, there is the handover cost. Information has to be reorganized when it passes between agents, and each handover requires translating it into a form the next agent can use, which consumes tokens.

Second, there is the cost of sharing context. Context isolation is designed to control this cost. Even if each agent only sees summaries passed along by others, storing and distributing those summaries also costs more as the number of agents grows.

Third, there is the aggregation cost. This is a point APPSO has long wanted to make: don't assume that a workflow with hundreds or thousands of skills and an elaborate "Three Departments and Six Ministries" system is a breakthrough. Often it is just a trap set by token providers. You may indeed get more detailed work, but you also spend more tokens aggregating and organizing the final result.

Together, these costs mean that more agents is not automatically better.

However, viewed from another angle: the more complex the information interaction, the higher the value of the work often is. A deep research report that requires multiple verifications and cross-checks should not be measured by the same cost logic as a casual question. Mavis is expensive because it takes the task seriously, and such serious handling is worth the price.

For users with genuinely valuable, complex tasks, saving on cost matters less than making sure nothing goes wrong and that the work is done without shortcuts or clever tricks.

Of course, MiniMax's team has added engineering safeguards so that redundant execution does not waste tokens.

MiniMax advises users that Agent Team is designed for "expensive and complex" tasks and is a strategic option rather than the default. Users should weigh a task's complexity, the length of its chain, its risk, and the value of reusing the experience; the higher a task scores on these, the better suited it is to Agent Team. For less complex tasks, a single agent or an ordinary chat is enough.

Image 24

Are multi-agent systems necessarily smarter? Not necessarily. But Mavis's significance lies in providing a rigorous, verified engineering system for truly complex and knowledge-intensive tasks, rather than leaving them to the whims of models.

It may not make AI smarter, but it definitely makes AI less likely to slack off—a problem that large models have long struggled with.

After all, in real-world interpersonal work, we don't actually need colleagues to be very smart... Just that they don't slack off or play tricks is often enough, isn't it?

Author: Du Chen, Zhang Zihao
