T
traeai
Sign in
返回首页
爱范儿

国产AI编程冲上全球第二!实测五大模型,谁才是Vibe Coding神器

8.5Score
国产AI编程冲上全球第二!实测五大模型,谁才是Vibe Coding神器

TL;DR · AI Summary

阿里Qwen3.7 Max在编程竞技榜排名第二,性能优于GPT-5.5、Gemini 3.5 Flash等模型,价格相对合理,适合开发者使用。

Key Takeaways

  • Qwen3.7 Max在编程能力测试中表现优异,尤其是在前端网页设计和小游戏生成方面。
  • Qwen3.7 Max的价格在中规中矩范围内,适合新用户和预算有限的开发者。
  • Qwen3.7 Max在生成复杂功能时,如六边形2048游戏,表现不如Claude Opus 4.7稳定。

Outline

Jump quickly between sections.

  1. 介绍文章背景和目的,比较国产AI编程模型。

  2. Qwen3.7 Max在编程竞技榜排名第二,性能优于其他模型。

  3. 介绍获取Qwen3.7 Max的途径和价格。

  4. 对比Qwen3.7 Max与其他模型在物理模拟、小游戏和网站设计方面的表现。

  5. Qwen3.7 Max在物理模拟测试中表现良好,增加自定义功能。

  6. Qwen3.7 Max生成的游戏在功能实现上不如Claude Opus 4.7稳定。

  7. Qwen3.7 Max在设计复杂网站时表现一般,信息量较少。

Mindmap

See how the topics connect at a glance.

查看大纲文本(无障碍 / 无 JS 友好)
  • 国产AI编程模型比较

Highlights

Key sentences worth saving and sharing.

#Qwen3.7 Max#编程模型#阿里云#前端开发
Open original article

Exceeding GPT-5.5, Gemini 3.5 Flash, and DeepSeek V4 Pro, Alibaba's latest flagship model Qwen3.7 Max secured second place in the programming competition rankings, just behind Claude Opus 4.7.

Image 1

▲ Screenshot from the rankings on May 26

In addition to user choices in real-world scenarios, on traditional large model fixed evaluation charts like Terminal Bench and SWE Bench, Qwen3.7 Max also took the top spot among domestic models.

Image 2

Although it's now been four years since the emergence of large models, we've seen frequent updates to these rankings. However, we can't help but want to experience how the Qwen model, which surpasses GPT 5.5, actually performs.

It's worth noting that the most popular Coding Agent combination right now is likely using GPT 5.5 with Codex.

If we change the default model in Codex to Qwen3.7 Max and use Codex to complete some daily tasks, would it be more useful than GPT 5.5?

Obtain Qwen3.7 Max

While various companies are offering token discount activities, Alibaba Cloud also provides free usage of 1 million tokens on its Baileyan platform.

Image 3

The pricing for Qwen3.7 Max on Alibaba Cloud's official website is currently at a limited-time 50% off offer, with input at 6 yuan per million tokens and output at 18 yuan per million tokens. New users can also enjoy a 50% recharge discount plan, getting 20 yuan of token quota for 10 yuan per month, while the standard Token Plan is priced at 198 yuan per month.

Image 4

Overall, according to data from the large model aggregation platform OpenRouter, Qwen3.7 Max's pricing falls within a reasonable range. Compared to DeepSeek's rock-bottom prices, it's not as competitive, but it's still quite affordable compared to Opus 4.7 and GPT 5.5.

Image 5Image 6

We directly recharged the "Preferred Entry" package, which offers a universal 20 yuan deduction for all models. However, it's important to note that the 50% discount is only applicable to one package; if you purchase the 10 yuan plan, you cannot then buy the 50 or 250 yuan half-price discount plans.

Image 7

Testing with DeepSeek, Claude, GPT, Gemini, and Qwen

With the API Key and a million free tokens, we first used Qwen3.7 Max on Alibaba Cloud Baileyan Platform and Qwen's official website to test its development capabilities with some common front-end web design tasks.

For a physical simulation test where differences can be intuitively observed, we used a simple prompt: "Create an animation of liquid shaking inside a container using HTML+CSS+JS. Dragging the container changes its tilt angle."

Image 8

▲ Qwen3.7-Max, generated on Qwen's official website

Qwen3.7 Max successfully completed this simulation challenge and even added features like custom color, shaking, and adjustable liquid volume.

DeepSeek was simpler, but it didn't make any mistakes.

Image 9

▲ DeepSeek V4, generated on the official website

The liquid generated by GPT-5.5 was a bit strange. While it flowed in the correct direction when the angle changed, the entire wave looked unrealistic.

Image 10

▲ GPT-5.5 Ultra, generated by Codex

Gemini 3.5 Flash seemed to have a bug in generating the webpage; the bottle kept being hidden behind the control panel and had to be dragged out manually. However, it provided a lot of customization options, including the type of bottle and the color of the liquid, with many settings customizable.

Image 11

▲ Gemini 3.5 Flash, generated on the official website, with Canvas option selected

Claude Opus 4.7's bottle was overly simplistic, and the simulated liquid shaking effect was reminiscent of audio waves during intense states.

Image 12

▲ Claude Opus 4.7, generated using the Claude Code app

Next, we tried to get it to generate a small game. Although this game testing was a common project last year with Vibe Coding, this time we wanted the AI to create a 2048 game with hexagonal tiles. The prompt was: "Create a playable 2048 game, but with hexagonal tiles."

The page generated by Qwen3.7 Max was visually appealing, and among the 10 reference sources, most were from CSDN's 2048 game generation tutorials.

The final game could be played, but there were occasional instances where numbers did not merge in the expected positions when moving in the same direction.

Image 13

▲ Qwen3.7 Max, generated on the official website

DeepSeek V4's performance was similar to the previous round, but despite being hexagonal, it only provided WASD keyboard controls to slide.

Image 14

▲ DeepSeek V4, generated on the official website

Claude's Opus 4.7 performed the best this round. It truly understood how the game should be set up, with tile movement following the honeycomb rules, making it easy to navigate.

Image 15

▲ Claude Opus 4.7, generated using the Claude Code app

GPT 5.5, leveraging Codex's capabilities, could open the browser to preview the generated game and fix project code based on console information. The final webpage was excellent, though it wasn't as good as Opus 4.7 at monitoring mouse movements on the screen.

Image 16

▲ GPT-5.5 Ultra, generated by Codex

Gemini 3.5 Flash, as usual, added a lot of extra features. It chose three background themes—cyberpunk, dark gold, and macau—and even included a "built-in high-quality synthesizer."

The gameplay comes with retro 8-bit space sound effects (merge, slide, level up, death) generated by native Web Audio, instantly enhancing the experience.

Image 17

▲ Gemini 3.5 Flash, generated on the official website, with Canvas option selected

Returning to some ordinary web design tasks, we asked it to create a subway museum website, with the prompt: "Design a subway museum-themed website with strong immersion."

Our intention was for these large models to list as much information about different cities' subways, world subway logos, and the overall style should be artistic, with specific styles and sufficient effects to present the content.

First, let's look at Qwen3.7 Max. Honestly, it's hard to evaluate; placing text vertically makes it resemble a train, but the entire website feels chaotic.

Image 18

▲ Qwen3.7-Max, generated on Qwen's official website

Gemini continued to add a lot, with sound effects again being used. Interestingly, it also created a subway cultural product, a custom commemorative ticket generator. We could enter names and select stations to generate a high-quality, retro-style subway commemorative ticket in real-time.

Image 19

▲ Gemini 3.5 Flash, generated on the official website, with Canvas option selected

DeepSeek chose a similar project to Gemini, with ticketing memorabilia and driving experiences, but it didn't seem to present these features in the final deliverables.

Image 20

▲ DeepSeek V4, generated on the official website

GPT 5.5's generated webpage style is quite nice, although it does have obvious template elements. Unfortunately, the information is too sparse. It seems to have misunderstood that a subway museum should be a website introducing subway information.

Image 21

▲ GPT-5.5 Ultra, generated by Codex

Next, using the previous prompt, we asked it to create an operating system for macOS/Windows, with the prompt: "Build a complete browser operating system using HTML."

DeepSeek V4's performance was simple, as was Qwen3.7 Max's, but Qwen3.7 Max added a nice desktop landscape picture.

Image 22

▲ DeepSeek V4, generated on the official website

Image 23

▲ Qwen3.7-Max, generated on Qwen's official website

But in this test, what really impressed me were Gemini 3.5 Flash and GPT 5.5.

Image 24

▲ Gemini 3.5 Flash, generated on the official website, with Canvas option selected

Like Gemini 3.5 Flash, GPT 5.5 also detailed the entire OS design with a specific style.

Image 25

▲ GPT-5.5 Ultra, generated by Codex

Using Qwen3.7 Max in Codex

After a round of testing, it seems that Qwen3.7 Max has difficulty consistently outperforming Gemini and GPT 5.5 in generating small web project prompts. However, compared to its predecessor, I believe there has been significant improvement.

On the Qwen official website, we see some provided code examples such as a 3D Earth, food chain sorting, visualization, personal blog, etc. However, these web project prompts are quite long, unlike the simple one-line prompt we tested.

Image 26

▲ After entering the prompt, Qwen also provides an option for 'Optimization Instructions'

We threw the prompt for the 3D Earth project at DeepSeek V4 and Gemini 3.5 Flash, and the results were almost identical to Qwen3.7 Max.

Image 27Image 28Image 29

This means that the prompt plays a relatively important role in whether Qwen3.7 Max can fully utilize its capabilities at this stage.

Reducing user optimization prompt pressure might involve integrating the Agent product, utilizing their Skills and Agents collaboration abilities to truly leverage the model's strength.

Following the official tutorial from Alibaba Cloud, we successfully integrated Qwen3.7 Max into the Codex terminal assistant.

Image 30

However, a bug is easy to encounter here, where Codex constantly reminds you 'CODEX Missing environment variable'.

After modifying the ~/.codex/config.toml configuration file according to the official tutorial, we still need to modify the computer's environment variables.

The model's API KEY information should be saved in the computer's environment variables (you need to check your computer's shell type and modify the corresponding environment variable file, such as .bash_profile or .zshrc), rather than in the Codex config.toml configuration file.

Image 31

After modification, when you input Codex in the terminal, you will see Qwen3.7 Max. Reopening the Codex App will switch the main interface model from the previous GPT-5.5 to the custom Custom.

Image 32

Using the same method, we can integrate models like DeepSeek, MiniMax, Kimi, Zhishu, etc., into Codex.

A few weeks ago, on GitHub, a frontend Skill received over two hundred thousand stars. It emphasizes making AI-generated front-end interfaces look better, similar to Qwen3.7 Max's second-place ranking task.

First, we install this Skill into Codex and then try combining it to see if it produces better results.

Image 33

▲ Address: https://github.com/Leonxlnx/taste-skill

Inputting the same prompt, Codex automatically calls frontend design, brainstorming, etc., Skills to complete the design positioning and conception, and strictly follows the Codex process control to monitor project generation.

Image 34

Finally, the same model performs much better inside Codex compared to directly on the Qwen official website.

Image 35

However, another issue arises easily: 'stream disconnected before completion: <400> InternalError.Algo.InvalidParameter: The "function.arguments" parameter of the code model must be in JSON format.'

When the model needs to call specialized tools, it cannot connect with the model anymore. We found related problem cases online, which can be attributed to 'the model deployment vendor having issues with the stream output format, not being a standard OpenAI protocol, so it does not support API calls, resulting in a 400 error.'

When asking Codex to explain this issue, Codex also says it's a model problem.

It's not that you configured it wrong; instead, Qwen3.7 Max / BaiLian Responses API is not yet stable enough for calling the Codex agent tool. Being able to chat doesn't mean it can stably run Codex. For long tasks, code modifications, frequent file reading, switching back to the official OpenAI model would be more stable.

So, if you also encounter this issue, you may have to wait for the Qwen team to fix it or start a new session to try again.

Image 37

▲ Alibaba Cloud officially has a guide for different error codes

Last year, we still said that a model is a product, and a good model is a good product. Now, it seems that relying solely on the model is far from enough.

Memory, Harness, Agents orchestration, validation, inference sustainability, etc., as the model's ability increases, this architecture continues to expand. But only when everything is done well might we say 'this is a good model.'

AI may generate inaccurate information. Please verify important content.