Chinese AI Coding Model Climbs to No. 2 Worldwide! We Tested Five Major Models: Which Is the Real Vibe Coding Powerhouse?

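TL;DR · AI Summary
Alibaba's Qwen3.7 Max ranks second in the programming competition rankings, ahead of models such as GPT-5.5 and Gemini 3.5 Flash, and its relatively reasonable pricing makes it a fit for developers.
Key Takeaways
- Qwen3.7 Max performed well in the coding tests, especially in front-end web design and mini-game generation.
- Qwen3.7 Max's pricing is middle-of-the-road, suitable for new users and developers on a budget.
- When generating more complex features, such as the hexagonal 2048 game, Qwen3.7 Max was less stable than Claude Opus 4.7.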
Exceeding GPT-5.5, Gemini 3.5 Flash, and DeepSeek V4 Pro, Alibaba's latest flagship model Qwen3.7 Max secured second place in the programming competition rankings, just behind Claude Opus 4.7.

▲ Screenshot from the rankings on May 26
That ranking reflects user votes in real-world scenarios. On traditional fixed benchmarks such as Terminal Bench and SWE Bench, Qwen3.7 Max also took the top spot among Chinese models.

Large models have been around for four years now, and leaderboard reshuffles have become routine. Even so, we couldn't resist finding out how a Qwen model that outranks GPT 5.5 actually performs.
It's worth noting that the most popular Coding Agent combination right now is probably GPT 5.5 with Codex.
If we change the default model in Codex to Qwen3.7 Max and use Codex to complete some daily tasks, would it be more useful than GPT 5.5?
Getting Qwen3.7 Max
With vendors across the industry running token promotions, Alibaba Cloud is also offering 1 million free tokens on its Bailian platform.

On Alibaba Cloud's official website, Qwen3.7 Max is currently under a limited-time 50% discount, with input at 6 yuan per million tokens and output at 18 yuan per million tokens. New users can also get a half-price top-up plan that provides 20 yuan of token quota for 10 yuan per month, while the standard Token Plan is priced at 198 yuan per month.
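To make these per-token prices concrete, here is a minimal sketch of the arithmetic. It assumes the 6 and 18 yuan figures quoted above are the prices actually billed, and the token counts in the example are hypothetical, not measurements from our tests.

```python
# Prices quoted in the article, in yuan per million tokens.
INPUT_YUAN_PER_MILLION = 6.0
OUTPUT_YUAN_PER_MILLION = 18.0


def estimate_cost_yuan(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in yuan for one request or session."""
    input_cost = input_tokens / 1_000_000 * INPUT_YUAN_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_YUAN_PER_MILLION
    return input_cost + output_cost


# Hypothetical session: 200k tokens read, 50k tokens generated.
session_cost = estimate_cost_yuan(200_000, 50_000)
print(f"{session_cost:.2f} yuan")  # prints "2.10 yuan"
```

At that hypothetical usage level, a 20 yuan quota would cover about nine such sessions.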

Overall, according to data from the model aggregation platform OpenRouter, Qwen3.7 Max's pricing sits in the middle of the pack. It can't match DeepSeek's rock-bottom prices, but it is still affordable next to Opus 4.7 and GPT 5.5.


We went straight for the "Preferred Entry" package, whose 20 yuan credit can be spent on any model. Note, however, that the 50% discount applies to only one package: if you buy the 10 yuan plan, you can no longer buy the 50 or 250 yuan plans at half price.

Testing with DeepSeek, Claude, GPT, Gemini, and Qwen
With the API key and a million free tokens in hand, we first used Qwen3.7 Max on Alibaba Cloud's Bailian platform and on Qwen's official website to test its development capabilities on some common front-end web design tasks.
We started with a physics simulation, where differences are easy to see at a glance, using a simple prompt: "Create an animation of liquid shaking inside a container using HTML+CSS+JS. Dragging the container changes its tilt angle."

▲ Qwen3.7-Max, generated on Qwen's official website
Qwen3.7 Max successfully completed this simulation challenge and even added features like custom color, shaking, and adjustable liquid volume.
DeepSeek's version was simpler, but it had no errors.

▲ DeepSeek V4, generated on the official website
The liquid generated by GPT-5.5 was a bit strange. While it flowed in the correct direction when the angle changed, the entire wave looked unrealistic.

▲ GPT-5.5 Ultra, generated by Codex
The page generated by Gemini 3.5 Flash seemed to have a layout bug: the bottle kept getting hidden behind the control panel and had to be dragged out manually. It did, however, offer plenty of customization options, including the type of bottle and the color of the liquid.

▲ Gemini 3.5 Flash, generated on the official website, with Canvas option selected
Claude Opus 4.7's bottle was overly simplistic, and when shaken hard, the simulated liquid looked more like an audio waveform.

▲ Claude Opus 4.7, generated using the Claude Code app
Next, we had the models generate a small game. 2048 was a staple Vibe Coding test last year, so this time we added a twist and asked for hexagonal tiles. The prompt was: "Create a playable 2048 game, but with hexagonal tiles."
The page generated by Qwen3.7 Max was visually appealing, and most of the 10 sources it referenced were 2048 tutorials on CSDN.
The final game was playable, but tiles occasionally failed to merge where expected when they moved in the same direction.

▲ Qwen3.7 Max, generated on the official website
DeepSeek V4 performed much as it did in the previous round. Although the board was hexagonal, it offered only WASD keyboard controls for sliding.

▲ DeepSeek V4, generated on the official website
Claude Opus 4.7 performed best this round. It clearly understood how the game should work: tiles moved according to honeycomb rules, and the controls were easy to use.

▲ Claude Opus 4.7, generated using the Claude Code app
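For readers wondering what "honeycomb rules" involve, the sketch below shows one common way to model them. It is our own illustration under stated assumptions, not the code any of the tested models produced: the board uses axial (q, r) coordinates, tiles can slide in six directions, and equal neighbors merge once per move starting from the edge the tiles slide toward, as in classic 2048. The helper names and the board radius are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]

# The six neighbor directions in axial (q, r) coordinates.
DIRECTIONS: List[Cell] = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def board_cells(radius: int) -> List[Cell]:
    """All cells of a hexagon-shaped board with the given radius."""
    return [
        (q, r)
        for q in range(-radius, radius + 1)
        for r in range(-radius, radius + 1)
        if abs(q + r) <= radius
    ]


def merge_line(values: List[int]) -> List[int]:
    """Slide non-zero values toward index 0, merging equal neighbors once."""
    tiles = [v for v in values if v]
    merged: List[int] = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            i += 2
        else:
            merged.append(tiles[i])
            i += 1
    return merged + [0] * (len(values) - len(merged))


def move(board: Dict[Cell, int], radius: int, direction: Cell) -> Dict[Cell, int]:
    """Return a new board after sliding every tile in `direction`."""
    dq, dr = direction
    cells = set(board_cells(radius))
    result = {cell: 0 for cell in cells}
    for cell in cells:
        # A line starts at a cell whose neighbor in `direction` is off the board.
        if (cell[0] + dq, cell[1] + dr) in cells:
            continue
        line = []
        q, r = cell
        while (q, r) in cells:
            line.append((q, r))
            q, r = q - dq, r - dr
        new_values = merge_line([board.get(c, 0) for c in line])
        for c, v in zip(line, new_values):
            result[c] = v
    return result


# Two 2-tiles on the middle row slide right and merge at the edge.
board = {(-1, 0): 2, (0, 0): 2}
after = move(board, radius=1, direction=(1, 0))
print(after[(1, 0)])  # prints 4
```

Because there are six directions, a four-key scheme such as WASD cannot reach every move, which is why the input method matters so much in this test.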
GPT 5.5, leveraging Codex's capabilities, could open the browser to preview the generated game and fix the project code based on console output. The final webpage was excellent, though its handling of mouse movement on the screen was not as good as Opus 4.7's.

▲ GPT-5.5 Ultra, generated by Codex
Gemini 3.5 Flash, as usual, added a lot of extra features. It offered three background themes (cyberpunk, dark gold, and macau) and even included a "built-in high-quality synthesizer."
In its words, the gameplay comes with retro 8-bit space sound effects (merge, slide, level up, death) generated by native Web Audio, instantly enhancing the experience.

▲ Gemini 3.5 Flash, generated on the official website, with Canvas option selected
Returning to more ordinary web design tasks, we asked the models to create a subway museum website, with the prompt: "Design a subway museum-themed website with strong immersion."
We hoped the models would pack in as much information as possible, such as the subway systems of different cities and subway logos from around the world, in an artistic overall style with enough visual effects to present the content.
First, let's look at Qwen3.7 Max. Honestly, it's hard to judge: the vertically set text does resemble a train, but the website as a whole feels chaotic.

▲ Qwen3.7-Max, generated on Qwen's official website
Gemini again added a lot of extras, including sound effects. Interestingly, it also built a piece of subway merchandise: a custom commemorative ticket generator. We could enter a name and select stations to generate a high-quality, retro-style commemorative subway ticket in real time.

▲ Gemini 3.5 Flash, generated on the official website, with Canvas option selected
DeepSeek planned features similar to Gemini's, including commemorative tickets and a driving experience, but these did not seem to appear in the final deliverable.

▲ DeepSeek V4, generated on the official website
The webpage GPT 5.5 generated looks quite nice, although it has obvious template elements. Unfortunately, the information is too sparse. It seems to have missed that a subway museum website should actually present information about subways.

▲ GPT-5.5 Ultra, generated by Codex
Next, in the same one-line style, we asked the models to build a macOS/Windows-style operating system in the browser, with the prompt: "Build a complete browser operating system using HTML."
DeepSeek V4's result was basic, as was Qwen3.7 Max's, though Qwen3.7 Max added a nice landscape wallpaper to the desktop.

▲ DeepSeek V4, generated on the official website

▲ Qwen3.7-Max, generated on Qwen's official website
But in this test, what really impressed me were Gemini 3.5 Flash and GPT 5.5.

▲ Gemini 3.5 Flash, generated on the official website, with Canvas option selected
Like Gemini 3.5 Flash, GPT 5.5 fleshed out the entire OS design in a distinct style.

▲ GPT-5.5 Ultra, generated by Codex
Using Qwen3.7 Max in Codex
After a round of testing, it seems that Qwen3.7 Max struggles to consistently outperform Gemini and GPT 5.5 when generating small web projects from short prompts. Compared to its predecessor, however, I believe it has improved significantly.
Qwen's official website offers some example projects, such as a 3D Earth, food chain sorting, visualization, and a personal blog. The prompts for these projects are quite long, however, unlike the simple one-line prompts we tested.

▲ After entering the prompt, Qwen also provides an option for 'Optimization Instructions'
We gave the 3D Earth prompt to DeepSeek V4 and Gemini 3.5 Flash, and their results were almost identical to Qwen3.7 Max's.



This means that the prompt plays a relatively important role in whether Qwen3.7 Max can fully utilize its capabilities at this stage.
One way to spare users the burden of prompt optimization is to plug the model into an Agent product and rely on its Skills and multi-agent collaboration to bring out the model's real strength.
Following the official tutorial from Alibaba Cloud, we successfully integrated Qwen3.7 Max into the Codex terminal assistant.

It is easy to hit a snag here, though: Codex keeps reporting 'CODEX Missing environment variable'.
Editing the ~/.codex/config.toml configuration file as the official tutorial describes is not enough; the computer's environment variables need to be changed as well.
The model's API key should be saved in an environment variable rather than in the Codex config.toml file. Check which shell your computer uses and edit the corresponding startup file, such as .bash_profile or .zshrc.
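A quick way to confirm that the key is actually visible to programs launched from your terminal is a small check like the one below. This is only a sketch: the variable name DASHSCOPE_API_KEY is our assumption, so substitute whatever name your Codex provider configuration refers to. The helper reports the key's length rather than printing the secret itself.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name; use the name your Codex provider config expects.
API_KEY_ENV_VAR = "DASHSCOPE_API_KEY"


def describe_api_key(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Report whether the variable is visible to this process, without printing it."""
    env = os.environ if environ is None else environ
    value = env.get(name, "")
    if not value.strip():
        return f"{name} is missing: export it in your shell profile and open a new terminal"
    return f"{name} is set ({len(value)} characters)"


print(describe_api_key(API_KEY_ENV_VAR))
```

If the script reports the variable as missing right after you edited .zshrc or .bash_profile, the terminal session probably predates the change; opening a new terminal window reloads the file.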

Once that is done, typing codex in the terminal shows Qwen3.7 Max. After you reopen the Codex App, the model in the main interface switches from GPT-5.5 to "Custom".

Using the same method, we can integrate models from DeepSeek, MiniMax, Kimi, Zhipu, and others into Codex.
A few weeks ago, a frontend Skill on GitHub received over two hundred thousand stars. It focuses on making AI-generated front-end interfaces look better, much the same kind of task as the ranking in which Qwen3.7 Max placed second.
First, we install this Skill into Codex and then try combining it to see if it produces better results.

▲ Address: https://github.com/Leonxlnx/taste-skill
Given the same prompt, Codex automatically invoked Skills such as frontend design and brainstorming to work out the design positioning and concept, and it generated the project under Codex's strict process control.

In the end, the same model performed much better inside Codex than it did directly on the Qwen official website.

However, another error comes up easily: 'stream disconnected before completion: <400> InternalError.Algo.InvalidParameter: The "function.arguments" parameter of the code model must be in JSON format.'
Once the model needs to call specialized tools, the connection to it drops. We found similar cases reported online, which attribute the problem to 'the model deployment vendor having issues with the stream output format, not being a standard OpenAI protocol, so it does not support API calls, resulting in a 400 error.'
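For context, in the OpenAI-style protocol a tool call carries its parameters in function.arguments as a string that is expected to hold a JSON object. We could not confirm the root cause of the error above, but the sketch below illustrates the kind of check the message describes. The helper name and the sample payloads are hypothetical.

```python
import json
from typing import Any, Dict, Tuple


def parse_tool_arguments(raw: str) -> Tuple[bool, Dict[str, Any]]:
    """Return (ok, arguments); ok is False when raw is not a JSON object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False, {}
    if not isinstance(parsed, dict):
        return False, {}
    return True, parsed


# A complete payload parses; a fragment cut off mid-stream does not.
print(parse_tool_arguments('{"path": "index.html"}'))
print(parse_tool_arguments('{"path": "index'))
```

A client that validates arguments this way can at least fail with a clear message, or retry the request, instead of passing a broken payload along.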
When we asked Codex to explain the issue, it also blamed the model:
"It's not that you configured it wrong. Qwen3.7 Max / BaiLian Responses API is not yet stable enough for Codex agent tool calls. Being able to chat doesn't mean it can run Codex reliably. For long tasks, code modifications, and frequent file reading, switching back to the official OpenAI model would be more stable."
So, if you also encounter this issue, you may have to wait for the Qwen team to fix it or start a new session to try again.

▲ Alibaba Cloud officially has a guide for different error codes
Last year, we were still saying that the model is the product, and that a good model is a good product. Now, it seems that relying on the model alone is far from enough.
Memory, harness, agent orchestration, validation, sustained inference, and more: as model capabilities grow, the architecture around them keeps expanding. Only when all of it is done well can we perhaps say, 'this is a good model.'