Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model

- Qwen3.6-27B outperforms the previous-generation Qwen3.5-397B-A17B on major coding benchmarks.
- The new model drops from 807GB to 55.6GB, with a quantized version at just 16.8GB.
- Run locally it generates complex SVG images impressively well, making it a good fit for resource-constrained environments.
22nd April 2026 - Link Blog
**Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model** ([via](https://news.ycombinator.com/item?id=47863217 "Hacker News")) Big claims from Qwen about their latest open weight model:
> Qwen3.6-27B delivers flagship-level agentic coding performance, surpassing the previous-generation open-source flagship Qwen3.5-397B-A17B (397B total / 17B active MoE) across all major coding benchmarks.
On Hugging Face Qwen3.5-397B-A17B is 807GB; this new Qwen3.6-27B is 55.6GB.
I tried it out with the 16.8GB Unsloth Qwen3.6-27B-GGUF:Q4_K_M quantized version and `llama-server`, using this recipe shared by benob on Hacker News, after first installing `llama-server` with `brew install llama.cpp`:
```bash
llama-server \
  -hf unsloth/Qwen3.6-27B-GGUF:Q4_K_M \
  --no-mmproj \
  --fit on \
  -np 1 \
  -c 65536 \
  --cache-ram 4096 -ctxcp 2 \
  --jinja \
  --temp 0.6 \
  --top-p 0.95 \
  --top-k 20 \
  --min-p 0.0 \
  --presence-penalty 0.0 \
  --repeat-penalty 1.0 \
  --reasoning on \
  --chat-template-kwargs '{"preserve_thinking": true}'
```
On first run that saved the ~17GB model to `~/.cache/huggingface/hub/models--unsloth--Qwen3.6-27B-GGUF`.
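Once running, `llama-server` exposes an OpenAI-compatible API. Here's a minimal sketch of sending the pelican prompt with `curl`, assuming the default port of 8080 (this example is mine, not part of benob's recipe):
```bash
# Hit the local llama-server's OpenAI-compatible chat endpoint
# (port 8080 is the llama-server default - adjust if you changed it)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Generate an SVG of a pelican riding a bicycle"}
    ]
  }'
```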
Here's the transcript for "Generate an SVG of a pelican riding a bicycle". This is an _outstanding_ result for a 16.8GB local model:

Performance numbers reported by `llama-server`:
- Prompt processing: 20 tokens, 0.4s, 54.32 tokens/s
- Generation: 4,444 tokens, 2min 53s, 25.57 tokens/s
For good measure, here's "Generate an SVG of a NORTH VIRGINIA OPOSSUM ON AN E-SCOOTER" (a prompt I previously ran against GLM-5.1):

That one took 6,575 tokens in 4min 25s, at 24.74 tokens/s.
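As a quick sanity check, the reported generation rates follow from tokens divided by wall-clock seconds; the small differences come from the durations being rounded to whole seconds:
```bash
# Tokens / elapsed seconds should roughly match the reported rates
echo "scale=2; 4444 / (2*60 + 53)" | bc   # ~25.68 vs reported 25.57 tokens/s
echo "scale=2; 6575 / (4*60 + 25)" | bc   # ~24.81 vs reported 24.74 tokens/s
```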