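Cohere releases its first open-source coding model, "North Mini Code": small parameter count, high efficiency, built for Agent programming. Parameters: MoE architecture (30B, 3B), 128 experts, 8 activated per token. Context: ...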

TL;DR · AI Summary
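Cohere has released North Mini Code, an open-source coding model with an MoE architecture. It is optimized for Agent programming and performs close to larger models.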
Key Takeaways
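- North Mini Code is an MoE model (30B total parameters, 3B active) with 128 experts, 8 of which are activated per token.
- After SFT the model reaches 80.2% pass@10 on SWE-Bench Verified; RL then raises pass@1 further (SWE +3.0%, Terminal +7.9%).
- Output throughput is up to about 2.8× that of Devstral Small 2, and inter-token latency is about 30% lower.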
Outline
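Cohere has released the open-source coding model North Mini Code, an MoE model with 30B total and 3B active parameters.
- Training method
The model is trained with SFT followed by RLVR; the training data comes from over 70,000 verifiable tasks and about 5,000 repositories.
- Performance
After SFT the model reaches 80.2% pass@10 on SWE-Bench Verified; RL improves pass@1 further.
- Inference speed
Output throughput is up to about 2.8× that of Devstral Small 2, and inter-token latency is about 30% lower.
- Use cases
The model is optimized for Agent programming: sub-Agent orchestration, system architecture understanding, Code Review, and similar scenarios.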
Mindmap
- North Mini Code
  - Model architecture
    - MoE architecture
    - Parameter scale: 30B / 3B
  - Training method
    - SFT and RLVR
    - Training data: 70,000+ tasks, 5,000 repositories
  - Performance
    - SWE-Bench Verified pass@10 = 80.2%
    - Inference speed: 2.8× Devstral Small 2
meng shao on X: "Cohere releases its first open-source programming model 'North Mini Code' Small parameters, high efficiency, specialized for Agent programming Parameters: MoE architecture (30B, 3B), 128 experts, 8 activated per token Context: 256K input / 64K output Minimum hardware: 1× H100 (FP8) Official release https://t.co/H5uqf32SyV HuggingFace https://t.co/DloyaGnA9U # https://t.co/6cf5jwkaCk" / X
meng shao
@shao__meng
Cohere releases its first open-source programming model, "North Mini Code"
Small parameter count, high efficiency, specialized for Agent programming
Parameters: MoE architecture (30B, 3B), 128 experts, 8 activated per token
Context: 256K input / 64K output
Minimum hardware: 1× H100 (FP8)
Official release
cohere.com/blog/north-min…
HuggingFace
huggingface.co/CohereLabs/Nor…
#
1. SFT
· Phase one (64K): about 70% of tokens are trainable (43% Agent tool calls + 27% single-turn competition/scientific programming), mixed with reasoning and instruction-following data
· Phase two (128K): about 4.5B tokens, 61% code, all Agent/reasoning samples; tool calls and completion results are verified to be executable
· Data comes from over 70,000 verifiable tasks and about 5,000 repositories, deduplicated against SWE-Bench sources to prevent leakage
· The goal of SFT is not to top benchmarks but to lay the foundation for RL: optimize pass@K and sampling diversity

2. RLVR (reinforcement learning with verifiable rewards)
· Algorithm: CISPO (token-level importance sampling, so long trajectories are not diluted by short samples)
· Asynchronous sampling: vLLM sidecar + windowed FIFO queue, to mitigate differences in Agent rollout length
· Joint training on two environments: Terminal (ReAct + bash) + SWE (SWE-Agent)
· Rewards: binary reward from unit tests; invalid tool calls or unparsable outputs score 0

3. Cross-harness generalization
· Multiple Agent scaffolds are exposed during training (SWE-Agent, mini-SWE, OpenCode, etc.)
· About 6% of phase-two SFT data comes from other benchmark harnesses
· OpenCode evaluation is about +10%; pass@1 on mini-SWE-Agent reaches 61.0%, a "free transfer"

At the end of SFT: SWE-Bench Verified pass@10 = 80.2%, Terminal-Bench v2 pass@10 = 55.1%. After RL: Terminal pass@1 +7.9%, SWE pass@1 +3.0%, with shorter trajectories and fewer invalid tool calls.

# Benchmark performance
Agent programming (core selling point)
· Artificial Analysis Coding Index: 33.4
· Leads open-source models of the same scale, such as Qwen3.5 35B-A3B, Gemma 4, and Devstral Small 2
· Even exceeds larger models such as Nemotron 3 Super (120B) and Mistral Small 4 (119B)
· Still slightly below Qwen3.6 35B-A3B (about 35.2)

Evaluation sets: SWE-Bench Verified/Pro, Terminal-Bench v2/Hard, SciCode, LiveCodeBench v6
Harness: SWE-Agent v1.1.0, ReAct+Tmux, Terminus-2, etc.; temperature=1.0, top_p=0.95, averaged over 3 seeds

Non-programming Agent tasks are weaker (third-party summary): GDPval-AA ~14%, τ²-Bench Telecom ~37%, Agentic Index overall ~21.7. The model is specialized for programming and is not a general-purpose Agent.

Inference speed (compared with Devstral Small 2, Cohere internal test)
· Peak output throughput about 2.8× at the same concurrency
· Inter-token latency about 30% lower
· TTFT slightly worse than Devstral Small 2

# Agent capability design
The model natively supports interleaved thinking and tool calls, similar to the Cohere Command series:
<|START_THINKING|> ... <|END_THINKING|>
<|START_ACTION|> [JSON tool calls] <|END_ACTION|>
<|START_TOOL_RESULT|> ... <|END_TOOL_RESULT|>
<|START_RESPONSE|> ... <|END_RESPONSE|>

Key points for use:
· The conversation history must include the reasoning/thinking blocks, otherwise performance drops
· JSON Schema is recommended for tool descriptions
· Recommended sampling: temperature=1.0, top_p=0.95
· Requires a recent Transformers build from source and vLLM main + cohere_melody>=0.9.0

Target scenarios: sub-Agent orchestration, system architecture understanding, Code Review, terminal operations, multi-step software engineering.
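To make the interleaved format above concrete, here is a minimal Python sketch that splits one generated turn into typed blocks. It is an illustration under assumptions, not Cohere's implementation: the four marker names come from the post, while `parse_turn`, the regex approach, and the `tool_name` / `parameters` keys in the sample are hypothetical. In real use, the tooling the post lists (Transformers, vLLM with cohere_melody) is presumably what handles parsing.

```python
import json
import re

# Marker names are taken from the post; everything else here is illustrative.
BLOCK_PATTERN = re.compile(
    r"<\|START_(THINKING|ACTION|TOOL_RESULT|RESPONSE)\|>(.*?)<\|END_\1\|>",
    re.DOTALL,
)


def parse_turn(text):
    """Return the blocks found in `text` as ordered (kind, payload) tuples.

    Action blocks are decoded as JSON; an unparsable action yields a payload
    of None, which a caller could treat as an invalid tool call.
    """
    blocks = []
    for match in BLOCK_PATTERN.finditer(text):
        kind = match.group(1).lower()
        body = match.group(2).strip()
        if kind == "action":
            try:
                payload = json.loads(body)
            except json.JSONDecodeError:
                payload = None
            blocks.append((kind, payload))
        else:
            blocks.append((kind, body))
    return blocks


if __name__ == "__main__":
    # Hypothetical tool-call schema, for demonstration only.
    sample = (
        "<|START_THINKING|>List the files first.<|END_THINKING|>"
        '<|START_ACTION|>[{"tool_name": "bash", '
        '"parameters": {"command": "ls"}}]<|END_ACTION|>'
    )
    for kind, payload in parse_turn(sample):
        print(kind, payload)
```

Keeping the thinking blocks as separate entries rather than discarding them matters here, since the post notes that performance drops when reasoning is omitted from the conversation history.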
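The pass@1 and pass@10 figures quoted in the training and benchmark sections can be read through the commonly used unbiased pass@k estimator: given n sampled attempts per task of which c pass, pass@k = 1 - C(n-c, k) / C(n, k). The post does not say how Cohere computed its numbers, so the sketch below is an assumption about the metric's usual definition, and `pass_at_k` is a name chosen for illustration.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples passes.

    n: total samples drawn for the task
    c: number of those samples that passed the tests
    k: budget of attempts being scored (k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Illustrative counts only, not data from the post.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

This also shows why an SFT stage tuned for sampling diversity can target pass@K: a model that solves a task in only a few of its samples still scores well at k=10, and RL can then push that mass toward pass@1.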
Cohere
@cohere
19h
Introducing Cohere's first open-source coding model: North Mini Code
Small & efficient, designed for agentic performance and built for community input.
Nick Frosst
1:20 AM · Jun 10, 2026