---
title: "DeepSeek V4 - almost on the frontier, a fraction of the price"
source_name: "Simon Willison's Weblog"
original_url: "https://simonwillison.net/2026/Apr/24/deepseek-v4/#atom-everything"
canonical_url: "https://www.traeai.com/articles/e4d15291-5597-4db2-88dd-af93f4027a01"
content_type: "article"
language: "English"
score: 5
tags: []
published_at: "2026-04-24T06:01:04+00:00"
created_at: "2026-04-24T06:26:12.142081+00:00"
---

# DeepSeek V4 - almost on the frontier, a fraction of the price



## Content

24th April 2026

Chinese AI lab DeepSeek’s last model release was V3.2 (and V3.2 Speciale) [last December](https://simonwillison.net/2025/Dec/1/deepseek-v32/). They just dropped the first of their hotly anticipated V4 series in the shape of two preview models, [DeepSeek-V4-Pro](https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro) and [DeepSeek-V4-Flash](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash).

Both are Mixture of Experts models with 1 million token context windows. Pro is 1.6T total parameters, 49B active; Flash is 284B total, 13B active. Both use the standard MIT license.

I think this makes DeepSeek-V4-Pro the new largest open weights model. It’s larger than Kimi K2.6 (1.1T) and GLM-5.1 (754B) and more than twice the size of DeepSeek V3.2 (685B).

Pro is 865GB on Hugging Face, Flash is 160GB. I’m hoping that a lightly quantized Flash will run on my 128GB M5 MacBook Pro. It’s _possible_ even the Pro model could run on it if I can stream just the necessary active experts from disk.
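
As a rough sanity check on what would fit in 128GB, here's some back-of-envelope arithmetic (my own sketch: it assumes uniform bits per weight and ignores quantization scale overhead and the runtime memory needed for the KV cache and activations):

```python
# Back-of-envelope on-disk size for DeepSeek-V4-Flash (284B total
# parameters) at different quantization bit widths. Ignores per-block
# scale overhead and runtime memory (KV cache, activations).

def model_size_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

FLASH_PARAMS = 284e9

for bits in (16, 8, 4, 3):
    print(f"{bits:>2}-bit: ~{model_size_gb(FLASH_PARAMS, bits):.0f} GB")
```

By this estimate 4-bit weights alone come to ~142GB, so fitting Flash on a 128GB machine likely means going closer to 3 bits per weight, or relying on the quantizer keeping only some layers at higher precision.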

For the moment I tried the models out via [OpenRouter](https://openrouter.ai/), using [llm-openrouter](https://github.com/simonw/llm-openrouter):

```
llm install llm-openrouter
llm openrouter refresh
llm -m openrouter/deepseek/deepseek-v4-pro 'Generate an SVG of a pelican riding a bicycle'
```

Here’s the pelican [for DeepSeek-V4-Flash](https://gist.github.com/simonw/4a7a9e75b666a58a0cf81495acddf529):

![Image 1: Excellent bicycle - good frame shape, nice chain, even has a reflector on the front wheel. Pelican has a mean looking expression but has its wings on the handlebars and feet on the pedals. Pouch is a little sharp.](https://static.simonwillison.net/static/2026/deepseek-v4-flash.png)

And [for DeepSeek-V4-Pro](https://gist.github.com/simonw/9e8dfed68933ab752c9cf27a03250a7c):

![Image 2: Another solid bicycle, albeit the spokes are a little jagged and the frame is compressed a bit. The pelican has gone a bit wrong - it has a VERY large body, only one wing, a weirdly hairy backside and generally looks like it was drawn by a different artist from the bicycle.](https://static.simonwillison.net/static/2026/deepseek-v4-pro.png)

For comparison, take a look at the pelicans I got from [DeepSeek V3.2 in December](https://simonwillison.net/2025/Dec/1/deepseek-v32/), [V3.1 in August](https://simonwillison.net/2025/Aug/22/deepseek-31/), and [V3-0324 in March 2025](https://simonwillison.net/2025/Mar/24/deepseek/).

So the pelicans are pretty good, but what’s really notable here is the _cost_. DeepSeek V4 is a very, very inexpensive model.

Here’s [DeepSeek’s pricing page](https://api-docs.deepseek.com/quick_start/pricing). They’re charging $0.14/million tokens input and $0.28/million tokens output for Flash, and $1.74/million input and $3.48/million output for Pro.

Here’s a comparison table with the frontier models from Gemini, OpenAI and Anthropic:

| Model | Input ($/M) | Output ($/M) |
| --- | --- | --- |
| **DeepSeek V4 Flash** | $0.14 | $0.28 |
| GPT-5.4 Nano | $0.20 | $1.25 |
| Gemini 3.1 Flash-Lite | $0.25 | $1.50 |
| Gemini 3 Flash Preview | $0.50 | $3 |
| GPT-5.4 Mini | $0.75 | $4.50 |
| Claude Haiku 4.5 | $1 | $5 |
| **DeepSeek V4 Pro** | $1.74 | $3.48 |
| Gemini 3.1 Pro | $2 | $12 |
| GPT-5.4 | $2.50 | $15 |
| Claude Sonnet 4.6 | $3 | $15 |
| Claude Opus 4.7 | $5 | $25 |
| GPT-5.5 | $5 | $30 |

DeepSeek-V4-Flash is the cheapest of the small models, beating even OpenAI’s GPT-5.4 Nano. DeepSeek-V4-Pro is the cheapest of the larger frontier models.

This note from [the DeepSeek paper](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf) helps explain why they can price these models so low—they’ve focused a great deal on efficiency with this release, especially for longer context prompts:

> In the scenario of 1M-token context, even DeepSeek-V4-Pro, which has a larger number of activated parameters, attains only 27% of the single-token FLOPs (measured in equivalent FP8 FLOPs) and 10% of the KV cache size relative to DeepSeek-V3.2. Furthermore, DeepSeek-V4-Flash, with its smaller number of activated parameters, pushes efficiency even further: in the 1M-token context setting, it achieves only 10% of the single-token FLOPs and 7% of the KV cache size compared with DeepSeek-V3.2.
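
The paper's percentages are relative to V3.2. For an absolute feel, a common rule of thumb puts dense decode compute at roughly 2 FLOPs per active parameter per token - a heuristic of mine, not a figure from the paper, and one that excludes attention FLOPs, which dominate at 1M-token context:

```python
# Rough decode compute per token via the ~2 * active-params heuristic.
# Attention FLOPs are excluded, so at 1M-token context these are
# lower bounds - which is why the paper's relative numbers differ.

ACTIVE_PARAMS = {  # active parameter counts from the release notes
    "DeepSeek-V4-Pro": 49e9,
    "DeepSeek-V4-Flash": 13e9,
}

for name, n_active in ACTIVE_PARAMS.items():
    gflops = 2 * n_active / 1e9  # GFLOPs per generated token
    print(f"{name}: ~{gflops:.0f} GFLOPs/token")
```

That puts Pro around ~98 GFLOPs/token and Flash around ~26 GFLOPs/token before attention costs, which helps explain how a 1.6T-parameter model can be priced at $1.74/million input tokens.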

DeepSeek’s self-reported benchmarks [in their paper](https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash/blob/main/DeepSeek_V4.pdf) show their Pro model competitive with those other frontier models, albeit with this note:

> Through the expansion of reasoning tokens, DeepSeek-V4-Pro-Max demonstrates superior performance relative to GPT-5.2 and Gemini-3.0-Pro on standard reasoning benchmarks. Nevertheless, its performance falls marginally short of GPT-5.4 and Gemini-3.1-Pro, suggesting a developmental trajectory that trails state-of-the-art frontier models by approximately 3 to 6 months.

I’m keeping an eye on [huggingface.co/unsloth/models](https://huggingface.co/unsloth/models) as I expect the Unsloth team will have a set of quantized versions out pretty soon. It’s going to be very interesting to see how well that Flash model runs on my own machine.
