DeepSeek launched two preview versions of its newest large language model, DeepSeek V4. The company claims that, thanks to architectural improvements, both models are more efficient and more capable than DeepSeek V3.2, and have nearly "closed the gap" with current leading models on reasoning benchmarks.
Two Tiers: V4 Flash and V4 Pro
The V4 lineup ships in two variants targeting different cost and capability points:
V4 Pro is the flagship: 1.6 trillion total parameters with 49 billion active parameters at inference time, which will make it the largest open-weight model available once its weights are released. It uses a mixture-of-experts architecture that activates only a fraction of the model for any given token (sketched below), keeping inference costs manageable despite the massive total parameter count.
V4 Flash is the efficiency variant — 284 billion total parameters with 13 billion active. It trades some benchmark performance for significantly lower operational cost, positioning it as the practical choice for production deployments where GPT-5-class reasoning is needed at scale.
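To make the sparse-activation idea concrete, here is a minimal top-k mixture-of-experts layer in PyTorch. It is a generic sketch, not DeepSeek's published design; the router, expert shapes, and top-k choice are illustrative assumptions.

```python
# Minimal top-k mixture-of-experts layer (illustrative sketch only;
# DeepSeek has not published V4's exact architecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int = 512, n_experts: int = 64, k: int = 2):
        super().__init__()
        self.k = k
        # The router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts)
        # Each expert is an ordinary feed-forward block.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, 4 * d_model),
                nn.GELU(),
                nn.Linear(4 * d_model, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        scores = self.router(x)                            # (tokens, experts)
        weights, idx = torch.topk(scores, self.k, dim=-1)  # pick k experts/token
        weights = F.softmax(weights, dim=-1)               # normalize over the k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed here
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Every token passes through only k of the n_experts feed-forward blocks.
layer = TopKMoE()
tokens = torch.randn(8, 512)
print(layer(tokens).shape)  # torch.Size([8, 512])
```

With many experts and a small k, only a few experts' weights touch each token, which is how a model can carry 1.6 trillion total parameters while spending compute on just 49 billion per token.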
Both models share a 1 million token context window, the longest of any open-weight model announced to date, making them suitable for tasks that require reasoning across very large documents or codebases.
Benchmark Performance
DeepSeek's own benchmarks show the V4 models outperforming open-source peers across reasoning tasks, code generation, and mathematical problem-solving. On coding competition benchmarks, DeepSeek estimates V4 performance is "comparable to GPT-5.4." On some individual reasoning tasks, V4 outstrips both OpenAI's GPT-5.2 and Google's Gemini 3.0 Pro.
DeepSeek itself estimates the gap to the frontier models, GPT-5.4 and Gemini 3.1 Pro, at roughly three to six months. On knowledge-intensive tests, the models still trail the current frontier, which DeepSeek attributes to differences in training data curation rather than architectural limitations.
Pricing
The API pricing positions V4 Flash as a cost-effective alternative to premium models:
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| V4 Flash | $0.14 | $0.28 |
| V4 Pro | $0.145 | $3.48 |
The wide gap between V4 Pro's input and output pricing reflects the computational cost difference between processing a prompt and generating tokens, with extended reasoning traces billed as output.
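A quick back-of-the-envelope script shows what that asymmetry means per request. It assumes plain linear per-token billing with no cache discounts or volume tiers, and the model keys are made up for the example.

```python
# Rough per-request cost from the preview price table above.
# Assumes plain linear billing: no cache discounts, no volume tiers.
PRICES_PER_1M = {            # USD per 1M tokens: (input, output)
    "v4-flash": (0.14, 0.28),
    "v4-pro":   (0.145, 3.48),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = PRICES_PER_1M[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# A long prompt with a modest answer: on Pro, the 5K output tokens
# ($0.0174) cost more than the 20x-larger 100K-token input ($0.0145).
for model in PRICES_PER_1M:
    print(model, f"${request_cost(model, 100_000, 5_000):.4f}")
# v4-flash $0.0154
# v4-pro   $0.0319
```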
What This Means for AI Coding and Agentic Workflows
The 1M token context window is particularly relevant for AI-assisted coding. Tasks that require understanding large monorepos, reviewing extensive diffs, or maintaining conversation context across long sessions become feasible without context truncation. Combined with the mixture-of-experts architecture that keeps active parameter counts low, V4 Flash in particular represents a cost-efficient option for coding assistant integrations.
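As a concrete sketch, the snippet below stuffs a repository's Python sources into a single request through DeepSeek's OpenAI-compatible API. The model identifier deepseek-v4-flash and the repo path are placeholder assumptions; check the API documentation for the preview's actual model name.

```python
# Sketch: code review over a whole repo in one 1M-token request.
# Uses DeepSeek's OpenAI-compatible endpoint; "deepseek-v4-flash" is a
# hypothetical model name for illustration.
from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

# Concatenate the sources; 1M tokens fits very roughly 3-4 MB of code.
repo_text = "\n\n".join(
    f"# FILE: {path}\n{path.read_text(errors='ignore')}"
    for path in sorted(Path("my_repo").rglob("*.py"))
)

response = client.chat.completions.create(
    model="deepseek-v4-flash",  # hypothetical preview identifier
    messages=[
        {"role": "system", "content": "You are a careful code reviewer."},
        {"role": "user",
         "content": repo_text + "\n\nMap the cross-module dependencies."},
    ],
)
print(response.choices[0].message.content)
```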
The benchmark performance on coding tasks also makes V4 a candidate for AI coding agent backends that previously required frontier model APIs — potentially reducing operational costs for teams building autonomous coding workflows.
Availability
Both models are available via the DeepSeek API, with web and mobile app access for the preview versions. Open weights for V4 Pro are expected in a subsequent release, consistent with the staged release pattern DeepSeek followed for the V3 series.