Model dossier

DeepSeek V4 Flash specs and Pro escalation notes

This page tracks source-backed model details only: IDs, context, output limit, reasoning modes, open weights, and architecture notes. The editorial priority is V4 Flash; Pro is documented as the escalation path.

Official details

V4 specification snapshot

DeepSeek API docs and the model card confirm the model IDs, context window, output limit, open-weight status, and parameter-scale positioning.

API model IDs

deepseek-v4-flash / deepseek-v4-pro

Use deepseek-v4-flash as the default model ID in the DSFlashHub content stack; reserve deepseek-v4-pro for explicit escalation.
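The default-vs-escalation rule above can be captured in a small helper. This is a sketch for the DSFlashHub content stack, not an SDK function; only the two model ID strings come from the docs, and the helper name and escalation flag are illustrative.

```python
# Model-selection sketch (illustrative helper; only the ID strings are documented).
DEFAULT_MODEL = "deepseek-v4-flash"
ESCALATION_MODEL = "deepseek-v4-pro"

def select_model(escalate: bool = False) -> str:
    """Return the Flash ID by default; Pro only on explicit escalation."""
    return ESCALATION_MODEL if escalate else DEFAULT_MODEL
```

Keeping escalation behind an explicit flag makes Pro usage auditable instead of accidental.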

Context length

1M tokens

V4 Flash and V4 Pro both publish a 1M-token context window.

Max output

384K tokens

The model card lists a 384K maximum output length.
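Taken together, the 1M context window and 384K output limit imply a simple per-request budget check. The sketch below assumes decimal token counts (1,000,000 and 384,000); the model card's exact binary-vs-decimal convention is an assumption, and the helper is not an SDK call.

```python
# Budget check against the published V4 limits.
# Assumes decimal 1M / 384K; the card's exact counts may differ.
CONTEXT_WINDOW = 1_000_000  # published 1M-token context window
MAX_OUTPUT = 384_000        # published 384K-token maximum output

def fits_budget(prompt_tokens: int, requested_output: int) -> bool:
    """True if the request stays inside both published limits."""
    if requested_output > MAX_OUTPUT:
        return False
    return prompt_tokens + requested_output <= CONTEXT_WINDOW
```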

Reasoning modes

Non-think / Think / Think Max

Both Flash and Pro expose non-thinking and thinking modes; Think Max is reserved for the hardest reasoning cases.
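A minimal mode picker for the three documented tiers might look like the sketch below. The tier names mirror the card; how the API actually encodes them (parameter name and exact string values) is an assumption here.

```python
# Reasoning-tier picker sketch. Tier labels mirror the documented modes;
# the wire-format strings are assumptions, not confirmed API values.
MODE_BY_DIFFICULTY = {
    "routine": "non-think",   # default chat, low-risk turns
    "hard": "think",          # standard reasoning work
    "hardest": "think-max",   # reserved for the hardest reasoning cases
}

def pick_mode(difficulty: str) -> str:
    """Map a difficulty label to a reasoning tier; default to non-think."""
    return MODE_BY_DIFFICULTY.get(difficulty, "non-think")
```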

Open weights

MIT licensed weights

The Hugging Face model card lists open weights and links to the V4 technical report.

Architecture

MoE + hybrid attention

V4 Pro is listed at 1.6T total / 49B active parameters; V4 Flash at 284B total / 13B active.

When to use Flash

V4 Flash is the lower-cost route for default chat, OpenClaw agent turns, retrieval augmentation, content processing, batch code explanation, and lower-risk tool calls.

When to use Pro

V4 Pro is the higher-reasoning route for difficult debugging, long-context repository review, incident analysis, and high-value reports. Escalation should be selective: Pro is not a drop-in replacement for requests Flash already handles well.
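The Flash/Pro split above reduces to a default-Flash router. The task labels below are illustrative shorthand for the categories named in this page, not values from any API.

```python
# Task-based router sketch mirroring the Flash/Pro split on this page.
# Task labels are illustrative shorthand, not API values.
PRO_TASKS = {
    "difficult-debugging",
    "long-context-repo-review",
    "incident-analysis",
    "high-value-report",
}

def route(task: str) -> str:
    """Default to Flash; escalate to Pro only for the listed task types."""
    return "deepseek-v4-pro" if task in PRO_TASKS else "deepseek-v4-flash"
```

Because unknown task types fall through to Flash, the router fails toward the cheaper path rather than silently escalating.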