According to Dynamic Insight Beating monitoring, the DeepSeek V4 API has gone live with V4-Pro and V4-Flash available simultaneously, and the official account has announced pricing and compute plans.
V4-Flash directly replaces V3.2 (deepseek-chat), and prices fall rather than rise: cache-hit input stays at 0.2 USD per million tokens, cache-miss input drops from 2 USD to 1 USD (a 50% reduction), and output drops from 3 USD to 2 USD (a 33% reduction). The context window expands from 128K to 1M, so users get roughly 8 times the context at a lower price. The two old model names, deepseek-chat and deepseek-reasoner, will be deprecated on July 24, 2026; for now they point to V4-Flash in non-reasoning and reasoning mode, respectively.
V4-Pro is a brand-new high-end tier: 1 USD for cache-hit input, 12 USD for cache-miss input, and 24 USD for output per million tokens, which puts its output price at 8 times that of V3.2. DeepSeek explained in the notes to its pricing table that, because high-end compute is limited, throughput for the Pro service is currently very restricted, and that it expects a significant price cut in the second half of the year once Ascend 950 supernodes ship in volume.
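To make the price comparison concrete, here is a minimal Python sketch that estimates the cost of a workload on each tier from the per-million-token prices quoted above. The `Pricing` class, the `estimate_cost` helper, and the dictionary keys are illustrative names chosen for this example, not API model identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Prices in USD per million tokens, as quoted in this report."""
    cache_hit_input: float
    cache_miss_input: float
    output: float


# Keys are descriptive labels for this sketch, not official model ids.
PRICES = {
    "V3.2": Pricing(cache_hit_input=0.2, cache_miss_input=2.0, output=3.0),
    "V4-Flash": Pricing(cache_hit_input=0.2, cache_miss_input=1.0, output=2.0),
    "V4-Pro": Pricing(cache_hit_input=1.0, cache_miss_input=12.0, output=24.0),
}


def estimate_cost(tier: str, hit_tokens: int, miss_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for the given token counts."""
    p = PRICES[tier]
    total = (
        hit_tokens * p.cache_hit_input
        + miss_tokens * p.cache_miss_input
        + output_tokens * p.output
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: 2M cached input, 1M uncached input, 0.5M output tokens.
    for tier in PRICES:
        cost = estimate_cost(tier, 2_000_000, 1_000_000, 500_000)
        print(f"{tier}: {cost:.2f} USD")
```

For the same workload, the gap between V4-Flash and V4-Pro is driven mostly by the cache-miss input and output prices, which differ by a factor of 12 between the two tiers.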
Both models support non-reasoning and reasoning modes; in reasoning mode, the reasoning_effort parameter can be set to high or max. DeepSeek stated in the announcement that "from now on, 1M context will be the standard configuration for all official DeepSeek services."
