According to Perceive Beating's monitoring, the DeepSeek V4 technical report discloses a comparison between DeepSeek-V4-Pro-Max (Maximum Inference Intensity Mode) and the closed-source flagships. The comparison group comprises Opus 4.6 Max, GPT-5.4 xHigh, and Gemini 3.1 Pro High, along with the open-source models Kimi K2.6 and GLM-5.1; it excludes the recently released Opus 4.7 and GPT-5.5.
On coding, V4-Pro-Max scored 3206 on Codeforces, surpassing GPT-5.4 at 3168 and Gemini 3.1 Pro at 3052 and setting a new record for the benchmark. It also took the top score on LiveCodeBench with 93.5. On SWE-bench Verified it scored 80.6, just 0.2 percentage points behind Opus 4.6 at 80.8.
On long-context evaluation, V4-Pro-Max ranked second on both 1M-token benchmarks: it scored 62.0 on CorpusQA 1M, trailing Opus 4.6 at 71.7 but leading Gemini 3.1 Pro at 53.8, and 83.5 on MRCR 1M, where Opus 4.6 led by almost 10 percentage points at 92.9.
On agent tasks, V4-Pro-Max scored 73.6 on MCPAtlas Public, slightly below Opus 4.6 at 73.8, and 67.9 on Terminal-Bench 2.0, behind GPT-5.4 at 75.1 and Gemini 3.1 Pro at 68.5.
On knowledge and reasoning, V4-Pro-Max still shows a noticeable gap: 90.1 on GPQA Diamond (Gemini 3.1 Pro: 94.3), 57.9 on SimpleQA-Verified (75.6), and 37.7 on HLE (44.4). As an open-source model, V4-Pro-Max has for the first time matched or even surpassed the closed-source flagships on multiple coding and long-context benchmarks, but it still lags behind Gemini 3.1 Pro on knowledge-intensive evaluations.
It is worth stressing that this comparison omits the recently released GPT-5.5 and Opus 4.7; the gap between V4 and the newest generation of closed-source models still awaits validation by third-party evaluations.
