According to monitoring by Perceive Beating, DeepSeek has taken the unusual step of publishing internal dogfooding data for V4. The team collected about 200 real R&D tasks from more than 50 engineers, covering feature development, bug fixes, refactoring, and diagnostics across a technology stack that includes PyTorch, CUDA, Rust, and C++. After rigorous screening, 30 tasks were retained as the evaluation set.
On this set, V4-Pro-Max achieved a pass rate of 67%, well above Sonnet 4.5 (47%) and close to Opus 4.5 (70%), but below Opus 4.5 Thinking (73%) and Opus 4.6 Thinking (80%). Haiku 4.5 passed only 13%.
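Because the evaluation set has only 30 tasks, each task is worth about 3.3 percentage points, so the gaps between neighboring models amount to one or two tasks. The sketch below converts the reported percentages into approximate task counts. It assumes each rate is a simple fraction of the 30 retained tasks, which the report as summarized here does not state explicitly; the function and variable names are illustrative only.

```python
# Pass rates (percent) and the task count of 30 are taken from the article.
# Assumption: each rate is passes / 30, rounded to the nearest whole percent.
TOTAL_TASKS = 30

reported_rates = {
    "V4-Pro-Max": 67,
    "Sonnet 4.5": 47,
    "Opus 4.5": 70,
    "Opus 4.5 Thinking": 73,
    "Opus 4.6 Thinking": 80,
    "Haiku 4.5": 13,
}


def implied_passes(rate_percent, total=TOTAL_TASKS):
    """Return the whole number of passed tasks closest to the reported rate."""
    return round(rate_percent / 100 * total)


implied = {model: implied_passes(rate) for model, rate in reported_rates.items()}

for model, passes in implied.items():
    print(f"{model}: ~{passes}/{TOTAL_TASKS} tasks")
```

Under that assumption, V4-Pro-Max passed about 20 tasks, one fewer than Opus 4.5 and roughly six more than Sonnet 4.5.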
In an internal survey (N=85), all respondents reported using V4-Pro for agentic coding in their daily work. 52% believed V4-Pro could serve as their default primary coding model, 39% were inclined to agree, and fewer than 9% disagreed. The main complaints were low-level errors, misinterpretation of vague prompts, and occasional overthinking.
