According to Dynamics Beating monitoring, Clive Chan, a member of OpenAI's technical staff, said that DeepSeek's V4 technical report is top-notch overall, but that its section of hardware recommendations for chip manufacturers is "surprisingly mediocre and even erroneous," in contrast to V3. The Q&A session on V3's hardware chapter was at one point the most heated discussion at the academic conference ISCA, and its suggestions spoke directly to interconnect standards the industry was then developing, whereas V4's recommendations have become vaguer.
Chan challenged the recommendations point by point. On power consumption, the report says that software optimization now lets chips run compute, memory, and communication at full load simultaneously, and recommends that chip manufacturers reserve more power headroom. Chan considers this "counterproductive": a chip's total power is limited by the physical process, so reserving more headroom means lowering the operating frequency, which reduces compute throughput. On data transfer between GPUs, the report recommends having the receiving GPU actively read the data (pull) rather than having the sender write it (push), because the notification overhead of push is too high. Chan disputes this, arguing that pull is actually slower and that the better fix is to improve the data processing capability of the network card. The two may be talking past each other, however: the report is referring to the overhead of the notification mechanism, while Chan is talking about the latency of the transfer itself. On the activation function, the report recommends replacing SwiGLU with a simpler function to reduce the computational burden. Chan considers this unnecessary, because Sonic MoE has already shown that SwiGLU can achieve optimal performance. Chan suspects that DeepSeek may have "intentionally weakened this section."
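For readers unfamiliar with the activation function in dispute, the following is a minimal pure-Python sketch of a SwiGLU feed-forward layer next to a plain two-matrix layer. It illustrates only where the extra computation comes from. The simpler function the report proposes is not named here, so the ReLU variant is a stand-in, and the function and matrix names (`swiglu_ffn`, `W_gate`, `W_up`, `W_down`) are illustrative assumptions rather than anything taken from the report.

```python
import math

def silu(z):
    # SiLU (Swish with beta = 1): z * sigmoid(z).
    # Not hardened against overflow for very large negative z.
    return z / (1.0 + math.exp(-z))

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def swiglu_ffn(x, W_gate, W_up, W_down):
    # SwiGLU: SiLU-activated gate projection, multiplied elementwise
    # by a second "up" projection, then projected back down.
    # Three matrix products in total.
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

def relu_ffn(x, W_up, W_down):
    # Stand-in "simpler" layer (an assumption, not the report's proposal):
    # one projection, ReLU, one projection. Two matrix products in total.
    hidden = [max(0.0, h) for h in matvec(W_up, x)]
    return matvec(W_down, hidden)

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(swiglu_ffn([0.0, 1.0], identity, identity, identity))
    print(relu_ffn([-1.0, 2.0], identity, identity))
```

At the same hidden width, the gated layer performs one more matrix product and one more elementwise multiply per token than the ungated one, which is the kind of cost a simpler activation would remove. Whether that saving matters in practice is the point on which Chan and the report disagree.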
