According to monitoring by Observe Beating, Jensen Huang (Huang Renxun) laid out the design philosophy of the Vera CPU at length during his keynote at GTC Taipei 2026. He noted that every CPU to date was designed for humans: human interaction plays out on a timescale of seconds, and cloud CPUs are rented out by core count. The world of Agents, however, runs on a timescale of nanoseconds: every time an Agent calls a tool or accesses a database, it needs the fastest possible response, because any delay blocks the next step of reasoning. "In the past, we built CPUs for 1 billion humans, but in the future, we need to build CPUs for billions of Agents."
He summarized four key design principles of the Vera CPU. First, the world's highest single-thread performance, measured in instructions per clock (IPC): the core can execute 10 instructions per clock cycle, because Agents need ultra-low latency rather than traditional throughput. Second, world-class per-core bandwidth. Third, aggregate on-chip bandwidth that breaks past previous limits: a brand-new interconnect architecture links all CPU cores at the speed of light, with a bisection bandwidth of 3.6 TB/s, no chiplet boundaries, and no cross-chip overhead, so that all cores work together instead of being rented out one by one. Fourth, extreme energy efficiency: deploying as many CPUs as possible without eating into the power that GPUs need to generate tokens.
The Vera CPU also claims several industry firsts: it is the world's first CPU to support PCIe Gen 6 and the first server processor to use LPDDR5 memory, reaching a bandwidth of 1.2 TB/s (2 to 3 times that of today's highest-performance x86 CPUs). Huang stated that in the CPU industry a 5% improvement is already remarkable and 10% is rare, but that the Vera CPU's performance lead over the strongest x86 CPUs is "unprecedented."
He also revealed that NVIDIA has already sold millions of Grace CPUs (in the Grace Blackwell series) and has become one of the world's largest CPU manufacturers. The Vera CPU plays three roles in the system: in the Vera Rubin NVL72 rack, it orchestrates GPUs and manages KV caches; in the harness layer for Agents, it runs model orchestration, tool calls, and database access; and in the Vera BlueField storage system, it drives the world's fastest AI storage server.
