According to Perceive Beating monitoring, NVIDIA officially open-sourced its flagship language model, Nemotron 3 Ultra, on June 4. The 550 billion-parameter model, with a reported 550 billion activated parameters, is optimized for long-horizon agentic tasks such as complex planning, reasoning, and tool calling.
On the Intelligence Index published by third-party benchmark platform Artificial Analysis, Nemotron 3 Ultra scored 47.7 points, making it the highest-scoring open-weight model from the United States. Among open-weight models worldwide, however, it still trails several Chinese models, including Kimi K2.6 (53.9 points), MiMo-V2.5-Pro (53.8 points), and DeepSeek V4 Pro (51.5 points).
In terms of architecture, the model adopts a hybrid Mamba-Transformer mixture-of-experts (MoE) design, which mitigates the memory bottleneck of a KV cache that grows with every token under ultra-long contexts, along with the quadratic compute cost of self-attention. By interleaving Mamba-2 state-space layers with Transformer self-attention layers, the model supports a 1 million-token context window with very low memory overhead. The hybrid architecture reportedly delivers up to 5x higher throughput and a 30% reduction in inference cost on agentic tasks.
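To make the memory argument concrete, the following is a rough back-of-the-envelope sketch in Python. It compares the KV cache of a model where every layer uses self-attention against a hybrid stack where only some layers do. Every figure in it (64 layers, 8 KV heads, head dimension 128, one attention layer in eight, 4 MiB of fixed state per state-space layer) is a hypothetical placeholder chosen for illustration, not a published Nemotron 3 Ultra specification.

```python
# Illustrative estimate only: all layer counts and dimensions are assumed values,
# not the actual configuration of any released model.

def kv_cache_bytes(attn_layers, kv_heads, head_dim, seq_len, bytes_per_elem=2):
    """KV cache size for one sequence.

    Each attention layer stores one key and one value vector per token
    (hence the factor 2), so the cache grows linearly with seq_len.
    """
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_elem


def hybrid_cache_bytes(total_layers, attn_every, kv_heads, head_dim, seq_len,
                       state_bytes_per_layer, bytes_per_elem=2):
    """Cache size when only every `attn_every`-th layer is self-attention.

    The remaining layers are treated as state-space layers that keep a
    fixed-size recurrent state, independent of seq_len.
    """
    attn_layers = total_layers // attn_every
    ssm_layers = total_layers - attn_layers
    attention_part = kv_cache_bytes(attn_layers, kv_heads, head_dim,
                                    seq_len, bytes_per_elem)
    return attention_part + ssm_layers * state_bytes_per_layer


if __name__ == "__main__":
    seq_len = 1_000_000  # the 1 million-token context mentioned above
    full = kv_cache_bytes(64, 8, 128, seq_len)
    hybrid = hybrid_cache_bytes(64, 8, 8, 128, seq_len,
                                state_bytes_per_layer=4 * 1024 * 1024)
    print(f"all-attention cache: {full / 1e9:.1f} GB")
    print(f"hybrid cache:        {hybrid / 1e9:.1f} GB")
    print(f"reduction factor:    {full / hybrid:.1f}x")
```

Under these assumed numbers, most of the saving comes from shrinking the number of layers that keep per-token keys and values. The state-space layers contribute a constant term that does not grow with context length.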
On the ecosystem side, NVIDIA simultaneously released the Agent Toolkit, which includes the NemoClaw orchestration blueprint and the OpenShell runtime. The open release covers model weights, datasets, and training recipes. The model is available on Hugging Face, NVIDIA NIM, and OpenRouter, and enterprise AI search provider Glean has announced an integration, positioning it as an alternative to closed-source commercial large models.
