According to Dongcha Beating monitoring, NVIDIA has open-sourced the Cosmos-Reason2-32B model weights. Cosmos Reason 2 is NVIDIA's reasoning vision language model (a model that processes images, video, and text together), released at the end of last year and designed to teach robots and autonomous driving systems to understand space, time, and basic physical laws. Initially, only the weights for the two smaller versions, with 2 billion and 8 billion parameters, were made available; the flagship version with 32 billion parameters has now been publicly released for the first time. It is built on the general-purpose Qwen3-VL-32B-Instruct model and is available for commercial use under the NVIDIA Open Model License.
Give it a driving video, and it can infer in real time whether a right turn is safe; give it a warehouse photo, and it can annotate the 2D/3D coordinates and bounding boxes of each item. It has three main applications: analyzing video streams of urban and industrial scenes, labeling sensor data in bulk, and serving as the planning brain for humanoid robots and autonomous vehicles. Compared with the previous generation, new features include object detection with precise, timestamped localization, and the context window has been expanded to 256K tokens.
