A team from Microsoft Research and Zhejiang University has proposed World-R1, which uses reinforcement learning to teach a video model 3D geometric consistency without modifying the model architecture or relying on 3D datasets. The core idea: after a video is generated, the pre-trained 3D foundation model Depth Anything 3 reconstructs the scene as 3D Gaussians (3DGS), which are rendered from a novel viewpoint and compared against the original video. The reconstruction error, trajectory deviation, and novel-view semantic plausibility (rated by Qwen3-VL) are combined into a reward signal, which is fed back to the video model through Flow-GRPO (a reinforcement learning algorithm adapted for flow-matching models).
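As a rough illustration of how the three signals could combine into a single scalar reward, here is a minimal sketch. The function name, weights, and sign conventions are illustrative assumptions, not the paper's actual formulation; only the three ingredients (reconstruction error, trajectory deviation, semantic rating) come from the text.

```python
def world_r1_reward(recon_error: float,
                    traj_deviation: float,
                    semantic_score: float,
                    w_recon: float = 1.0,
                    w_traj: float = 0.5,
                    w_sem: float = 0.5) -> float:
    """Combine the three signals described in the text into one reward.

    Lower reconstruction error and trajectory deviation are better;
    a higher semantic plausibility score (e.g. a 0-1 VLM rating) is
    better. Weights here are arbitrary placeholders.
    """
    return (-w_recon * recon_error
            - w_traj * traj_deviation
            + w_sem * semantic_score)
```

A policy-gradient method like Flow-GRPO would then use this scalar to weight updates to the flow-matching video model.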
The base model is the open-source Wan 2.1 (1.3B and 14B), from which World-R1-Small and World-R1-Large are trained. The training data consists of only about 3,000 text-only prompts generated by Gemini, with no 3D assets. During training, a round of "dynamic fine-tuning" is inserted every 100 steps, temporarily turning off the 3D reward and keeping only the image-quality reward, to prevent the model from suppressing non-rigid dynamics such as human motion in pursuit of geometric rigidity.
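The alternating schedule can be sketched as a simple gating function. The 100-step interval and the two reward types come from the text; the function shape and names are illustrative assumptions.

```python
def active_rewards(step: int, interval: int = 100) -> dict:
    """Return which reward terms are active at a given training step.

    Every `interval` steps a "dynamic fine-tuning" round runs, during
    which the 3D-consistency reward is paused and only the image-quality
    reward drives the update (a sketch of the schedule described above).
    """
    dynamics_round = step > 0 and step % interval == 0
    return {
        "image_quality": True,                  # always on
        "3d_consistency": not dynamics_round,   # paused on dynamics rounds
    }
```

The idea is that periodically optimizing for image quality alone gives non-rigid motion a chance to survive, since a pure geometric-consistency objective would favor frozen, rigid scenes.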
On 3D-consistency metrics, World-R1-Large's PSNR (peak signal-to-noise ratio) is 7.91 dB higher than the Wan 2.1 14B baseline, and the Small version is 10.23 dB higher. General video quality on VBench improves rather than degrades. In a blind test with 25 participants, the geometric-consistency win rate is 92% and the overall preference rate is 86%. The code has been open-sourced on GitHub under the CC BY-NC-SA 4.0 license.
