Online Policy Distillation with Dreaming Simulation for Scalable End-to-End Learning of Novel Solutions

According to monitoring by Dynam.AI Beating, large language models commonly struggle to absorb new knowledge on an ongoing basis after deployment. Current optimization techniques focus mainly on expanding the context window and speeding up retrieval, which only lets the model look information up temporarily within a single conversation. Once the dialogue ends, that knowledge is lost entirely. The real bottleneck for continual learning lies not in retrieval speed but in how to write the experience gained in dialogues back into the model's underlying weights.

Online Policy Self-Distillation (OPSD) provides a new path for updating weights. When a large model faces a task, its "teacher state", which has access to the complete long context, generates a high-quality answer. The system then computes a dense supervision signal in the cloud: the token-level difference between the output probabilities of the base state (the student) and the teacher state. Backpropagating this signal moves the base model toward the higher-scoring teacher state.
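The token-level signal described above can be illustrated with a minimal sketch. It assumes the "probability difference" is measured as a KL divergence from teacher to student at each token position; the source does not name the divergence, and the function names `log_softmax`, `token_kl`, and `distillation_loss` are hypothetical. Real systems would compute this over model logits in a deep learning framework and backpropagate through the student; this version only shows the arithmetic.

```python
import math

def log_softmax(logits):
    """Numerically stable log-probabilities for one token position."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def token_kl(teacher_logits, student_logits):
    """KL(teacher || student) over the vocabulary at one position.

    The choice of KL direction is an assumption made for illustration.
    """
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def distillation_loss(teacher_seq, student_seq):
    """Mean per-token KL across a sequence, plus the per-token values.

    Every position contributes its own term, which is what makes the
    supervision dense rather than a single score per answer.
    """
    per_token = [token_kl(t, s) for t, s in zip(teacher_seq, student_seq)]
    return sum(per_token) / len(per_token), per_token

# Toy example: a 3-word vocabulary and a 2-token answer.
teacher = [[2.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
student = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
loss, per_token = distillation_loss(teacher, student)
```

In this toy case the second position contributes zero because the student already matches the teacher there, so only the first position would drive an update.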

Compared with supervised fine-tuning (SFT), which forces the model to memorize entire dialogue texts, self-distillation extracts only the decision-making experience needed to maintain performance. This extremely sparse parameter update can prevent catastrophic forgetting, so the model's original common-sense knowledge is not overwritten.

Another, more forward-looking learning path is Dreaming Simulation. When facing a complex task, the large model spends substantial inference-time compute on internal self-play. Based on patterns observed in day-to-day use, the model automatically constructs a virtual simulator environment and runs tens of thousands of task rehearsals inside it. When a rehearsal succeeds, the system records the successful trajectory as training material and uses it to update the base model's underlying weights. Unlike lightweight compression, which only produces short summaries, Dreaming Simulation spends massive cloud compute on repeated rehearsal, representing a fourth dimension of scaling for large models.
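The rehearse-and-record loop described above can be sketched as follows. Everything here is a hypothetical stand-in: `toy_simulator` is a trivial number-line task rather than a simulator learned from usage patterns, `random_policy` replaces the model, and the weight update on the collected trajectories is left out. The sketch only shows the filtering step, where failed rehearsals are discarded and successful ones are kept as training material.

```python
import random

def toy_simulator(policy, rng):
    """Hypothetical task: reach position 3 within 6 steps on a number line.

    Returns the (position, action) trajectory and whether the goal was reached.
    """
    position, trajectory = 0, []
    for _ in range(6):
        action = policy(position, rng)
        trajectory.append((position, action))
        position += action
        if position == 3:
            return trajectory, True
    return trajectory, False

def random_policy(position, rng):
    """Stand-in for the model: step left or right at random."""
    return rng.choice([-1, 1])

def rehearse(policy, simulate, num_rehearsals, seed=0):
    """Run many simulated rehearsals and keep only the successful trajectories."""
    rng = random.Random(seed)
    successes = []
    for _ in range(num_rehearsals):
        trajectory, succeeded = simulate(policy, rng)
        if succeeded:
            successes.append(trajectory)
    return successes

# Collect successful trajectories from 1,000 rehearsals of the toy task.
training_material = rehearse(random_policy, toy_simulator, num_rehearsals=1000)
```

Only a fraction of the random rehearsals succeed, which is why the approach described in the text needs a large number of runs to gather enough usable trajectories.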

It is projected that between 2027 and 2028, AI agents will undergo a work evaluation after collaborating with humans for one week. Once the agent passes, the system can distill the week's accumulated practical experience into the model's underlying weights in the cloud, through Online Policy Self-Distillation (OPSD) or Dreaming Simulation. This would expand capabilities online after deployment, letting the large model get smarter the more it is used.
