header-langage
简体中文
繁體中文
English
Tiếng Việt
한국어
日本語
ภาษาไทย
Türkçe
Scan to Download the APP

Meituan Open Sources 560B Parameter Theorem Proving Model: 72 Inference Steps with 97.1% Success Rate, Refreshing SOTA Open Source Model

According to 1M AI News monitoring, the Meituan LongCat team has open-sourced LongCat-Flash-Prover, a 560B-parameter MoE model specializing in mathematical reasoning tasks for the Lean4 formal theorem proving language. The model weights are released under the MIT license and are now available on GitHub, Hugging Face, and ModelScope.

The model decomposes formal reasoning into three distinct capabilities: Automatic Formalization (transforming natural language math problems into Lean4 formal statements), Sketch Generation (producing lemma-style proof outlines), and Full Proof Generation. All three capabilities are integrated through the Agent tool for Trustable Integration Reasoning (TIR) and real-time interaction verification with the Lean4 compiler.


On the training side, the team proposes a Hybrid-Experts Iteration Framework to generate cold-start data and introduces the HisPO algorithm during the reinforcement learning phase to stabilize the MoE model for long-range task training, while incorporating theorem consistency and validity checking mechanisms to prevent reward hacking.

Benchmark tests demonstrate that LongCat-Flash-Prover has set new benchmarks in automatic formalization and theorem proving among open-source weight models. It achieves a 97.1% pass rate with only 72 inferences on the MiniF2F-Test, 70.8% on ProverBench, and 41.5% on PutnamBench, with at most 220 inferences per question.

举报 Correction/Report
Correction/Report
Submit
Add Library
Visible to myself only
Public
Save
Choose Library
Add Library
Cancel
Finish