According to 1M AI News monitoring, the Meituan LongCat team has open-sourced LongCat-Flash-Prover, a 560B-parameter MoE model specializing in mathematical reasoning tasks for the Lean4 formal theorem proving language. The model weights are released under the MIT license and are now available on GitHub, Hugging Face, and ModelScope.
The model decomposes formal reasoning into three distinct capabilities: Automatic Formalization (transforming natural language math problems into Lean4 formal statements), Sketch Generation (producing lemma-style proof outlines), and Full Proof Generation. All three capabilities are integrated through the Agent tool for Trustable Integration Reasoning (TIR) and real-time interaction verification with the Lean4 compiler.
On the training side, the team proposes a Hybrid-Experts Iteration Framework to generate cold-start data and introduces the HisPO algorithm during the reinforcement learning phase to stabilize the MoE model for long-range task training, while incorporating theorem consistency and validity checking mechanisms to prevent reward hacking.
Benchmark tests demonstrate that LongCat-Flash-Prover has set new benchmarks in automatic formalization and theorem proving among open-source weight models. It achieves a 97.1% pass rate with only 72 inferences on the MiniF2F-Test, 70.8% on ProverBench, and 41.5% on PutnamBench, with at most 220 inferences per question.
