
World's first AI-authored pre-training framework open-sourced: Tsinghua University and MindWall AI jointly launch ForgeTrain

According to monitoring by Perceptual Beating, MindWall AI and the Tsinghua NLP Lab have jointly open-sourced ForgeTrain in the OpenBMB community, describing it as the world's first production-grade pre-training framework for large models written by AI. They also released MiniCPM5-1B, a small on-device model trained with ForgeTrain. Presented as the first demonstration of a closed engineering loop in which "AI creates AI", ForgeTrain outperformed Nvidia's Megatron on the same hardware and achieved a 10% pre-training speedup on Huawei Ascend. MiniCPM5-1B, meanwhile, topped the Artificial Analysis leaderboard for open-weight small models.

To let AI build low-level pre-training infrastructure autonomously, MindWall AI proposed a software programming paradigm it calls "Forge Engineering". Rather than maintaining a universal framework compatible with all hardware and tasks, it uses AI's low-cost code generation to forge dedicated code on the spot for a specific model and specific hardware. ForgeTrain is built in three stages: first, key data is collected from existing pre-training frameworks to form a test harness; next, framework code that is binary-consistent with the reference is generated iteratively in an automated closed loop; finally, the constraints are lifted so that the generated code can surpass the reference implementation. This progression of automation corresponds to stages L3 to L4 of "AI creating AI".
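The first two stages can be illustrated with a toy sketch. It is not ForgeTrain's code: it assumes that "binary-consistent" means bit-for-bit identical output, and every name in it (`reference_step`, `record_harness`, `first_bitwise_match`, the two candidates) is hypothetical. A reference routine is recorded into a harness, and candidate implementations are accepted only if their output bytes match exactly.

```python
import struct
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Step = Callable[[Sequence[float]], List[float]]
Harness = List[Tuple[List[float], bytes]]


def to_bytes(values: Iterable[float]) -> bytes:
    """Serialize floats as IEEE-754 doubles so the comparison is bit-exact."""
    vals = list(values)
    return struct.pack(f"<{len(vals)}d", *vals)


def reference_step(xs: Sequence[float]) -> List[float]:
    """Hypothetical reference routine: prefix sums accumulated left to right."""
    out, total = [], 0.0
    for x in xs:
        total += x
        out.append(total)
    return out


def record_harness(reference: Step, inputs: Sequence[Sequence[float]]) -> Harness:
    """Stage 1 (assumed): capture the reference's exact output bytes per input."""
    return [(list(xs), to_bytes(reference(xs))) for xs in inputs]


def passes(candidate: Step, harness: Harness) -> bool:
    """A candidate passes only if every output matches the recording bit for bit."""
    return all(to_bytes(candidate(xs)) == expected for xs, expected in harness)


def first_bitwise_match(candidates: Sequence[Step], harness: Harness) -> Optional[Step]:
    """Stage 2 (assumed): try candidates in order until one is bit-identical."""
    for candidate in candidates:
        if passes(candidate, harness):
            return candidate
    return None


def candidate_reordered(xs: Sequence[float]) -> List[float]:
    """Same math, different summation order: rounding differs, so it is rejected."""
    out = []
    for i in range(len(xs)):
        total = 0.0
        for x in reversed(xs[: i + 1]):
            total += x
        out.append(total)
    return out


def candidate_faithful(xs: Sequence[float]) -> List[float]:
    """Keeps the reference's accumulation order, so the bits match."""
    out, total = [], 0.0
    for x in xs:
        total = total + x
        out.append(total)
    return out


harness = record_harness(reference_step, [[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]])
accepted = first_bitwise_match([candidate_reordered, candidate_faithful], harness)
```

The reordered candidate is mathematically equivalent but fails on `[0.1, 0.2, 0.3]` because floating-point addition is not associative. That is why a bitwise check is a much stricter gate than a tolerance-based one, and why it would plausibly be relaxed in a final stage where faster, non-identical code is allowed.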

MiniCPM5-1B, the first model produced with ForgeTrain, has 1.08 billion parameters. Its core architecture follows the standard LlamaForCausalLM design, which significantly lowers the barrier to downstream integration and inference deployment. In the Artificial Analysis evaluation the model scored 18 points, surpassing the larger Qwen3.5-2B (16 points) and leading Qwen3.5-0.8B (11 points) and LFM2.5-1.2B-Thinking (8 points). The model supports deployment formats such as MLX 4-bit and GGUF Q4_K_M, with weights of only 0.5 GB after INT4 quantization. It natively supports a 131,072-token context window and hybrid dual-mode reasoning controlled by enable_thinking. Taking advantage of this very low hardware overhead, OpenBMB also open-sourced MiniCPM Desk Pet, a desktop companion widget that runs fully offline, responds in real time to coding activity in development tools such as Cursor, and supports LoRA-based persona switching.
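The 0.5 GB figure is consistent with simple arithmetic on the stated parameter count. The sketch below is a back-of-the-envelope estimate only: the helper name `quantized_weight_bytes` is hypothetical, and it assumes every weight is stored in exactly 4 bits, ignoring quantization scales, any layers kept at higher precision, and file metadata that real formats add.

```python
def quantized_weight_bytes(num_params: int, bits_per_weight: int) -> float:
    """Raw storage for the weights alone, with no scales or metadata (assumption)."""
    return num_params * bits_per_weight / 8


params = 1_080_000_000  # 1.08 billion parameters, as stated in the article
size_bytes = quantized_weight_bytes(params, bits_per_weight=4)

size_gb = size_bytes / 1e9      # decimal gigabytes
size_gib = size_bytes / 2**30   # binary gibibytes

print(f"{size_gb:.2f} GB, {size_gib:.2f} GiB")
```

Under these assumptions the weights come to roughly half a gigabyte in either unit, which matches the reported size; actual files in a given format may be somewhat larger.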
