
A 1B Base Model Trained from Scratch for About $1,500: Sapient Open-Sources HRM-Text, Built on Its Hierarchical Reasoning Architecture

According to Perfekto Beating's monitoring, Sapient Intelligence has open-sourced HRM-Text, a 1-billion-parameter base model for text generation. It is a pretrained-only model built on the Hierarchical Reasoning Model (HRM) architecture. By building latent-space reasoning into the core of the architecture, Sapient says the compute cost of pre-training the base model has been cut by a factor of 130 to 600.

In particular, HRM-Text was pre-trained on only 40 billion (40B) structured tokens, roughly one-thousandth of the data used by comparable conventional models. According to official tests, training the 1B version from scratch takes about 46 hours on two 8-GPU H100 servers at a cost of around $1,472, while the 0.6B version needs only a single node running for 50 hours, at a hardware cost of around $800. The full engineering framework, covering data extraction, sequence packing, and PyTorch distributed training, has been open-sourced at the same time.
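The quoted figures can be sanity-checked with a little arithmetic. The sketch below backs out the GPU-hours and the hourly price per GPU implied by each run. It assumes that "single-node" means one 8-GPU H100 server and that the quoted dollar amounts cover GPU rental only; neither is stated in the announcement, and the function names are illustrative.

```python
def gpu_hours(nodes: int, gpus_per_node: int, hours: float) -> float:
    """Total GPU-hours consumed by one training run."""
    return nodes * gpus_per_node * hours


def implied_rate(total_cost: float, nodes: int, gpus_per_node: int, hours: float) -> float:
    """Price per GPU-hour implied by a quoted total cost."""
    return total_cost / gpu_hours(nodes, gpus_per_node, hours)


# Figures quoted in the article. The 8 GPUs per node for the 0.6B run
# is an assumption (the article only says "single-node").
RUNS = {
    "HRM-Text 1B": {"total_cost": 1472, "nodes": 2, "gpus_per_node": 8, "hours": 46},
    "HRM-Text 0.6B": {"total_cost": 800, "nodes": 1, "gpus_per_node": 8, "hours": 50},
}

if __name__ == "__main__":
    for name, run in RUNS.items():
        hours_used = gpu_hours(run["nodes"], run["gpus_per_node"], run["hours"])
        rate = implied_rate(**run)
        print(f"{name}: {hours_used:.0f} GPU-hours, ${rate:.2f} per GPU-hour")
```

Under those assumptions the two runs come to 736 and 400 GPU-hours, and both imply the same price of about $2 per H100 GPU-hour, so the two cost figures are consistent with each other.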

The cost reduction rests on a dual-timescale recurrent design. The model contains two sets of Transformer modules: a fast (lower-level) set and a slow (higher-level) set. The two sets take turns iterating over the same batch of inputs and exchange information by summing their states. This lets the model increase its effective computational depth by running more iterations while its parameter count stays fixed.
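The control flow of such a dual-timescale loop can be sketched in a few lines of plain Python. This is a toy illustration, not the HRM-Text implementation: each "module" is a stand-in elementwise function instead of a Transformer block, and the update schedule (several fast steps per slow step), the zero initial states, and all names are assumptions made for the example. Only the alternation, the shared input, and the exchange by state summation come from the description above.

```python
import math


def make_block(weight: float, bias: float):
    """Toy stand-in for a Transformer module: elementwise tanh(w * s + b)."""
    def block(state):
        return [math.tanh(weight * s + bias) for s in state]
    return block


def add(a, b):
    """Elementwise sum, used to mix states (the 'state summation')."""
    return [x + y for x, y in zip(a, b)]


def dual_timescale_forward(x, fast_block, slow_block, slow_steps, fast_steps_per_slow):
    """Alternate fast and slow updates over the same input x.

    Returns the final slow state and the number of module applications,
    which serves as a proxy for effective computational depth.
    """
    z_fast = [0.0] * len(x)  # assumed zero initial states
    z_slow = [0.0] * len(x)
    applications = 0
    for _ in range(slow_steps):
        for _ in range(fast_steps_per_slow):
            # The fast module sees its own state, the slow state, and the input.
            z_fast = fast_block(add(add(z_fast, z_slow), x))
            applications += 1
        # The slow module updates once from the summed states.
        z_slow = slow_block(add(z_slow, z_fast))
        applications += 1
    return z_slow, applications


# Two fixed modules: the parameters (4 scalars here) never grow.
fast = make_block(weight=0.9, bias=0.1)
slow = make_block(weight=0.5, bias=-0.1)

if __name__ == "__main__":
    x = [0.5, -0.25]
    for steps in (2, 4):
        _, depth = dual_timescale_forward(x, fast, slow, steps, 3)
        print(f"slow_steps={steps}: {depth} module applications, same parameters")
```

Doubling `slow_steps` doubles the number of module applications while `fast` and `slow` are reused unchanged, which is the sense in which depth scales with iterations instead of with parameter count.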

With the cost of entry to pre-training falling this sharply, many modeling ideas that were shelved because they were too expensive to test can now be validated cheaply. Note that this release contains only unaligned, pretrained-only weights: the model can only continue a given prefix and cannot be used directly as a question-answering assistant.
