BlockBeats News, April 30th: DeepSeek today released a new model, DeepSeek-Prover-V2-671B, on the AI open-source community Hugging Face. DeepSeek-Prover-V2-671B uses the more efficient safetensors file format and supports multiple computation precisions, allowing the model to be trained and deployed faster and with fewer resources. With 671 billion parameters, it is likely an upgrade to last year's Prover-V1.5 mathematical model. In terms of architecture, the model is built on DeepSeek-V3 and adopts a Mixture-of-Experts (MoE) design, with 61 Transformer layers and a 7168-dimensional hidden layer. It also supports ultra-long contexts, with maximum position embeddings of 163,840, allowing it to handle complex mathematical proofs. In addition, it employs FP8 quantization to reduce model size and improve inference efficiency. (Jinshi)
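
For reference, below is a minimal sketch of how the published configuration could be inspected with the Hugging Face transformers library. The repository id and the config field names follow the usual DeepSeek-V3 conventions and are assumptions for illustration, not details taken from the article; the expected values in the comments are those reported above.

```python
# Minimal sketch, assuming the repo id "deepseek-ai/DeepSeek-Prover-V2-671B"
# and standard DeepSeek-V3-style config field names.
from transformers import AutoConfig

config = AutoConfig.from_pretrained(
    "deepseek-ai/DeepSeek-Prover-V2-671B",
    trust_remote_code=True,  # DeepSeek-V3-style models ship custom model code
)

# Fields highlighted in the report (names assume the usual DeepSeek-V3 config schema)
print(config.num_hidden_layers)        # expected: 61 Transformer layers
print(config.hidden_size)              # expected: 7168-dimensional hidden layer
print(config.max_position_embeddings)  # expected: 163,840-token context
print(getattr(config, "quantization_config", None))  # FP8 quantization settings, if present
```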
