According to monitoring by Dynamic Beating, Google DeepMind has released an "AI co-mathematician," a multi-agent interactive research platform for mathematicians. The system achieved 47.9% accuracy (23 of 48 problems solved) on FrontierMath Tier 4, the most challenging research-level math benchmark, surpassing the previous record of 39.6% held by GPT-5.5 Pro.
The system does not rely on a next-generation base model; it is built directly on Gemini 3.1 Pro. On its own, the model scored only 19% on Tier 4, but with the agent framework its performance more than doubled. DeepMind designed a multi-layered architecture for it: at the top level, a "Project Coordinator" divides a research task into multiple workflows, which are then distributed to sub-agents responsible for literature retrieval, coding, and reasoning. Every proof the system writes must also pass review by a panel of "Review Agents" before submission. This elaborate scaffolding suggests that, for top-tier mathematical reasoning, orchestration may contribute more to incremental gains than model iteration.
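DeepMind has not released code for the framework, so the following is only a schematic Python sketch of the architecture as described: a coordinator that splits a problem into role-specific workflows, sub-agents that execute them, and a review panel that gates submission. Every name in it (ProjectCoordinator, SubAgent, ReviewAgent, call_model) is hypothetical, and the model call is a stub.

```python
from dataclasses import dataclass

# Stub for a model call; the real system reportedly uses Gemini 3.1 Pro.
def call_model(role: str, prompt: str) -> str:
    return f"[{role}] draft for: {prompt}"

@dataclass
class SubAgent:
    role: str  # "literature", "coding", or "reasoning"

    def run(self, task: str) -> str:
        return call_model(self.role, task)

@dataclass
class ReviewAgent:
    name: str

    def approve(self, proof: str) -> bool:
        # Placeholder verdict; a real reviewer would check each proof step.
        return "draft" in proof

@dataclass
class ProjectCoordinator:
    workers: list[SubAgent]
    reviewers: list[ReviewAgent]

    def solve(self, problem: str) -> str | None:
        # Top level: divide the research task into one workflow per role.
        drafts = [w.run(f"{w.role} workflow for: {problem}") for w in self.workers]
        proof = "\n".join(drafts)  # naive merge of sub-agent outputs
        # A proof is submitted only if the entire review panel approves it.
        if all(r.approve(proof) for r in self.reviewers):
            return proof
        return None  # rejected; in practice the system would revise and retry

coordinator = ProjectCoordinator(
    workers=[SubAgent("literature"), SubAgent("coding"), SubAgent("reasoning")],
    reviewers=[ReviewAgent("referee-1"), ReviewAgent("referee-2")],
)
print(coordinator.solve("toy problem"))
```

The gating step at the end mirrors the reported design choice: drafts that fail the review panel never reach submission, which is exactly the mechanism that later flagged Lackenby's candidate strategy as "flawed."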
The blind test was conducted by Epoch AI. To prevent cheating, the DeepMind team had no visibility into the questions, and each problem was allotted a 48-hour runtime. Not only did the system claim the top spot, it also solved three problems that had stumped all previous models.
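Epoch AI has not published its harness either; the sketch below is a hypothetical rendering of the protocol as reported (problems hidden until run time, a 48-hour wall-clock budget per problem). The function run_blind_eval and the toy solver are illustrative only.

```python
import time

BUDGET_SECONDS = 48 * 3600  # each problem is allotted a 48-hour runtime

def run_blind_eval(problems, solve):
    """Sketch of the scoring loop: the solver sees each problem only at
    run time, and answers returned after the budget do not count."""
    solved = 0
    for problem in problems:
        start = time.monotonic()
        answer = solve(problem)  # a real harness would enforce the cutoff externally
        if answer is not None and time.monotonic() - start <= BUDGET_SECONDS:
            solved += 1
    return solved / len(problems)

# Toy usage: a solver that answers 23 of 48 problems reproduces the headline score.
accuracy = run_blind_eval(range(48), lambda p: p if p < 23 else None)
print(f"{accuracy:.1%}")  # -> 47.9%
```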
Although it is named a co-mathematician, it behaves more like a brainstorming colleague. Group theorist Marc Lackenby used it in actual research to resolve an open problem from the Kourovka Notebook. Interestingly, the strategy the system initially proposed was flagged as "flawed" by its own Review Agent. Lackenby, however, spotted a clever insight hidden in the discarded approach, filled in the gap himself, and ultimately completed the proof.
Currently, the AI co-mathematician is available only as a beta to a select group of mathematicians.
