According to 1M AI News, OpenClaw founder Peter Steinberger retweeted benchmark results from third-party PinchBench, which evaluated the performance of AI large language models in the OpenClaw agent task.
The results showed that Gemini 3 Flash led the OpenClaw task success rate at 95.1%, with minimax-m2.1 and kimi-k2.5 following at 93.6% and 93.4% respectively. Claude Sonnet 4.5 scored 92.7%, while GPT-4o scored 85.2%.
