According to monitoring by Observation Beating, Moonshot AI (月之暗面, literally "the dark side of the moon") has open-sourced its next-generation flagship model, Kimi K2.6, which is now live on the Kimi.com website, the Kimi App, the open platform API, and the company's in-house coding tool, Kimi Code. Until now, K2.6 had been available only to Beta users inside Kimi Code, where it ran as an internal test for a month under the name code-preview. Today marks the first public release of the full model and the opening of the API.
In the official benchmarks, K2.6 surpasses the current top closed-source flagships on several coding and agent tasks: SWE-Bench Pro 58.6 (GPT-5.4 xhigh 57.7, Claude Opus 4.6 max effort 53.4, Gemini 3.1 Pro 54.2); HLE Full Suite with Tools 54.0, above all three closed-source models; and DeepSearchQA F1 92.5, against only 78.6 for GPT-5.4. It does not lead everywhere: on Terminal-Bench 2.0 it scored 66.7, second only to Gemini 3.1 Pro at 68.5, and on SWE-Bench Verified it scored 80.2, roughly on par with Opus 4.6 at 80.8 and Gemini 3.1 Pro at 80.6. Until now, the open-source camp has had almost no model able to compete with closed-source frontier flagships on this class of coding benchmarks.
The official blog also published data from two long-running tasks. In the first, K2.6 ran locally on a Mac and rewrote the inference code for Qwen3.5-0.8B in Zig, a niche low-level programming language. After more than 4000 tool calls, 12 hours of continuous operation, and 14 rounds of iteration, throughput rose from approximately 15 tokens/sec to 193 tokens/sec, about 20% faster than LM Studio. In the second, K2.6 took over exchange-core, an 8-year-old open-source matching engine. Over a 13-hour run with more than a thousand tool calls, it modified over 4000 lines of code and reconfigured the core thread topology (from 4ME+2RE to 2ME+1RE), resulting in a 185% throughput improvement. Both sets of results were self-reported by the official team and have not yet been independently verified.
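To put the two reported gains on the same scale, the arithmetic can be sketched in a few lines of Python. This is only an illustration of the figures quoted above, not anything from the official blog. The helper names are hypothetical, the 15 tokens/sec baseline is itself approximate, and the LM Studio figure assumes "about 20% faster" means 193 tokens/sec is 1.2 times LM Studio's throughput.

```python
def speedup(before: float, after: float) -> float:
    """Multiplicative speedup: how many times faster 'after' is than 'before'."""
    return after / before


def percent_gain_to_factor(percent: float) -> float:
    """Convert a percentage improvement (e.g. 185%) into a multiplier (2.85x)."""
    return 1.0 + percent / 100.0


# Zig rewrite of Qwen3.5-0.8B inference: ~15 -> 193 tokens/sec (reported).
zig_factor = speedup(15.0, 193.0)

# exchange-core: reported as a 185% throughput improvement.
exchange_factor = percent_gain_to_factor(185.0)

# Assumption: "about 20% faster than LM Studio" means 193 = 1.2 * LM Studio.
implied_lm_studio = 193.0 / 1.2

print(f"Zig rewrite speedup: {zig_factor:.1f}x")
print(f"exchange-core speedup: {exchange_factor:.2f}x")
print(f"Implied LM Studio throughput: {implied_lm_studio:.0f} tokens/sec")
```

Read this way, the Zig rewrite is roughly a 13x gain over its own starting point, while the exchange-core result is a little under 3x. The two are not directly comparable, since one starts from a fresh, unoptimized implementation and the other from a mature codebase.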
Agent Swarm, upgraded alongside K2.6, can now run 300 sub-agents simultaneously for up to 4000 steps, whereas its predecessor in K2.5 was limited to 100 agents and 1500 steps. The in-house RL infrastructure team at Moonshot AI, Kimi's developer, has already used K2.6 to run an autonomous on-call operations agent for 5 days, and the official team has released an excerpt of its work log.
