According to Data Beat monitoring, Microsoft's MAI Superintelligence Team has released MAI-Image-2-Efficient, a production-optimized text-to-image model. Compared with the previous flagship, MAI-Image-2, the new model generates images 22% faster, delivers 4x the single-GPU throughput, and cuts API pricing by roughly 41%: $5 per million text input tokens and $19.5 per million image output tokens.
Speed is the core selling point of this model. Microsoft provided the following median latency comparison: MAI-Image-2-Efficient at 13.7 seconds, MAI-Image-2 at 17.5 seconds, Google Gemini 3 Pro Image at 19.1 seconds, and GPT-Image-1.5-High at 41.4 seconds. Microsoft claims it is on average about 40% faster than other mainstream text-to-image models.
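The "about 40% faster" figure can be sanity-checked against the quoted median latencies. A minimal sketch, assuming the comparison set is the three models above and that "faster" means the mean relative latency reduction (Microsoft does not state how the average was computed):

```python
# Median latencies from the article, in seconds.
efficient = 13.7
competitors = {
    "MAI-Image-2": 17.5,
    "Gemini 3 Pro Image": 19.1,
    "GPT-Image-1.5-High": 41.4,
}

# Relative latency reduction versus each competitor:
# (competitor latency - efficient latency) / competitor latency.
reductions = {name: (t - efficient) / t for name, t in competitors.items()}
avg = sum(reductions.values()) / len(reductions)

for name, r in reductions.items():
    print(f"{name}: {r:.0%} faster")
print(f"average: {avg:.0%}")  # roughly 39%, consistent with the ~40% claim
```

Under this reading, the per-model reductions are about 22%, 28%, and 67%, averaging to roughly 39%, which lines up with Microsoft's claim.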
Microsoft positions the two models as complementary: the Efficient version targets batch and real-time image generation, such as product images, marketing materials, and UI prototypes, while the flagship handles scenarios with the highest detail requirements, such as portraits, realistic scenes, and complex in-image text. MAI-Image-2-Efficient is now available on Microsoft Foundry and MAI Playground and is rolling out to Copilot and Bing, with PowerPoint integration to follow.
