Microsoft’s new superintelligence team led by Mustafa Suleyman has introduced MAI-Image-2, a text-to-image AI model. According to Microsoft, the model ranks third on the Arena.ai leaderboard for text-to-image generators. Ahead of it, by a fairly clear margin, are OpenAI’s GPT-Image-1.5 and Google’s Nano Banana 2.
MAI-Image-2 is designed to generate especially realistic photos, with natural lighting and accurate skin tones. It is also intended to handle highly detailed and surreal scenes. Microsoft says the model was developed in collaboration with photographers, designers, and visual artists.
Three images generated by MAI-Image-2 are shown side by side: a portrait with shadows across the face, a macro shot of an iris, and a person standing inside a blue glacier cave.
The model is also said to perform well on less artistic tasks, such as reliably rendering text inside images for posters, infographics, or charts.
Three poster-style images generated by MAI-Image-2 are shown side by side: a modernist poster with a red circle, a café menu with an orange illustration, and an equestrian event poster with a jumping horse.
MAI-Image-2 is now available for testing in the MAI Playground, depending on region, and is also being rolled out in Copilot and Bing Image Creator. API access is available to selected business customers and is expected to open to all developers soon via Microsoft Foundry. Microsoft has not disclosed technical details, pricing, or information about the training data.
Microsoft first introduced its own image generation model, MAI-Image-1, in October 2025. At that time, it ranked only ninth among text-to-image models in the AI Arena and did not play a major role. With MAI-Image-2 now reaching third place, that appears to have changed, although Microsoft still does not seem able to challenge the top models from OpenAI and Google.
ES
EN