Multimodal AI News

Read the latest multimodal AI news, analysis, and industry updates on AI Wire Media.

OpenAI Launches GPT Image 2 for ChatGPT with Thinking Mode, Web Search, and Multi-Image Consistency

OpenAI equips its image generator, ChatGPT Images 2.0, with reasoning capabilities and web search. The generator can now create up to eight consistent images from a single prompt and handles text in non-Latin scripts much better.
Meta Launches Muse Spark, Its First Closed-Weight Frontier AI Model

Meta Superintelligence Labs has unveiled Muse Spark, the first model in its new Muse family and the company’s first AI system not released with open weights. According to early independent testing, the model significantly narrows Meta’s gap with OpenAI, Anthropic, and Google in the ...
Alibaba Launches Qwen3.6-Plus With 1M Token Context and Advanced AI Agents

Alibaba’s cloud division has launched Qwen3.6-Plus, a new model offering a 1 million-token context window by default, enhanced agent capabilities for programming, and more accurate multimodal perception and ...
Alibaba Unveils Qwen3.5-Omni, Omnimodal AI Model That Challenges Gemini 3.1 Pro

Alibaba has unveiled Qwen3.5-Omni, an omnimodal AI model with text, image, audio, and video understanding. It is claimed to outperform Gemini 3.1 Pro on audio tasks and introduces a new capability: coding from spoken instructions and video input.
Meta and NYU Study Shows Multimodal AI Can Be Trained From Scratch

Meta FAIR and researchers from New York University have conducted a systematic study on how multimodal AI models can be trained from scratch. Their findings challenge several widely held assumptions in the field.
Kling AI 3.0: Kuaishou Launches Its New Multimodal Video Model With 15-Second Clips

Chinese developer Kuaishou has unveiled the third version of its video generation model, Kling AI.
Kuaishou Releases Kling AI 3.0 With Advanced Multimodal Video Generation

Chinese developer Kuaishou has unveiled the third version of its video generation model, Kling AI.
DeepSeek Introduces Semantic Vision Encoder for AI OCR

The Chinese AI company DeepSeek has unveiled a novel vision encoder that semantically reorders image information instead of processing it rigidly from top left to bottom right.
Study Finds AI Models Fail Visual Tasks Solved by Toddlers

A new study reveals a fundamental weakness in current AI systems: even the most powerful multimodal language models fail at basic visual tasks that toddlers solve effortlessly.
Google Releases Gemini 3 Flash, Makes It Default Model in Gemini App and Search

Google has released its new language model, Gemini 3 Flash, and made it the default option in the Gemini app as well as in AI Mode within Google Search.