Category: Explainers
Alex Rowland
Share
Listen On

AI startup Inception has launched what it calls the first diffusion-based reasoning model. The new model, Mercury 2, does not generate text sequentially, word by word, like conventional language models. Instead, it refines multiple text segments in parallel — an approach the company compares to an editor revising an entire draft at once rather than fixing individual words

According to Inception, this architecture makes Mercury 2 more than five times faster than traditional models. It reportedly reaches 1,009 tokens per second on Nvidia Blackwell GPUs, with an end-to-end latency of just 1.7 seconds. By comparison, Gemini 3 Flash takes 14.4 seconds, while Claude Haiku 4.5 with reasoning enabled reaches 23.4 seconds. Inception claims output quality comparable to leading speed-optimized models.

Pricing is also positioned aggressively. Mercury 2 costs $0.25 per million input tokens and $0.75 per million output tokens, making it half the price of Gemini 3 Flash on inputs and four times cheaper on outputs. Compared with Claude Haiku 4.5, Mercury 2 is roughly four times cheaper on input tokens and more than two and a half times cheaper on output.

The model targets latency-sensitive enterprise use cases such as voice assistants, coding tools, and search systems. Mercury 2 supports a 128K context window, tool use, and structured JSON output, and is available via an OpenAI-compatible API. Companies can apply for early access or test the model directly in chat.

Inception raised $50 million in funding last November from investors including Microsoft, Nvidia, and Snowflake. While Google DeepMind has also experimented with diffusion-based language models, interest in transformer alternatives remains early-stage. Whether diffusion-based text generation can challenge the dominance of transformers long term is still an open question.

AI Industry Analyst
Is an AI industry analyst covering major AI platforms, enterprise adoption, and strategic moves by Big Tech companies. His work focuses on how AI systems are deployed at scale and how they reshape products, markets, and user behavior.

Recent Podcasts

Adobe Reinvents Document Work with Acrobat Studio and AI

Adobe has fundamentally reimagined document workflows with the launch of Acrobat Studio, a...

AI as a Role Model for Generation Alpha: Promise, Risks, and the Future of Childhood

Artificial intelligence is becoming the main role model for Generation Alpha. 2026 may mark a...

AI as a Toy: Why Humanity Always Misuses New Technology First

Artificial intelligence could, in theory, help solve all of humanity’s problems. Stop wars, cure...