In benchmark tests, the model outperformed OpenAI’s GPT-5.2 and Anthropic’s Claude Opus 4.6 on ARC-AGI-2 (visual reasoning puzzles), MMMU-Pro (multimodal evaluation), and Humanity’s Last Exam. It also reached an Elo score of 3,455.

Source: Google.

“We upgraded Gemini 3 Deep Think in close collaboration with scientists and researchers to tackle complex scientific challenges—where problems often lack clear boundaries or a single correct answer, and the data is incomplete,” Google said in a blog post.

Gemini 3 Deep Think delivers state-of-the-art results in mathematics and programming and performs “exceptionally well” in the natural sciences, including chemistry and physics. The upgraded mode solves problems at the level of gold medalists in international Olympiads.

On the CMT Benchmark, the model scored 50.5%, demonstrating deep expertise in theoretical physics.

Source: Google.

“Beyond benchmark performance, Deep Think is designed for real-world use: it helps researchers interpret complex data and enables engineers to model physical systems through code,” Google noted.

The updated Deep Think is available in the Gemini app for Google AI Ultra subscribers and via the Gemini API for selected developers.

DeepMind’s AI Mathematician

Google DeepMind also introduced Aletheia, an AI agent that set a new record on IMO-ProofBench Advanced, solving 91.9% of the problems. The benchmark is considered one of the most challenging in mathematics.

Built on Gemini Deep Think, Aletheia includes a verification module that detects errors in draft solutions and iteratively refines them. A key feature of the agent is its ability to recognize when a problem is unsolvable, which saves researchers significant time.
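The draft, verify, and refine cycle described above can be illustrated with a minimal sketch. This is not Aletheia's actual implementation, which Google has not detailed here. The function names (`solve_with_verification`, `propose`, `check`, `revise`), the `Verdict` type, and the toy square-root task are all hypothetical. Returning `None` when the round budget runs out is only a simple stand-in for the agent declining to answer, not a real test of unsolvability.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    ok: bool            # True when the draft passes verification
    feedback: str = ""  # hint passed to the reviser when it does not


def solve_with_verification(
    problem: int,
    propose: Callable[[int], int],
    check: Callable[[int, int], Verdict],
    revise: Callable[[int, int, str], int],
    max_rounds: int = 10,
) -> Optional[int]:
    """Return a verified answer, or None if none is found within the budget."""
    draft = propose(problem)
    for _ in range(max_rounds):
        verdict = check(problem, draft)
        if verdict.ok:
            return draft
        draft = revise(problem, draft, verdict.feedback)
    return None  # give up rather than return an unverified draft


# Toy task (hypothetical): find a non-negative integer x with x * x == problem.
def propose(problem: int) -> int:
    return 0  # deliberately naive first draft


def check(problem: int, draft: int) -> Verdict:
    square = draft * draft
    if square == problem:
        return Verdict(ok=True)
    return Verdict(ok=False, feedback="low" if square < problem else "high")


def revise(problem: int, draft: int, feedback: str) -> int:
    return draft + 1 if feedback == "low" else draft - 1


print(solve_with_verification(16, propose, check, revise))  # a verified answer
print(solve_with_verification(15, propose, check, revise))  # no verified answer
```

The point of the design is that only drafts accepted by the checker are ever returned. A problem with no valid answer exhausts the budget and yields `None` instead of a plausible-looking but wrong result.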

Aletheia uses Google Search to navigate complex scientific literature, reducing the risk of false references and computational errors.

Among the system’s achievements:

  • full generation of a scientific paper with computed structural constants in arithmetic geometry;

  • human-AI collaboration on proofs for interacting particle systems (independent sets);

  • autonomous solutions to four problems from the Erdős problem list, including one previously considered open.

DeepMind emphasized that Aletheia’s success reinforces scaling laws: in formal mathematics, solution quality continues to improve through effective agent-based approaches.

Breakthrough in Drug Discovery

Isomorphic Labs, an Alphabet company spun out of DeepMind, unveiled IsoDDE, a new engine for drug discovery. In complex evaluations, IsoDDE achieved twice the predictive accuracy of AlphaFold 3.

While AlphaFold 3 marked a major breakthrough by predicting 3D protein structures and molecular interactions, IsoDDE advances the field further:

  • it predicts binding affinity more accurately than traditional methods;

  • it can identify hidden protein binding pockets for drug targeting;

  • it supports a wide range of complex molecules, including antibodies and large biological structures.

“IsoDDE provides a scalable foundation for AI-driven drug design, delivering the level of predictive accuracy required to work with novel biological systems with unprecedented reliability,” the company said.