Traditional AI models process images in a single pass; if they miss a detail, they can only guess. Google DeepMind aims to change this with Agentic Vision: the model can iteratively zoom, crop, and otherwise manipulate images by generating and executing Python code.
The system operates through a so-called Think–Act–Observe loop. First, the model analyzes the request and the image and formulates a plan. It then generates and runs Python code — for example, to crop, rotate, or annotate images. The output is added to the context window, allowing the model to inspect the new data before producing a response. Google reports that code execution improves performance by 5–10% across multiple vision benchmarks.
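To make this concrete, here is a minimal sketch of the kind of code such an Act step might produce, assuming the image is available as a local file and using the Pillow library. The file name and crop coordinates are illustrative placeholders, not part of Google's tooling:

```python
# Illustrative only: the kind of code an Act step might emit to zoom into a region.
# File name and coordinates are hypothetical, not taken from Google's documentation.
from PIL import Image

img = Image.open("blueprint.png")           # image supplied in the model's context
region = img.crop((1200, 800, 1800, 1400))  # left, upper, right, lower pixel bounds
region = region.resize((region.width * 2, region.height * 2))  # enlarge for finer detail
region.save("blueprint_roof_edge.png")      # the crop is fed back as a new observation
```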
This idea is not entirely new: OpenAI previously introduced similar capabilities with its o3 model.
Construction blueprint startup reports improvements
As a real-world example, Google cites PlanCheckSolver.com, a platform that checks architectural blueprints for regulatory compliance. The startup reports a 5% accuracy improvement by allowing Gemini 3 Flash to iteratively inspect high-resolution plans. The model crops areas such as roof edges or building sections and analyzes them individually.
For image annotation, the model can also draw bounding boxes and labels directly onto images. Google demonstrates this with a finger-counting example: the model marks each finger with a box and number to prevent counting errors.
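A rough illustration of what such an annotation step could look like, again using Pillow; the box coordinates are invented for the example and would in practice come from the model's own detections:

```python
# Hypothetical annotation step: draw numbered boxes over detected regions.
# The coordinates below are made up for illustration.
from PIL import Image, ImageDraw

img = Image.open("hand.png")
draw = ImageDraw.Draw(img)

finger_boxes = [(40, 30, 90, 160), (100, 10, 150, 150), (160, 20, 210, 155)]
for i, box in enumerate(finger_boxes, start=1):
    draw.rectangle(box, outline="red", width=3)           # bounding box around one finger
    draw.text((box[0], box[1] - 14), str(i), fill="red")  # numeric label above the box

img.save("hand_annotated.png")  # annotated image goes back into the context for counting
```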
For visual mathematics, the model can parse tables and perform the calculations inside a Python environment rather than hallucinating the results. The output can then be rendered as a chart.
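As a sketch of that pattern, assuming the table values have already been extracted from the image, the calculation and chart step might look like this with pandas and matplotlib; the figures are made up:

```python
# Sketch of the visual-math pattern: once a table has been read out of an image,
# the numbers are handled by real code instead of mental arithmetic.
# The table contents here are fabricated for illustration.
import pandas as pd
import matplotlib.pyplot as plt

table = pd.DataFrame(
    {"quarter": ["Q1", "Q2", "Q3", "Q4"], "revenue": [120.0, 135.5, 150.2, 171.3]}
)
table["growth_pct"] = table["revenue"].pct_change() * 100  # exact, not estimated

table.plot(x="quarter", y="revenue", kind="bar", legend=False)
plt.ylabel("Revenue")
plt.savefig("revenue_chart.png")  # the chart can be returned as part of the answer
```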
Many functions still require explicit instructions
Google acknowledges that these capabilities do not yet operate fully automatically. While the model can implicitly zoom in on small details, other functions — such as rotating images or performing visual math — still require explicit prompt instructions. The company plans to eliminate these limitations in future updates.
Additionally, Agentic Vision is currently available only for the Flash model. Expansion to other model sizes is planned, along with additional tools such as web search and reverse image search.
Agentic Vision is available via the Gemini API in Google AI Studio and Vertex AI. Rollout in the Gemini app has begun — users can enable it by selecting “Thinking” in the model dropdown. A demo app and developer documentation are also available.
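For developers, a minimal sketch of enabling code execution through the google-genai Python SDK might look like the following. The model identifier and prompt are placeholders, and the exact configuration Agentic Vision requires may differ, so the official developer documentation should be treated as authoritative:

```python
# Rough sketch using the google-genai SDK's code-execution tool.
# "gemini-flash-latest" and the prompt are placeholder assumptions.
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

with open("blueprint.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-flash-latest",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Zoom into the roof edge and check the labeled clearance.",
    ],
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())]
    ),
)
print(response.text)
```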
Conclusion
Agentic Vision represents a major step toward more autonomous and precise visual reasoning in AI systems, enabling models to actively explore images instead of relying on a single static pass.