OpenZeppelin Flags Data Flaws in OpenAI’s EVMbench Smart Contract AI Benchmark

By Alex Rowland | Policy & Security | 03 March 2026 | 4 min read

Cybersecurity firm OpenZeppelin has audited OpenAI’s new AI benchmark, EVMbench, and identified methodological flaws as well as data contamination issues.

OpenAI launched EVMbench in mid-February in partnership with investment firm Paradigm to evaluate how well AI agents can find, fix, and exploit vulnerabilities in smart contracts.

OpenZeppelin welcomed the initiative but decided to review it against the same standards it applies to the protocols it helps secure, including Aave, Lido, and Uniswap.

Key shortcomings

The main issue concerns training data contamination. EVMbench is built on a set of 120 vulnerabilities identified during audits conducted in 2024 and 2025.

However, the leading models tested on the benchmark have knowledge cutoffs as late as August 2025. Because those cutoffs overlap the period when the audits were conducted, the models could have encountered descriptions of these vulnerabilities in their training data and may effectively "remember" them. Disabling internet access during evaluation does not rule this out, which casts doubt on the validity of the experiment: it is unclear whether the agents are detecting genuinely new threats or recalling known ones.

OpenZeppelin also pointed to factual errors in the EVMbench dataset. At least four of the vulnerabilities classified as "high risk" turned out to be non-exploitable. Even so, AI agents received full credit for "correctly" identifying them.

“These are not subjective disagreements about severity; these are cases where the described attack simply does not work,” the experts said.

OpenZeppelin acknowledged that AI will play a major role in the future of blockchain security. At the same time, the firm warned that the speed of adoption should not come at the expense of data quality and testing standards.

“The question is not whether AI will transform smart contract security — it will. The question is whether the benchmarks and datasets we use to build these tools will be held to the same standards as the contracts they are meant to protect,” OpenZeppelin concluded.

About The Hosts

Alex Rowland

AI Industry Analyst

Alex Rowland is an AI industry analyst covering major AI platforms, enterprise adoption, and strategic moves by Big Tech companies. His work focuses on how AI systems are deployed at scale and how they reshape products, markets, and user behavior.
