"This is a really big deal. I don't know how to make sense of it. I'm coming to this conclusion reluctantly, because the implications are so large that I feel overwhelmed by them — and I'm not sure society is ready for the changes that automated AI development implies," he said.

Clark described a scenario of full automation of AI research — a model that, on its own:

  • sets research objectives;
  • designs experiments;
  • writes and tests code;
  • optimizes training;
  • improves the architecture of the next AI version.

Clark called this "a Rubicon into a near-unpredictable future" and put the probability of such a scenario at 60% within the next two years.

What the Forecast Is Based On

Clark's conclusion rests on the trajectory of several benchmarks:

  • SWE-Bench — a test of resolving real software engineering issues drawn from GitHub repositories. At the end of 2023, the best models handled around 2% of cases; by spring 2026, the figure reached 94%.
  • CORE-Bench — reproducing the results of scientific AI papers, including setting up the environment, running the code, and analyzing the conclusions. According to Clark, the benchmark is effectively "closed": modern agents now score around 95.5%.
  • MLE-Bench — performing Kaggle-level ML tasks. The best agentic systems are already reaching 64–65%.

According to Anthropic's co-founder, all three metrics point in the same direction: AI is rapidly moving from writing isolated code snippets to fully executing engineering and research tasks.

Rising Autonomy

Another argument is the growing duration of tasks AI models can complete without human intervention. According to METR, in 2022 systems handled tasks that took a human tens of seconds. In 2024, the figure rose to roughly 40 minutes; in 2025, to six hours. Today, frontier models can sustain engineering work for around 12 hours at a stretch.

Clark linked this to the spread of agentic coding tools. The longer a model can hold a goal, verify intermediate results, and correct errors, the more stages of the research cycle can be delegated to it.

Why This Matters for AI Development

The modern AI development cycle follows a single pattern: study the literature, reproduce results, set up the experiment, train or fine-tune the model, evaluate metrics, identify bottlenecks, and repeat. Progress on SWE-Bench, CORE-Bench, and MLE-Bench shows that models are already handling entire fragments of this cycle.

Clark separately pointed to progress in more specialized tasks. For example, AI is starting to be applied to GPU kernel design — the code that determines how efficiently models train and run inference on specific hardware.

Another direction is model fine-tuning. On the PostTrainBench benchmark, AI systems attempt to improve small open-source LLMs; as of spring 2026, the best neural networks reach 25–28% of the target gain, versus 51% for human teams. Clark considers the result significant because the benchmark is set against real instruction-tuned models built by experienced researchers.

Anthropic measured how its own models optimize LLM training on CPUs. Over the course of a year, the speedup grew from 2.9× (Claude Opus 4) to 52× (Claude Mythos Preview). A human typically needs four to eight hours to complete the same task.

AI Is Already Learning to Manage AI

Clark noted that modern systems are starting to coordinate the work of other agents. This approach is already used in products like Claude Code or OpenCode: one assistant distributes tasks across multiple sub-assistants, supervises them, and collects the results.

This matters for AI development, which rarely involves a single linear task — it usually means dozens of parallel processes, including writing code and configuring environments. If a model begins to manage these loops on its own, the level of human involvement will drop sharply.
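The orchestrator pattern described above — one coordinator fanning tasks out to sub-agents and collecting their results — can be sketched in a few lines. The `sub_agent` function here is a hypothetical stand-in, not the internals of Claude Code or OpenCode:

```python
# Minimal sketch of an orchestrator: distribute tasks to sub-agents
# in parallel, then gather the results in order.
from concurrent.futures import ThreadPoolExecutor

def sub_agent(task):
    # Stand-in for a real agent run (writing code, configuring an env, ...).
    return f"done: {task}"

def orchestrate(tasks):
    # map() runs tasks concurrently but returns results in input order,
    # so the coordinator can collect and merge them deterministically.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(sub_agent, tasks))

print(orchestrate(["write code", "configure environment", "run tests"]))
# → ['done: write code', 'done: configure environment', 'done: run tests']
```

The same shape scales from three stub tasks to the dozens of parallel processes a real development run involves; the hard part in practice is supervision and error recovery, not the fan-out itself.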

Do Neural Networks Need Creativity?

According to Anthropic's co-founder, one of the key questions is what AI development more closely resembles: discovering general relativity or assembling Lego.

Clark acknowledged that current LLMs are not yet capable of generating fundamentally new scientific ideas. But for automating a substantial part of AI R&D, that may not be required.

"AI mostly moves forward through humans methodically running a certain loop: take a system that works well, scale some aspect of it, look at the errors that emerge during scaling, and fix them. That requires very few unconventional ideas, and most of the process resembles unglamorous, grunt-level engineering work," the expert noted.

Early Signs of Scientific Contribution

Clark believes AI models are already showing early signs of scientific intuition. He cited several examples from mathematics and computer science:

  • A team of mathematicians used Gemini to work through about 700 Erdős problems and produced 13 solutions, one of which the researchers called a "slightly non-trivial" contribution to an open problem.
  • Researchers from the University of British Columbia, the University of New South Wales, Stanford, and Google DeepMind published a mathematical proof developed with substantial assistance from Gemini-based tools.

What If the Forecast Is Right

Clark pointed out that the largest AI labs are already moving toward research automation. OpenAI plans to build an AI "intern" capable of independent scientific work, while Anthropic is publishing papers on automatic alignment to human values.

If the current pace holds, Clark predicts, the industry will enter a phase of fully automated AI development, launching a cycle in which every new generation of AI accelerates the arrival of the next.

If the transition occurs by the end of 2028, the world will face more than just a technological leap. Fundamental questions of safety, capital allocation, the role of human labor, and control over systems that begin to evolve faster than their creators will move to the forefront.

"If you forced me to put a probability on 2027, I'd say 30%. If we don't see this by the end of 2028, then I think we'll discover some flaw in the current technological paradigm, and human invention will be required to move forward," Clark concluded.