In the paper, researchers from several U.S. and British universities present evidence that AI assistance improves immediate performance but also creates unwanted side effects. Once the AI is removed, users perform worse than people who had to solve the tasks on their own from the start. They are also more likely to give up on the tasks altogether.

According to the authors, earlier signs of this effect came mostly from surveys or small samples. They argue that this is the first large-scale causal evidence based on controlled experiments.

Fractions as a test case: performance drops when the AI disappears

In the first experiment, participants solved 15 fraction problems of varying difficulty, ranging from simple one-step calculations to more complex three-step tasks. One group had access to GPT-5 through a sidebar that had been preloaded with each problem and its solution. That meant participants could get the correct answer with almost no effort, for example by simply typing “Answer?” The control group worked without assistance.

After 12 tasks, the AI was removed without warning, and all participants had to complete three identical test problems on their own.

As long as the AI was available, the AI-assisted group solved almost all fraction tasks correctly. But once the tool disappeared for the final three test questions, they solved significantly fewer problems than the control group and skipped tasks nearly twice as often. Because there was no penalty for wrong answers and pay was not tied to performance, the researchers interpret skipping as a direct measure of persistence and motivation.

Replication confirms the effect

A second experiment addressed a methodological issue in the first one. In the original setup, weaker participants in the AI group could still submit correct answers with the assistant's help, so the two groups were not filtered by ability under the same conditions.

This time, the researchers added a pretest with simple fraction problems and gave the control group a sidebar containing the pretest solutions, so that both groups had a similar interface.

The results confirmed the main finding: the AI group again performed worse than the control group once assistance was removed. The higher skipping rate pointed in the same direction, although it did not reach statistical significance in the overall analysis.

The researchers suggest that differences in usage patterns may explain that result.

Users who asked for direct answers suffered the most

Around 61 percent of AI users said they mainly used the assistant for direct answers. About a quarter used it for hints or explanations, while the rest did not use it at all.

In the pretest, these groups did not differ in either accuracy or skipping behavior. In other words, their starting ability and motivation were comparable.

That changed after AI access was taken away. Participants who relied on direct answers performed the worst, while those who ignored the AI achieved the highest success rates, even surpassing the control group. The direct-answer users also declined relative to their own pretest performance, whereas the other groups stayed stable or improved.

The data suggest that the negative effects are driven primarily by users who delegate the work and ask the AI for solutions outright.

The same pattern appears in reading comprehension

To test whether the effect was limited to math, the researchers repeated the design using SAT-style reading comprehension tasks. The control group again received a sidebar, this time with general test-taking tips, to mirror the context shift between learning and testing.

The team also counted answers submitted in under five seconds as skipped, since that was too little time to read the passage.
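The classification rule can be sketched as a simple filter. This is an illustrative sketch only; the field names, data layout, and the exact five-second threshold handling are assumptions, not the study's actual code:

```python
# Hypothetical sketch of the skip-classification rule: a response counts
# as "skipped" if the participant explicitly skipped the question OR
# submitted an answer in under five seconds. All field names are invented.

SKIP_THRESHOLD_S = 5.0

def is_skipped(response: dict) -> bool:
    """Return True if the response counts as a skip."""
    if response.get("skipped"):  # explicit skip
        return True
    # Very fast answers are treated as skips, since the passage
    # could not have been read in that time.
    return response["response_time_s"] < SKIP_THRESHOLD_S

def skip_rate(responses: list[dict]) -> float:
    """Share of a participant's responses classified as skips."""
    return sum(is_skipped(r) for r in responses) / len(responses)

responses = [
    {"skipped": True,  "response_time_s": 1.2},   # explicit skip
    {"skipped": False, "response_time_s": 3.0},   # too fast -> skip
    {"skipped": False, "response_time_s": 42.5},  # genuine attempt
]
print(skip_rate(responses))  # 2 of 3 responses count as skips
```

The threshold turns a continuous response-time signal into the binary persistence measure the researchers compare across groups.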

The result matched the math experiments: the AI group solved fewer questions correctly in the unassisted test and skipped significantly more of them. According to the researchers, reduced persistence therefore appears to be a general consequence of AI-supported problem solving, including in tasks closely tied to critical thinking.

Two mechanisms, one structural problem

The study proposes two explanations for the drop in persistence.

First, AI may shift the user’s internal reference point for how long a task should take. Once fast AI help becomes normal, working independently feels more effortful by comparison, similar to an adaptation effect after positive experiences. The researchers argue that this mechanism may reinforce itself: every time effort is outsourced, future independent effort feels more costly.

Second, users miss out on the productive struggle through which they normally build both knowledge and an accurate sense of their own abilities.

The authors place their findings in the broader debate over gradual skill loss. AI systems optimized for immediate helpfulness, they argue, may undermine users’ long-term capabilities. Fraction arithmetic and reading comprehension may look like simple tasks that can be delegated, but they are also foundational for higher-level skills such as algebra and critical thinking.

Students with fewer academic resources may be especially vulnerable. If as little as ten minutes of AI use can produce measurable effects, the researchers warn that the consequences could accumulate over months and years and become difficult to reverse.

They describe user-side fixes such as Socratic AI or usage limits as little more than band-aids. What is really needed, they argue, is a redesign of AI systems away from short-term user satisfaction and toward tools that support autonomy, including by sometimes refusing to help directly.

Earlier studies pointed in the same direction

Several earlier studies had already suggested similar concerns, though with weaker methodology.

A study from the Swiss Business School found a strong negative correlation between AI use and critical thinking, especially among younger participants aged 17 to 25. Higher levels of education appeared to offer some protection: people with more formal education were more likely to question AI-generated information and engage in deeper thinking.

A joint study by Microsoft Research and Carnegie Mellon described an “irony of automation”: by taking over routine tasks, AI tools reduce opportunities for users to train their cognitive muscles. In routine or lower-stakes tasks especially, people tend to simply defer to the AI.

An Anthropic study involving 52 mostly junior software developers found that AI assistance can also impair the learning of new programming skills. Participants were asked to solve two programming tasks using the unfamiliar Trio library. One group had access to a GPT-4o-based AI assistant, while the control group relied only on documentation and web search. In the follow-up knowledge test, participants with AI access scored 17 percent lower.

Here too, the way people used the AI mattered. Those who used it for explanations learned substantially better than those who heavily delegated the work.

Experience also appears to matter. In another Anthropic study, experienced Claude users achieved success rates around four percentage points higher than newcomers on the same tasks. Rather than just issuing instructions, they worked with the model iteratively.

At the same time, multiple studies also show that AI can improve the performance of individuals and teams. For many companies, however, turning these isolated productivity gains into higher efficiency and stronger revenue remains difficult. There is no shortage of reasons why.