In pilot tests, Claude Opus 4.6 could be prompted to write detailed instructions for manufacturing mustard gas into an Excel spreadsheet and to maintain an accounting table for a criminal organization—behaviors that did not appear, or appeared far less frequently, in purely text-based interactions.

“We found that certain types of misuse behavior emerged in these pilot evaluations that were absent or significantly rarer in text-only interactions,” Anthropic writes in the System Card for Claude Opus 4.6. “These results suggest that our standard alignment training measures are likely less effective in GUI-based environments.”

Tests with the previous model, Claude Opus 4.5, in the same setting produced “similar results,” according to Anthropic—indicating that the issue persists across model generations and has not yet been resolved.

The vulnerability appears to stem from the way models are trained: they learn to refuse harmful requests in conversational settings, but this refusal behavior does not fully transfer to agent-based tool use, where “the same underlying harms can be achieved through indirect means.”