A new paper from Google Research presents two simple methods for improving the performance of language models, but only for models without an active reasoning mode.
Before explaining the methods, the researchers draw an important distinction between standard language models (non-reasoning) and reasoning models. Standard LLMs generate an answer immediately after receiving a prompt. Reasoning models, by contrast, first “think” internally: they create a chain of thought in which they analyze the task, add details, and often repeat parts of the prompt. This behavior is noticeable because responses arrive with a delay and, depending on the interface, the model’s reasoning process may be visible in a collapsible “thinking” section.
Examples of reasoning models include GPT-5.2-Thinking and Gemini 3 Pro. Claude is a special case: starting with version 3.7, Claude Sonnet operates as a hybrid model that can optionally use reasoning depending on the settings. In the study, Claude 3 Haiku and Claude 3.7 Sonnet were tested with reasoning disabled. Examples of non-reasoning models include GPT-4o and DeepSeek V3. The following prompting techniques apply to these and similar models.
Method 1: Put the question before the context
The question or task should come before any context or data, not after it. Language models process text from left to right, and earlier tokens cannot “see” what comes later. If the question appears only at the end, the model processes the preceding data without knowing what it should focus on.
For businesses, this means prompts should begin with the task, followed by documents, lists, or customer data.
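A minimal sketch of this ordering in plain Python; the helper name and the sample customer data are illustrative, not from the paper:

```python
def build_prompt(task: str, context: str) -> str:
    """Put the task before the context, so the model already knows
    what to look for while it reads the data."""
    return f"{task}\n\n{context}"

# Hypothetical business data, just to show the ordering.
context = (
    "Customer records:\n"
    "- Acme Corp: invoice #1042, 45 days overdue\n"
    "- Globex: invoice #1043, paid on time\n"
)
prompt = build_prompt(
    "List every customer with an invoice more than 30 days overdue.",
    context,
)
print(prompt)
```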
Method 2: Repeat the prompt
Repeating the entire prompt can significantly improve results. The process is simple: write your prompt, copy it, and paste it directly below the original—then submit the message. The model now reads the same prompt twice.
Example without repetition:
“Explain the theory of relativity in three sentences for a general audience.”
Example with repetition:
“Explain the theory of relativity in three sentences for a general audience.
Explain the theory of relativity in three sentences for a general audience.”
In 47 out of 70 tests, the repetition method performed better, with no downsides: responses were neither longer nor slower.
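If you build prompts programmatically, the duplication is easy to automate. A small sketch; the function name and the `times` parameter are illustrative, not something the paper prescribes:

```python
def repeat_prompt(prompt: str, times: int = 2) -> str:
    """Concatenate `times` copies of the prompt, separated by a blank line,
    so the model reads the full question before it starts answering."""
    return "\n\n".join([prompt] * times)

prompt = (
    "Explain the theory of relativity in three sentences "
    "for a general audience."
)
print(repeat_prompt(prompt))  # two copies, as in the example above
```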
Why it works—and when it doesn’t
As with the first method, the explanation is straightforward: the model reads text only forward. When the prompt is repeated, the tokens of the second copy can attend back to the entire first copy, so the model effectively processes the question a second time already knowing how it ends. This improves contextual understanding.
For reasoning models, however, the method offers little benefit (five improvements, one degradation, and 22 neutral results), because these models already repeat parts of the question internally as part of their reasoning process.
Triple repetition for specific tasks
For certain tasks, triple repetition can be especially effective—particularly when the model must locate positions in long lists or count elements. In one test, the accuracy of Gemini 2.0 Flash-Lite rose from 21.33% to 97.33% when this approach was used.
For tasks involving long documents, tables, or name lists, adding an extra repetition can substantially improve performance.
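Reusing the illustrative helper from above (redefined here so the snippet runs on its own), triple repetition is simply a higher count; the name list is a made-up example:

```python
def repeat_prompt(prompt: str, times: int = 2) -> str:
    return "\n\n".join([prompt] * times)

names = "Maria, John, Miguel, Anna, Marcus, Lee, Mona"
question = f"How many names in this list start with 'M'? {names}"
print(repeat_prompt(question, times=3))  # three copies of the question
```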