◆ System 2 Attention (S2A) Prompting
- System 2 Attention (S2A) filters irrelevant context from prompts to improve model accuracy.
- It regenerates a cleaner prompt, focusing only on task-relevant information.
- It helps with fact-based questions, math problems, and reducing bias.
- Limitations: S2A is less needed with modern models and adds computational cost.
What is System 2 Attention?
System 2 Attention (S2A) is a prompting technique that removes irrelevant context from the user’s prompt to improve the performance of a Large Language Model (LLM).
Here’s how it works:
- S2A takes the original user prompt, let’s call it Prompt 1.
- It asks a Large Language Model (LLM) to regenerate Prompt 1, removing any context irrelevant to the task. This produces a new prompt, called Prompt 2.
- Prompt 2 is then used to generate the final response from the LLM, rather than using Prompt 1.
System 2 Attention builds on the observation that LLMs attend closely to everything in the current prompt context, which is what lets them generate such plausible text. The same tendency becomes a weakness when the prompt contains context that is irrelevant to the task, because the model may let that context steer its answer.
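The two steps described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: `LLM` stands for any function you supply that sends a prompt to a model and returns its reply, and the wording of `REGENERATE_INSTRUCTION` is our own placeholder rather than the template from the paper.

```python
from typing import Callable, Tuple

# Assumption: any function that sends a prompt to an LLM and returns its text reply.
LLM = Callable[[str], str]

# Illustrative wording only; this is NOT the paper's template.
REGENERATE_INSTRUCTION = (
    "Rewrite the following text, keeping only the context that is relevant "
    "to answering the question it contains. Keep the question itself.\n\n"
    "Text: "
)


def system2_attention(prompt_1: str, llm: LLM) -> Tuple[str, str]:
    """Run the two S2A steps and return (prompt_2, final_response)."""
    # Step 1: ask the model to regenerate Prompt 1 without irrelevant context.
    prompt_2 = llm(REGENERATE_INSTRUCTION + prompt_1)
    # Step 2: generate the final response from Prompt 2 instead of Prompt 1.
    final_response = llm(prompt_2)
    return prompt_2, final_response
```

Passing the model in as a plain function keeps the sketch independent of any particular provider or client library, and it makes clear that S2A costs two model calls instead of one.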
Example
Here's an example of how irrelevant context can affect the LLM's output.
For reference, Sam Liccardo, the 65th mayor of San Jose (2015 to 2023), was actually born in Saratoga, California.
The first prompt focuses on Saratoga and the model outputs "Saratoga":
Prompt focusing on Saratoga
Saratoga is a city in California. Saratoga has many parks.
In which city was San Jose’s mayor Sam Liccardo born?
LLaMA-2-70B-chat response:
Sam Liccardo, the mayor of San Jose, was born in Saratoga, California.
The second prompt focuses on Sunnyvale and the model outputs "Sunnyvale":
Prompt focusing on Sunnyvale
Sunnyvale is a city in California. Sunnyvale has many parks. Sunnyvale city is close to the mountains. Many notable people are born in Sunnyvale.
In which city was San Jose’s mayor Sam Liccardo born?
LLaMA-2-70B-chat response:
Sam Liccardo, the mayor of San Jose, was born in Sunnyvale, California.
Fascinating, right? Try it out yourself with the model of your choice! Ping us @learnprompting on X (Twitter) with the results you get.
Note that ChatGPT 4o gives the correct answer for both prompts, while Gemini refuses to answer.
General Considerations
Beyond what the S2A authors proposed, this technique highlights the need to minimize irrelevant context whenever possible. Avoid phrases such as:
- Opinion: "I think/I don't think"
- Personal relation to the prompt: "I like/dislike this argument" or "I wrote/didn't write this argument"
- Any unrelated information that may divert the model’s focus from the task
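As a cheap pre-check before sending a prompt, you could flag the opinion phrases listed above with a regular expression. This is only a rough heuristic of our own, not part of S2A: the function name and the pattern are assumptions, and the pattern covers just the phrases named in the list.

```python
import re

# Covers only the example phrases from the list above; both straight and
# curly apostrophes are accepted. Extend the alternatives as needed.
OPINION_PATTERN = re.compile(
    r"\bI (?:think|don[’']t think|like|dislike|wrote|didn[’']t write)\b",
    re.IGNORECASE,
)


def find_opinion_phrases(prompt: str) -> list:
    """Return the opinion phrases found in the prompt, in order of appearance."""
    return [match.group(0) for match in OPINION_PATTERN.finditer(prompt)]
```

An empty result does not mean the prompt is free of irrelevant context. The third item in the list, unrelated information, cannot be caught by keyword matching, which is exactly the case S2A hands to the model.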
How to Use System 2 Attention?
System 2 Attention has been shown to enhance model performance by:
- Boosting factual accuracy in fact-based questions with opinions
- Improving objectivity in evaluating arguments with opinions
- Increasing accuracy in solving math problems with irrelevant details
Here's the main template of how you can use System 2 Attention with your prompt:
Prompt
Given the following text by a user, extract the part that is unbiased and not their opinion, so that using that text alone would be good context for providing an unbiased answer to the question portion of the text.
Please include the actual question or query that the user is asking. Separate this into two categories labeled with “Unbiased text context (includes all content except user’s bias):” and “Question/Query (does not include user bias/preference):”.
Text by User: [Your prompt]
As we write this article, the technique isn’t as useful anymore, since many models already provide correct answers to prompts with irrelevant context without needing S2A. That's why we show only one example where it worked previously.
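To use the template above programmatically, you need to fill in the user's text and then read the two labeled sections back out of the model's reply. The sketch below shows one way to do that. The helper names (`build_s2a_prompt`, `parse_s2a_output`) and the regular expression are our own assumptions, and the parser assumes the model reproduces both labels in the requested order, which is not guaranteed.

```python
import re
from typing import Optional, Tuple

# The template text from this article, with a slot for the user's prompt.
S2A_TEMPLATE = (
    "Given the following text by a user, extract the part that is unbiased and "
    "not their opinion, so that using that text alone would be good context for "
    "providing an unbiased answer to the question portion of the text.\n"
    "Please include the actual question or query that the user is asking. "
    "Separate this into two categories labeled with "
    "“Unbiased text context (includes all content except user’s bias):” and "
    "“Question/Query (does not include user bias/preference):”.\n"
    "Text by User: {user_prompt}"
)

# Matches each label up to its colon, then captures the text that follows it.
_SECTIONS = re.compile(
    r"Unbiased text context[^:\n]*:\s*(?P<context>.*?)\s*"
    r"Question/Query[^:\n]*:\s*(?P<question>.*)",
    re.DOTALL,
)


def build_s2a_prompt(user_prompt: str) -> str:
    """Insert the user's original prompt into the S2A template."""
    return S2A_TEMPLATE.format(user_prompt=user_prompt)


def parse_s2a_output(text: str) -> Optional[Tuple[str, str]]:
    """Return (context, question), or None if the labels are missing."""
    match = _SECTIONS.search(text)
    if match is None:
        return None
    return match.group("context").strip(), match.group("question").strip()
```

When `parse_s2a_output` returns `None`, a reasonable fallback is to answer from the original prompt, since the regeneration step did not produce usable output.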
Application Example: Math Problem with a Distracting Sentence
In this example, the distracting sentence is "Max has 1000 more books than Mary." With a simple prompt, the model gives an incorrect answer:
Simple prompt
Mary has 3 times as much candy as Megan. Mary then adds 10 more pieces of candy to her collection. Max has 1000 more books than Mary. If Megan has 5 pieces of candy, how many does Mary have in total?
Model's response:
Solution: Max has 1000 more books than Mary, so Max has 1000 + 3x + 10 pieces of candy.
Now, we can set up an equation using the information we have: 1000 + 3x + 10 = 3(5) + 10
Simplifying the equation: 1000 + 3x + 10 = 15 + 10
Solving for x: 3x = 1000 - 10 - 15
3x = 985
Dividing both sides by 3: x = 985 / 3
x = 328
Therefore, Mary has 328 pieces of candy in total.
Final answer (in numbers): 328
Using S2A helps to get the right answer:
Simple prompt
Mary has 3 times as much candy as Megan. Mary then adds 10 more pieces of candy to her collection. Max has 1000 more books than Mary. If Megan has 5 pieces of candy, how many does Mary have in total?
S2A
Mary has 3 times as much candy as Megan. Mary then adds 10 more pieces of candy to her collection. If Megan has 5 pieces of candy, how many does Mary have in total? Question: How many pieces of candy does Mary have in total?
Model's response:
Solution: Let’s say Megan has 5 pieces of candy. Mary has 3 times as much candy, so Mary has 3 x 5 = 15 pieces of candy. Mary then adds 10 more pieces of candy, so Mary has 15 + 10 = 25 pieces of candy. Final answer (in numbers): 25
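The arithmetic in the S2A response above is easy to verify with a few lines of Python. The function name is our own, chosen for illustration.

```python
def mary_total_candy(megan_candy: int) -> int:
    """Mary has 3 times Megan's candy, then adds 10 more pieces."""
    return 3 * megan_candy + 10


# Max's 1000 books never enter the calculation: they are not candy.
print(mary_total_candy(5))  # prints 25
```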
Limitations of System 2 Attention
As we mentioned before, System 2 Attention isn’t as useful anymore, as many models now provide correct answers to prompts with irrelevant context without needing S2A.
Additional limitations include:
- System 2 Attention doesn’t always successfully remove irrelevant context to improve generations.
- System 2 Attention is more computationally expensive than standard LLM generation, especially for long contexts, because it must first regenerate the prompt to isolate the relevant information before answering.
- There are potentially better prompt templates optimized for this task.
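To get a feel for the computational-cost limitation above, here is a back-of-the-envelope comparison. It is a rough sketch under our own assumptions: cost is approximated as the total number of tokens read and written across all calls, the answer is assumed to be the same length with and without S2A, and all numbers in the example are made up for illustration.

```python
from typing import Tuple


def s2a_token_overhead(
    prompt_tokens: int,
    instruction_tokens: int,
    regenerated_tokens: int,
    answer_tokens: int,
) -> Tuple[int, int]:
    """Return (baseline_total, s2a_total) tokens processed."""
    # Baseline: one call that reads the prompt and writes the answer.
    baseline_total = prompt_tokens + answer_tokens
    # S2A call 1: read instruction + prompt, write the regenerated prompt.
    first_call = instruction_tokens + prompt_tokens + regenerated_tokens
    # S2A call 2: read the regenerated prompt, write the answer.
    second_call = regenerated_tokens + answer_tokens
    return baseline_total, first_call + second_call


# Hypothetical sizes: a 2000-token prompt cleaned down to 500 tokens.
print(s2a_token_overhead(2000, 100, 500, 50))  # prints (2050, 3150)
```

The overhead grows with the length of the original prompt, since the whole prompt is read once more and its relevant part is written out again before the final answer is generated.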
Conclusion
While System 2 Attention improved how LLMs handle irrelevant context, its relevance is declining as more adaptive, robust, and efficient model architectures emerge. As AI technology advances, the focus may shift entirely toward developing models that naturally manage diverse and noisy data inputs, making additional layers like S2A obsolete.
Valeriia Kuka
Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.
Footnotes
- Weston, J., & Sukhbaatar, S. (2023). System 2 Attention (is something you might need too). https://arxiv.org/abs/2311.11829