Universal Self-Consistency
What is Universal Self-Consistency?
Universal Self-Consistency is a prompting technique used to refine and improve the accuracy of answers generated by a Large Language Model (LLM). It compiles multiple responses the model has previously given and then prompts the model to choose the best answer from among them.
USC builds on the concept of self-consistency, which uses multiple reasoning paths to find the most common response as a way to improve prediction confidence. Unlike standard self-consistency, which requires exact answers (like numbers) to tally votes, USC extends this approach to free-form responses by having the LLM select the most internally consistent answer from multiple generated outputs.
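To see why exact-match voting breaks down on free-form text, here is a minimal sketch of the tally that standard self-consistency relies on. The function name `majority_vote` and the sample answers are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

def majority_vote(answers):
    """Return (most common answer, vote count) using exact string matching."""
    counts = Counter(a.strip() for a in answers)
    return counts.most_common(1)[0]

# Short, extractable answers: identical strings pile up into a clear winner.
numeric = ["18", "18", "20", "18", "17"]
winner, votes = majority_vote(numeric)

# Free-form answers: same meaning, different wording, so each string gets one vote.
free_form = [
    "Jupiter is the largest planet.",
    "The largest planet is Jupiter.",
    "It's Jupiter, by a wide margin.",
]
_, free_form_votes = majority_vote(free_form)

print(winner, votes)       # winner among the numeric answers
print(free_form_votes)     # top vote count among the free-form answers
```

With free-form text no answer gathers more than a single vote, which is the gap USC fills by asking the LLM itself to judge consistency.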
Benefits and Applications
- Intuitive: Universal Self-Consistency is a very intuitive and easy-to-grasp approach for generating accurate responses.
- Enhanced Reasoning: Because Universal Self-Consistency uses Chain-of-Thought (CoT) prompting to generate the candidate answers that go into the final prompt, the candidates follow diverse reasoning paths, which makes the final selection more effective.
- Good for Free-form Text Generation: Universal Self-Consistency is particularly useful for free-form text generation, where the model can choose which text is best.
How USC Differs from Existing Techniques
USC enhances traditional self-consistency by supporting free-form answers, which is essential for tasks like summarization, open-ended Q&A, and code generation. Where previous methods required the extraction of identical answers, USC leverages LLMs to find internal consistency, making it more adaptable and reliable for diverse tasks.
Method | USC | Standard Self-Consistency | Execution-Based Self-Consistency |
---|---|---|---|
Output Requirement | Free-form or structured | Structured answers (e.g., single values) | Structured answers with execution results |
Selection Approach | Consistency-based LLM selection | Answer extraction with majority vote | Code execution to find matching outputs |
Applications | Open-ended Q&A, summarization, code generation | Math, logic, closed-form Q&A | Code generation |
How Universal Self-Consistency Works
- Step 1: Generate Multiple Responses with CoT: Begin by prompting the LLM several times on the same question, using Chain-of-Thought prompting. Record every response.
Prompt Template
[Prompted question or task]
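Step 1 can be sketched as a simple sampling loop. This is a minimal illustration under stated assumptions: `llm` is any hypothetical callable that takes a prompt string and returns the model's text, `fake_llm` is a stand-in so the sketch runs without a model, and the `cot_suffix` wording is an assumed CoT trigger rather than part of the template above. With a real model, the calls would typically be sampled at a temperature above zero so the responses differ.

```python
from itertools import cycle
from typing import Callable, List

def sample_responses(
    question: str,
    llm: Callable[[str], str],
    n: int = 5,
    cot_suffix: str = "Let's think step by step.",  # assumed CoT trigger
) -> List[str]:
    """Send the same CoT-style prompt n times and keep every response."""
    prompt = f"{question}\n{cot_suffix}"
    return [llm(prompt) for _ in range(n)]

# Stand-in for a real model call (hypothetical; replace with your own client).
_canned = cycle([
    "Comparing sizes, the largest planet is Jupiter.",
    "Jupiter's diameter is about 11 times Earth's, so the answer is Jupiter.",
])

def fake_llm(prompt: str) -> str:
    return next(_canned)

responses = sample_responses(
    "Which planet in our solar system is the largest?", fake_llm
)
```

All five responses are kept, including repeats, because the selection step depends on seeing how often the samples agree.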
- Step 2: Select the Most Consistent Answer: Compile all responses into a new prompt that asks the LLM to select the response most consistent with the others.
Universal Self-Consistency Prompt Template
I have generated the following responses to the question: [Prompted question or task]
Response 1: [Response 1]
Response 2: [Response 2]
Response 3: [Response 3]
...
Evaluate these responses. Select the most consistent response based on majority consensus. Start your answer with "The most consistent response is Response X" (without quotes).
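The template above can be filled in programmatically. The following is a minimal sketch; the function name `build_usc_prompt` is a hypothetical helper, and the exact line breaks between responses are an assumption about layout.

```python
from typing import List

def build_usc_prompt(question: str, responses: List[str]) -> str:
    """Fill the USC selection template with a question and its candidate responses."""
    numbered = "\n".join(
        f"Response {i}: {text}" for i, text in enumerate(responses, start=1)
    )
    return (
        f"I have generated the following responses to the question: {question}\n"
        f"{numbered}\n"
        "Evaluate these responses. Select the most consistent response based on "
        "majority consensus. Start your answer with "
        '"The most consistent response is Response X" (without quotes).'
    )

usc_prompt = build_usc_prompt(
    "Which planet in our solar system is the largest?",
    ["Jupiter is the largest.", "The answer is Jupiter.", "It is Saturn."],
)
print(usc_prompt)
```

Numbering starts at 1 so that the "Response X" in the model's reply maps directly onto the list shown in the prompt.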
USC is particularly useful when you need accurate answers from an LLM but don’t have predefined answer structures or a way to validate answers externally.
1. For Mathematical Reasoning Tasks
- Example Task: Solving math problems.
- Method: USC generates multiple solutions and uses internal consistency rather than exact answer matching to select the best response.
- Benefits: Maintains accuracy comparable to traditional self-consistency without requiring specific answer formatting.
2. For Code Generation
- Example Task: Generating SQL queries or Python code.
- Method: USC selects the code response that has the most consistent logic among generated samples.
- Benefits: Matches execution-based voting accuracy without requiring actual code execution, saving computational resources.
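For contrast, the execution-based voting that USC is compared against here can be sketched as follows. This is an illustrative simplification, not the procedure used in the paper: candidates are plain Python callables rather than generated source code, and `execution_vote` is a hypothetical name.

```python
from collections import defaultdict

def execution_vote(candidates, test_inputs):
    """Run each candidate on the test inputs, group candidates whose outputs
    match exactly, and return one candidate from the largest group."""
    groups = defaultdict(list)
    for fn in candidates:
        try:
            signature = tuple(repr(fn(x)) for x in test_inputs)
        except Exception:
            continue  # a candidate that crashes gets no vote
        groups[signature].append(fn)
    if not groups:
        return None
    return max(groups.values(), key=len)[0]

# Three candidate implementations of "double the input"; the third is wrong.
candidates = [
    lambda x: x * 2,
    lambda x: x + x,
    lambda x: x ** 2,
]
best = execution_vote(candidates, test_inputs=[1, 2, 3])
```

This approach has to execute every candidate, whereas USC replaces that step with a single selection prompt over the candidates' text.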
3. For Summarization and Open-Ended Generation
- Example Task: Summarizing long documents or answering open-ended questions.
- Method: USC evaluates multiple summaries and selects the most consistent one, without needing predefined answer structures.
- Benefits: Improves quality by finding answers that align across diverse outputs, particularly useful for long and complex texts.
Example Use: Astronomy
Suppose we want to know the largest planet in our solar system. We start by generating multiple responses:
Prompt
Which planet in our solar system is the largest?
AI Output
Our solar system consists of eight planets. The planets in order of size are: Jupiter, Saturn, Uranus, Neptune, Earth, Venus, Mars, and Mercury.
Among these, Jupiter is the largest. So, the largest planet in our solar system is Jupiter.
Let's do this 5 times. These are our outputs:
- Our solar system consists of eight planets. The planets in order of size are: Jupiter, Saturn, Uranus, Neptune, Earth, Venus, Mars, and Mercury. Among these, Jupiter is the largest. So, the largest planet in our solar system is Jupiter.
- Our solar system consists of eight planets. The planets in order from the sun are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. If we compare the sizes of these planets, we find that Jupiter is the largest. Jupiter's diameter is about 142,984 kilometers, making it over twice as large as any other planet in our solar system. So, the largest planet in our solar system is Jupiter.
- Our solar system has eight planets. The planets in order from the sun are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. When comparing the sizes of these planets, Jupiter stands out. Jupiter's diameter is about 11 times that of Earth, making it the largest planet in our solar system. So, the answer is Jupiter.
- Our solar system has eight planets. The planets in order from the sun are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. If we compare the sizes of these planets, we find that Jupiter is significantly larger than the others. In fact, Jupiter is so large that over 1,300 Earths could fit inside it. Therefore, the largest planet in our solar system is Jupiter.
- Our solar system has eight planets. The planets in order from the sun are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. If we compare the sizes of these planets, we find that: Mercury is the smallest. Venus and Earth are roughly the same size. Mars is about half the size of Earth. Jupiter is much larger than any of the inner planets. Saturn is the second largest planet. Uranus and Neptune are larger than the inner planets but smaller than Jupiter and Saturn. Therefore, the largest planet in our solar system is Jupiter.
Moving on to step 2, we now put it all into one prompt, asking the LLM to choose the best.
Prompt
I have generated the following responses to the question: Which planet in our solar system is the largest?
[list of the responses]
Evaluate these responses. Select the most consistent response based on majority consensus. Start your answer with "The most consistent response is Response X" (without quotes).
AI Output
The most consistent response is Response 4.
As a reminder, Response 4 was:
Our solar system has eight planets. The planets in order from the sun are Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, and Neptune. If we compare the sizes of these planets, we find that Jupiter is significantly larger than the others. In fact, Jupiter is so large that over 1,300 Earths could fit inside it. Therefore, the largest planet in our solar system is Jupiter.
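Because the prompt asks the model to begin with a fixed phrase, the chosen response can be recovered with a small parser. This is a minimal sketch; `parse_selection` is a hypothetical helper, and returning `None` on an unparseable or out-of-range reply is one possible design choice among several.

```python
import re
from typing import List, Optional

def parse_selection(model_output: str, responses: List[str]) -> Optional[str]:
    """Map 'The most consistent response is Response X' back to response X."""
    match = re.search(
        r"most consistent response is Response (\d+)", model_output, re.IGNORECASE
    )
    if match is None:
        return None  # the model did not follow the requested format
    index = int(match.group(1))
    if not 1 <= index <= len(responses):
        return None  # the model named a response that does not exist
    return responses[index - 1]

responses = [f"candidate {i}" for i in range(1, 6)]
chosen = parse_selection("The most consistent response is Response 4.", responses)
```

In practice a caller might fall back to the first response or retry the selection prompt when the parser returns `None`.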
Results of USC
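USC was tested across multiple benchmarks. On tasks where answers can be extracted and voted on (math and code generation), it comes within a fraction of a point of standard self-consistency; on free-form tasks where voting is not applicable (summarization and open-ended Q&A), it improves over both greedy decoding and random selection. The results below come from common benchmark tasks.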
Task | Greedy Decoding | Random Selection | Standard Self-Consistency | USC |
---|---|---|---|---|
Math (GSM8K) | 85.7% | 82.9% | 90.4% | 90.2% |
Code Generation (ARCADE) | 26.0% | 26.8% | 30.3% | 30.1% |
Summarization (GovReport) | ROUGE-1: 38.8 | ROUGE-1: 38.5 | Not Applicable | ROUGE-1: 40.2 |
TruthfulQA (Open Q&A) | 62.1% | 62.9% | Not Applicable | 67.7% (truthfulness) |
These results highlight USC’s capacity to significantly improve LLM-generated outputs on open-ended tasks where answer extraction for voting is difficult or not feasible.
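The gaps in the table can be read off with a little arithmetic. The sketch below only re-uses the figures from the table above; the dictionary layout and the `gaps` helper are illustrative.

```python
# Scores copied from the results table (None where standard SC is not applicable).
results = {
    "GSM8K": {"greedy": 85.7, "sc": 90.4, "usc": 90.2},
    "ARCADE": {"greedy": 26.0, "sc": 30.3, "usc": 30.1},
    "GovReport (ROUGE-1)": {"greedy": 38.8, "sc": None, "usc": 40.2},
    "TruthfulQA": {"greedy": 62.1, "sc": None, "usc": 67.7},
}

def gaps(row):
    """Return (USC minus greedy, USC minus standard SC or None), rounded to 0.1."""
    vs_greedy = round(row["usc"] - row["greedy"], 1)
    vs_sc = None if row["sc"] is None else round(row["usc"] - row["sc"], 1)
    return vs_greedy, vs_sc

for task, row in results.items():
    print(task, gaps(row))
```

USC gains between 1.4 and 5.6 points over greedy decoding across the four tasks, and trails standard self-consistency by 0.2 points on the two tasks where both apply.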
Conclusion
Universal Self-Consistency is a powerful, intuitive method for improving the accuracy and reliability of LLM responses to a given prompt: it compiles multiple responses and lets the model itself decide which one is best. While it can be time-consuming, it doesn't require many resources and can be highly rewarding, particularly for prompts that involve free-form writing, such as essays.
Andres Caceres
Andres Caceres, a documentation writer at Learn Prompting, has a passion for AI, math, and education. Outside of work, he enjoys playing soccer and tennis, spending time with his three huskies, and tutoring. His enthusiasm for learning and sharing knowledge drives his dedication to making complex concepts more accessible through clear and concise documentation.
Footnotes
- Chen, X., Aksitov, R., Alon, U., Ren, J., Xiao, K., Yin, P., Prakash, S., Sutton, C., Wang, X., & Zhou, D. (2023). Universal Self-Consistency for Large Language Model Generation. https://arxiv.org/abs/2311.17311