Enhances AI accuracy: Self-consistency prompting boosts reliability by taking a majority vote across multiple responses to the same prompt.
Effective for various tasks: This technique improves outcomes in arithmetic and commonsense reasoning through consensus-based results.
Self-consistency is an approach that simply asks a model the same prompt multiple times and takes the majority result as the final answer. It is a follow-up to Chain-of-Thought prompting, and it is most powerful when used in conjunction with that technique.
Let's consider a simple example of analyzing emails. Assume that you work at a software company that receives hundreds of emails a day. You want to use a model to classify emails as important or not important, so you can prioritize the ones that may have a major impact on your business.
Here is an example of an email that you might receive. Let's put that into a prompt:
Hello,
I have discovered a major security vulnerability in your system. Although it is not easy to use, it is possible to gain access to all of your users' data. I have attached a proof of concept. Please fix this issue as soon as possible.
Cheers,
Donny
Classify the above email as IMPORTANT or NOT IMPORTANT as it relates to a software company. Let's think step by step.
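If you want to try this yourself, a minimal sketch using the OpenAI Python SDK might look like the following. The model name and sampling parameters here are illustrative assumptions, not part of the original example; any chat model that supports multiple completions per request will do.

```python
# A sketch of generating multiple chains of thought for the email prompt.
# Assumes the OpenAI Python SDK (openai>=1.0); the model name is an example.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

email = (
    "Hello,\n"
    "I have discovered a major security vulnerability in your system. "
    "Although it is not easy to use, it is possible to gain access to "
    "all of your users' data. I have attached a proof of concept. "
    "Please fix this issue as soon as possible.\n"
    "Cheers,\n"
    "Donny"
)

prompt = (
    f"{email}\n\n"
    "Classify the above email as IMPORTANT or NOT IMPORTANT as it "
    "relates to a software company. Let's think step by step."
)

# n=5 requests five independent completions in one call; a nonzero
# temperature makes the sampled chains of thought diverge, which is
# what gives the majority vote something to disagree about.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: substitute any chat model
    messages=[{"role": "user", "content": prompt}],
    n=5,
    temperature=0.7,
)
completions = [choice.message.content for choice in response.choices]
```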
Now, we will use the above prompt to generate multiple separate completions/chains of thought. Here are a few examples:
The sender has discovered a major security vulnerability in the system. This is important because it could lead to a data breach. I would classify this email as IMPORTANT.
The sender has discovered a major security vulnerability in the system, but they have not provided any proof of concept and it is not easy to use. I would classify this email as NOT IMPORTANT.
The sender is likely a hacker and may have access to the system. This is important because it could lead to a data breach. I would classify this email as IMPORTANT.
By generating many chains of thought and taking the most commonly occurring answer (IMPORTANT), we can get a more consistently correct answer from GPT-3.
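To make the majority-vote step concrete, here is a small sketch that pulls the final label out of each completion and tallies the results. The abbreviated completions stand in for the chains of thought above, and the extraction regex is a simplifying assumption about how the model phrases its answer.

```python
# A sketch of the majority-vote step: extract the final label from each
# chain of thought, then keep the most common one.
import re
from collections import Counter

def extract_label(completion: str) -> str | None:
    """Return the last IMPORTANT / NOT IMPORTANT label in a completion."""
    # "NOT IMPORTANT" must come first so the alternation prefers it.
    matches = re.findall(r"NOT IMPORTANT|IMPORTANT", completion)
    return matches[-1] if matches else None

completions = [
    "... I would classify this email as IMPORTANT.",
    "... I would classify this email as NOT IMPORTANT.",
    "... I would classify this email as IMPORTANT.",
]

votes = Counter(
    label for label in map(extract_label, completions) if label is not None
)
final_answer, count = votes.most_common(1)[0]
print(final_answer)  # IMPORTANT (2 of 3 votes)
```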
Self-consistency has been shown to improve results on arithmetic, commonsense and symbolic reasoning tasks. Even when regular Chain-of-Thought was found to be ineffective, self-consistency was still able to improve results.
Wang et al. discuss a more complex method for selecting the final answer, which uses the probabilities the Large Language Model (LLM) generates for each Chain-of-Thought. However, they do not use this method in their experiments, and majority voting usually seems to have the same or better performance.
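As a rough illustration of the difference, the weighted variant scores each answer by the probability the model assigned to the chain of thought that produced it, instead of counting every chain equally. The (label, log-probability) pairs below are hypothetical, standing in for per-completion log probabilities you would collect from the model:

```python
# A sketch contrasting probability-weighted selection with plain majority
# voting. Each hypothetical logprob stands in for the length-normalized
# sum of token log probabilities of one chain of thought.
import math
from collections import defaultdict

samples = [
    ("IMPORTANT", -0.9),
    ("NOT IMPORTANT", -0.2),
    ("IMPORTANT", -1.1),
]

weights: dict[str, float] = defaultdict(float)
for label, logprob in samples:
    weights[label] += math.exp(logprob)  # weight each vote by its probability

# The weighted pick can differ from the unweighted majority: here the
# single NOT IMPORTANT chain is probable enough (~0.82) to outweigh the
# two IMPORTANT chains combined (~0.74).
weighted_answer = max(weights, key=weights.get)
```

This toy example is a case where the two rules disagree, which is exactly the situation where the finding that plain majority voting holds up makes the simpler rule attractive.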
Self-consistency proves to be a significant improvement over Chain-of-Thought prompting alone. By combining the techniques and taking a majority vote of the Chain-of-Thought responses, we are able to refine our model prompts to get more reliable outputs.
Self-consistency is a follow-up to Chain-of-Thought prompting that takes the majority result of multiple model responses to the same prompt.
By aggregating multiple responses to the same prompt, self-consistency ensures that the final answer to an input represents a consensus vote, which tends to be more reliable and accurate than individual Chain-of-Thought completions on their own.