Program of Thoughts Prompting
- Program of Thoughts (PoT) Prompting separates reasoning from computation by delegating calculations to external interpreters like Python, reducing computational errors.
- PoT vs. Chain-of-Thought (CoT): Unlike CoT, which merges reasoning and computation, PoT improves accuracy by using structured programs for complex tasks.
- Tasks best suited for PoT: PoT excels in math word problems, financial reasoning, and cases where exact computation is required.
- Limitations include the potential risks of executing generated code and reduced effectiveness on datasets with diverse question types like AQuA.
What is Program of Thoughts Prompting?
In Chain-of-Thought (CoT) Prompting, Large Language Models (LLMs) perform both reasoning and computation. The LLM generates mathematical expressions as reasoning steps and then evaluates them itself to get the final answer. However, LLMs are not ideal for evaluating mathematical expressions, for three reasons:
- LLMs are prone to computational errors.
- LLMs are incapable of solving complex mathematical expressions involving polynomial equations, differential equations, etc.
- LLMs are highly inefficient at expressing iteration.
Program of Thoughts (PoT) prompting is a prompting technique that separates reasoning from computation. To be specific, PoT delegates computation steps to an external language interpreter. For instance, the LLM can express the reasoning steps as a Python program, which can be delegated to a Python interpreter to get an accurate response.
The example input and output pairs below, generated with CoT prompting and PoT prompting, illustrate how PoT differs from CoT.
Prompt
Question: In Fibonacci sequence, it follows the rule that each number is equal to the sum of the preceding two numbers. Assuming the first two numbers are 0 and 1, what is the 50th number in Fibonacci sequence?
- A sample response generated using CoT prompting:
AI Output
The first number is 0, the second number is 1, therefore, the third number is 0+1=1. The fourth number is 1+1=2. The fifth number is 1+2=3. The sixth number is 2+3=5. The seventh number is 3+5=8. The eighth number is 5+8=13. ..... (Skip 1000 tokens) The 50th number is 32,432,268,459.
- A sample response generated using PoT prompting:
Prompt
Question: In Fibonacci sequence, it follows the rule that each number is equal to the sum of the preceding two numbers. Assuming the first two numbers are 0 and 1, what is the 50th number in Fibonacci sequence?
# Python code, return ans
AI Output
import numpy as np

length_of_fibonacci_sequence = 50
fibonacci_sequence = np.zeros(length_of_fibonacci_sequence)
fibonacci_sequence[0] = 0
fibonacci_sequence[1] = 1
# Fill in every remaining entry, starting from the third number (index 2)
for i in range(2, length_of_fibonacci_sequence):
    fibonacci_sequence[i] = fibonacci_sequence[i-1] + fibonacci_sequence[i-2]
ans = fibonacci_sequence[-1]
CoT prompting produces its answer in natural language, and here the arithmetic goes wrong along the way: the 50th number is 7,778,742,049, not 32,432,268,459. PoT prompting instead yields a Python program, which is executed in a Python interpreter to get the final answer.
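The execution step can be sketched roughly as follows. This is a minimal illustration, not the original authors' pipeline: `GENERATED_PROGRAM` is a hard-coded stand-in for text an LLM might return, and `run_generated_program` is a hypothetical helper name. The stand-in program uses plain Python integers rather than NumPy so the sketch has no dependencies.

```python
# Stand-in for the program an LLM might return for the Fibonacci question.
# In a real pipeline this string would come from the model (assumption).
GENERATED_PROGRAM = """
a, b = 0, 1              # first two Fibonacci numbers
for _ in range(48):      # advance from the 2nd number to the 50th
    a, b = b, a + b
ans = b
"""


def run_generated_program(program: str):
    """Execute a generated program and return the value it stores in `ans`."""
    namespace = {}
    exec(program, namespace)  # only acceptable for trusted code
    return namespace["ans"]


result = run_generated_program(GENERATED_PROGRAM)
print(result)  # 7778742049
```

The convention that the program stores its result in a variable named `ans` follows the `# Python code, return ans` cue in the prompt above.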
How to Use Program of Thoughts Prompting?
We can use PoT in either Zero-Shot or Few-Shot Prompting settings. In Zero-Shot PoT, the prompt doesn't include any exemplar.
Given the prompt, the LLM generates a Python program as its output, which a Python interpreter then executes to get the final answer, i.e., 260.
def solver():
    seattle_sheep = 20
    charleston_sheep = seattle_sheep * 4
    toulouse_sheep = charleston_sheep * 2
    total_sheep = seattle_sheep + charleston_sheep + toulouse_sheep
    return total_sheep

# Now let's call the solver function and print the result
print(solver())
### OUTPUT
----
>>> 260
A Few-Shot PoT prompt, by contrast, includes exemplars that demonstrate how to solve similar problems. As in the Zero-Shot setting, the output is a program that we execute separately with an interpreter to get the final answer.
cost_of_original_house = 80000
increase_rate = 150 / 100
value_of_house = (1 + increase_rate) * cost_of_original_house
cost_of_repair = 50000
ans = value_of_house - cost_of_repair - cost_of_original_house
print(ans)
### OUTPUT
----
>>> 70000.0
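Both settings can be assembled with the same kind of helper. The sketch below is one possible way to lay out such a prompt, not a prescribed template: `build_pot_prompt` is a hypothetical function, the exemplar and questions are made up for illustration, and the cue line is borrowed from the Fibonacci example earlier in this article.

```python
from typing import Sequence, Tuple

# Cue line that asks the model for code (taken from the Fibonacci example).
CODE_CUE = "# Python code, return ans"


def build_pot_prompt(question: str,
                     exemplars: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a PoT prompt; with no exemplars it is a Zero-Shot prompt."""
    parts = []
    # Each exemplar is a (question, program) pair shown before the real question.
    for ex_question, ex_program in exemplars:
        parts.append(f"Question: {ex_question}\n{CODE_CUE}\n{ex_program}\n")
    # The real question ends with the cue so the model continues with code.
    parts.append(f"Question: {question}\n{CODE_CUE}\n")
    return "\n".join(parts)


# One hypothetical exemplar, made up for this sketch.
exemplars = [("What is 12 times 7?", "ans = 12 * 7")]

zero_shot = build_pot_prompt("What is 3 plus 4?")
few_shot = build_pot_prompt("What is 3 plus 4?", exemplars)
print(few_shot)
```

The only difference between the two prompts is the exemplar block in front of the final question.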
PoT as an Intermediate Step
For problems requiring additional reasoning, PoT can also be used as an intermediate step that handles only the computation. The code generated by PoT is executed to get an intermediate result, which is then combined with the original question in a Chain-of-Thought prompt to produce the final answer.
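A rough sketch of this two-stage flow is shown below. The question, the generated program, and the helper names `run_program` and `build_cot_followup` are all hypothetical, and the wording of the follow-up prompt is an assumption rather than the phrasing used in the original work.

```python
def run_program(program: str):
    """Execute a generated program and return the value stored in `ans`."""
    namespace = {}
    exec(program, namespace)  # only acceptable for trusted code
    return namespace["ans"]


def build_cot_followup(question: str, intermediate) -> str:
    """Combine the original question with the computed intermediate result."""
    return (
        f"Question: {question}\n"
        f"A program computed this intermediate result: {intermediate}\n"
        "Using this result, let's think step by step to reach the final answer."
    )


# Hypothetical question and stand-in for the program an LLM might generate.
question = "Pens cost 3 dollars each. Is 50 dollars enough to buy 17 pens?"
generated_program = "ans = 17 * 3"

intermediate = run_program(generated_program)  # 51
followup_prompt = build_cot_followup(question, intermediate)
print(followup_prompt)
```

The follow-up prompt would then be sent to the LLM, which only has to compare 51 with 50 rather than carry out the multiplication itself.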
What Are Program of Thoughts Prompting Results?
- Zero-Shot PoT outperforms Zero-Shot Chain-of-Thought across all math word problem (MWP) datasets by a significant margin.
Comparison of Zero-Shot PoT with similar techniques across MWP datasets
- On financial datasets, Few-Shot PoT + Self-Consistency (SC) decoding outperforms Few-Shot CoT + SC by roughly 20 points on FinQA and ConvFinQA and 7 points on TATQA. On MWP datasets, Few-Shot PoT + SC wins by a smaller margin of roughly 2 to 7 points.
Model | GSM8K | AQuA | SVAMP | TabMWP | FinQA | ConvFinQA | TATQA |
---|---|---|---|---|---|---|---|
Codex CoT-SC | 78.0 | 52.0 | 86.8 | 75.4 | 44.4 | 47.9 | 63.2 |
Codex PoT-SC | 80.0 | 58.6 | 89.1 | 81.8 | 68.1 | 67.3 | 70.2 |
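The margins quoted above can be checked directly from the table. The short script below simply subtracts the CoT-SC score from the PoT-SC score for each dataset; the variable names are illustrative.

```python
DATASETS = ["GSM8K", "AQuA", "SVAMP", "TabMWP", "FinQA", "ConvFinQA", "TATQA"]
COT_SC = [78.0, 52.0, 86.8, 75.4, 44.4, 47.9, 63.2]
POT_SC = [80.0, 58.6, 89.1, 81.8, 68.1, 67.3, 70.2]

# Gap in points between PoT-SC and CoT-SC, rounded to one decimal place.
gaps = {
    name: round(pot - cot, 1)
    for name, cot, pot in zip(DATASETS, COT_SC, POT_SC)
}

for name, gap in gaps.items():
    print(f"{name}: +{gap} points")
```

The largest gaps are on the financial datasets (FinQA and ConvFinQA), while the math word problem datasets show gaps in the single digits.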
Limitations of Program of Thoughts Prompting
There are two major limitations to PoT prompting:
- PoT requires executing generated code. If the code is malicious and contains a snippet like `import os; os.rmdir()`, it could harm the machine that runs it. Generated code could also be exploited for SQL injection, which could delete or leak confidential data.
- For datasets like AQuA, which contain a complex and wide variety of questions, PoT's performance suffers. One possible reason is that the exemplars cannot cover the diversity of questions in the dataset.
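One way to reduce the first risk is to inspect a generated program before running it. The sketch below is an illustrative guard only, under the assumption that generated programs need nothing beyond basic arithmetic: `check_program`, `run_checked`, and the blocked-name list are hypothetical, and a static check like this is not a real sandbox. Production systems would more likely run generated code in an isolated process or container.

```python
import ast

# Names a purely arithmetic program should not need (illustrative list).
BLOCKED_NAMES = {"exec", "eval", "open", "compile", "__import__", "input"}


def check_program(program: str) -> None:
    """Raise ValueError if the program uses imports or blocked names."""
    for node in ast.walk(ast.parse(program)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed")
        if isinstance(node, ast.Name) and node.id in BLOCKED_NAMES:
            raise ValueError(f"use of {node.id!r} is not allowed")
        if isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            raise ValueError("dunder attribute access is not allowed")


def run_checked(program: str):
    """Check the program, then execute it with a reduced set of builtins."""
    check_program(program)
    allowed_builtins = {"range": range, "len": len, "sum": sum, "min": min,
                        "max": max, "abs": abs, "round": round}
    namespace = {"__builtins__": allowed_builtins}
    exec(program, namespace)
    return namespace["ans"]


print(run_checked("ans = sum(range(1, 11))"))  # 55

try:
    run_checked("import os; os.rmdir('some_dir')")
except ValueError as err:
    print("rejected:", err)
```

A check like this rejects the `import os` snippet mentioned above before any of it runs, but it should be treated as a first filter rather than a guarantee.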
Conclusion
Program of Thoughts (PoT) prompting separates computation from reasoning by having the LLM express its reasoning as a structured program rather than natural language, which improves accuracy on tasks that can be represented as code, like math or accounting. However, PoT is limited to such problems and carries the risk of executing malicious code from user prompts.
Valeriia Kuka
Valeriia Kuka, Head of Content at Learn Prompting, is passionate about making AI and ML accessible. Valeriia previously grew a 60K+ follower AI-focused social media account, earning reposts from Stanford NLP, Amazon Research, Hugging Face, and AI researchers. She has also worked with AI/ML newsletters and global communities with 100K+ members and authored clear and concise explainers and historical articles.