Code as Reasoning
- PAL Overview: Program-Aided Language Models (PAL) enhance problem-solving by generating code to represent intermediate reasoning steps (contrast this with CoT prompting, which uses natural language to reason).
What are Program-aided Language Models (PAL)?
Program-aided Language Models (PAL) are another example of a Modular Reasoning, Knowledge, and Language (MRKL) system. Given a question, a PAL writes code that solves it and sends that code to a programmatic runtime to obtain the result. Unlike Chain-of-Thought (CoT) prompting, which uses natural language for intermediate reasoning, PAL's intermediate reasoning is expressed as code.
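To make the division of labor concrete, here is a minimal sketch of the PAL loop. The function `fake_llm` is a hypothetical stand-in for a real model call (it just returns a fixed program), and the convention of reading a variable named `result` is borrowed from the prompt used later in this article.

```python
def fake_llm(question: str) -> str:
    """Hypothetical stand-in for a language model: returns a fixed program."""
    return (
        "# Shawn starts with five toys.\n"
        "toys_initial = 5\n"
        "# He gets two toys from each parent.\n"
        "toys_received = 2 + 2\n"
        "result = toys_initial + toys_received\n"
    )


def solve_with_pal(question: str, llm=fake_llm):
    """Ask the model for a program, run it, and return its `result` variable."""
    program = llm(question)
    namespace = {}
    # The Python interpreter, not the model, performs the arithmetic.
    exec(program, namespace)
    return namespace["result"]


answer = solve_with_pal(
    "Shawn has five toys. For Christmas, he got two toys each from his mom "
    "and dad. How many toys does he have now?"
)
print(answer)
```

The model only has to describe the computation; the runtime carries it out, which is where PAL differs from CoT.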
One important thing to note is that PAL actually interleaves natural language (NL) and code. In the image above, the natural language reasoning that PAL generates is shown in blue. Although the image does not show it, PAL generates a '#' before each line of NL reasoning, so that the programmatic runtime interprets those lines as comments.
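As a small illustration of that interleaving, the sketch below takes a PAL-style program, pulls the '#' comment lines out with Python's standard `tokenize` module, and then runs the same text unchanged. The program in `GENERATED` is a hand-written example in the style of PAL output, not real model output, and `split_reasoning` is a helper name I made up for this sketch.

```python
import io
import tokenize

# Hand-written example of PAL-style output: NL reasoning as comments, then code.
GENERATED = '''\
# Jason started with 20 lollipops.
jason_lollipops_initial = 20
# After giving some to Denny he has 12 left.
jason_lollipops_after = 12
# The difference is what Denny received.
result = jason_lollipops_initial - jason_lollipops_after
'''


def split_reasoning(source: str) -> list:
    """Return the text of every '#' comment in `source`, in order."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [
        tok.string.lstrip("# ").rstrip()
        for tok in tokens
        if tok.type == tokenize.COMMENT
    ]


reasoning = split_reasoning(GENERATED)

# The comments are ignored at execution time, so the text runs as-is.
namespace = {}
exec(GENERATED, namespace)

print(reasoning)
print(namespace["result"])
```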
Example
Let's look at an example of PAL solving a math question. I use a 3-shot prompt, which is a simplified version of the original PAL prompt.
I will use langchain, a Python package for chaining LLM functionality. First, a few installations are needed:
!pip install langchain==0.0.26
!pip install openai
from langchain.llms import OpenAI
import os
os.environ["OPENAI_API_KEY"] = "sk-YOUR_KEY_HERE"
Then, we can create an instance of GPT-3 (the text-davinci-002 model). An API call happens whenever we call this object:
llm = OpenAI(model_name='text-davinci-002', temperature=0)
Here is the Few-Shot prompt:
MATH_PROMPT = '''
Q: There were nine computers in the server room. Five more computers were installed each day, from Monday to Thursday. How many computers are now in the server room?
# solution in Python:
"""There were nine computers in the server room. Five more computers were installed each day, from Monday to Thursday. How many computers are now in the server room?"""
computers_initial = 9
computers_per_day = 5
num_days = 4 # 4 days between Monday and Thursday
computers_added = computers_per_day * num_days
computers_total = computers_initial + computers_added
result = computers_total
Q: Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?
# solution in Python:
"""Shawn has five toys. For Christmas, he got two toys each from his mom and dad. How many toys does he have now?"""
toys_initial = 5
mom_toys = 2
dad_toys = 2
total_received = mom_toys + dad_toys
total_toys = toys_initial + total_received
result = total_toys
Q: Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?
# solution in Python:
"""Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 lollipops. How many lollipops did Jason give to Denny?"""
jason_lollipops_initial = 20
jason_lollipops_after = 12
denny_lollipops = jason_lollipops_initial - jason_lollipops_after
result = denny_lollipops
Q: {question}
# solution in Python:
'''
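Every example in the prompt follows the same layout: the question, the "# solution in Python:" cue, the question repeated as a docstring, and the code ending in `result`. As a sketch of that structure, the prompt could also be assembled from (question, solution) pairs. The names `EXAMPLES` and `build_prompt` are my own for illustration and are not part of langchain or PAL.

```python
# Illustrative (question, solution code) pairs copied from the prompt above.
EXAMPLES = [
    (
        "Shawn has five toys. For Christmas, he got two toys each from his mom "
        "and dad. How many toys does he have now?",
        "toys_initial = 5\n"
        "mom_toys = 2\n"
        "dad_toys = 2\n"
        "total_received = mom_toys + dad_toys\n"
        "total_toys = toys_initial + total_received\n"
        "result = total_toys",
    ),
    (
        "Jason had 20 lollipops. He gave Denny some lollipops. Now Jason has 12 "
        "lollipops. How many lollipops did Jason give to Denny?",
        "jason_lollipops_initial = 20\n"
        "jason_lollipops_after = 12\n"
        "denny_lollipops = jason_lollipops_initial - jason_lollipops_after\n"
        "result = denny_lollipops",
    ),
]


def build_prompt(examples, question):
    """Lay out each worked example the same way, then append the open question."""
    blocks = []
    for example_question, solution in examples:
        blocks.append(
            f"Q: {example_question}\n"
            "# solution in Python:\n"
            f'"""{example_question}"""\n'
            f"{solution}\n"
        )
    # The final block stops right after the cue so the model continues with code.
    blocks.append(f"Q: {question}\n# solution in Python:\n")
    return "\n".join(blocks)


prompt = build_prompt(EXAMPLES, "How many minutes are there in two hours?")
print(prompt)
```

Building the prompt by concatenation also avoids a pitfall of `str.format`: if a worked example ever contained literal braces (a dict literal, say), `format` would try to treat them as placeholders.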
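Now we can define the question and pass the combined prompt to GPT-3:
question = "Emma took a 60-minute plane ride to Seattle. She then took a 2-hour train ride to Portland, and then a 30-minute bus ride to Vancouver. How long did it take her to get to Vancouver?"
llm_out = llm(MATH_PROMPT.format(question=question))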
print(llm_out)
The output is:
"""Emma took a 60-minute plane ride to Seattle. She then took a 2-hour train
ride to Portland, and then a 30-minute bus ride to Vancouver. How long did
it take her to get to Vancouver?"""
plane_ride = 60
train_ride = 2 * 60 # 2 hours in minutes
bus_ride = 30
total_time = plane_ride + train_ride + bus_ride
result = total_time
Finally, we can pass this code to a Python runtime to get the answer:
exec(llm_out)
print(result)
The output is 210 (minutes), which is correct: 60 + 120 + 30 = 210.
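Calling `exec` directly on model output runs whatever the model wrote, in the current namespace. A slightly more defensive sketch is shown below: it parses the code first, rejects import statements, runs the code in a fresh namespace, and checks that `result` was defined. The helper name `run_generated` is hypothetical, and this is only a light sanity check, not a sandbox, so untrusted code would still need real isolation.

```python
import ast


def run_generated(code: str):
    """Parse, lightly vet, and execute model-written code; return `result`."""
    tree = ast.parse(code)  # raises SyntaxError if the text is not valid Python
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("generated code may not import modules")
    namespace = {}  # fresh namespace, so our own variables are not overwritten
    exec(compile(tree, "<llm_out>", "exec"), namespace)
    if "result" not in namespace:
        raise KeyError("generated code did not define `result`")
    return namespace["result"]


# The code from the example above, pasted in so the sketch runs without an API call.
llm_out = '''
"""Emma took a 60-minute plane ride to Seattle. She then took a 2-hour train
ride to Portland, and then a 30-minute bus ride to Vancouver. How long did
it take her to get to Vancouver?"""
plane_ride = 60
train_ride = 2 * 60  # 2 hours in minutes
bus_ride = 30
total_time = plane_ride + train_ride + bus_ride
result = total_time
'''

total_minutes = run_generated(llm_out)
print(total_minutes)
```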
See the Jupyter notebook for this example of Program-aided Language Models.
More
Also see PAL's colab example.
Sander Schulhoff
Footnotes
- Gao, L., Madaan, A., Zhou, S., Alon, U., Liu, P., Yang, Y., Callan, J., & Neubig, G. (2022).