Math
- Definition of MathPrompter: MathPrompter combines techniques like CoT and PAL to improve LLM accuracy in solving math problems by using algebraic templates and Python code.
- Four Steps of MathPrompter: The process includes generating an algebraic template, creating math prompts, generating answers using Python, and applying self-consistency to refine results.
- Performance: MathPrompter achieves a reported accuracy of 92.5% on the MultiArith dataset of arithmetic word problems.
What is MathPrompter?
Throughout this course, we have seen many different prompting methods that can be used to improve LLM math ability. One recent approach, MathPrompter, unifies some of these methods (CoT, PAL, etc.) into a single technique. The overarching idea is to break a math question down into algebraic terms and then use Python code to solve it in different ways.
MathPrompter has four steps. We will explain them using the following example problem. The example is taken directly from the paper.
Q: At a restaurant, each adult meal costs $5 and kids eat free. If a group of 15 people came in and 8 were kids, how much would it cost for the group to eat?
Step 1: Generate Algebraic Template
The first step is to assign a variable to each number in the question. This helps because it allows easier translation of the question into an abstract math question, as well as into programming code.
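This can be done via Few-Shot prompting. For the example problem, the numbers 5, 15, and 8 are replaced with the variables A, B, and C, and the mapping {A: 5, B: 15, C: 8} is recorded for later use.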
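To make the output of this step concrete, here is a minimal sketch that builds a template and a mapping with a plain regular expression. It is only a stand-in for the Few-Shot LLM prompt, not the prompt used in the paper; the helper name `make_template` is hypothetical, and the sketch assumes the question contains only whole numbers.

```python
import re
from string import ascii_uppercase


def make_template(question):
    """Replace each whole number with a letter variable (A, B, C, ...).

    Hypothetical helper: a regex stand-in for the LLM step, for illustration only.
    """
    mapping = {}
    letters = iter(ascii_uppercase)

    def substitute(match):
        name = next(letters)               # next unused variable name
        mapping[name] = int(match.group())  # remember the original value
        return name

    template = re.sub(r"\d+", substitute, question)
    return template, mapping


question = (
    "At a restaurant, each adult meal costs $5 and kids eat free. "
    "If a group of 15 people came in and 8 were kids, "
    "how much would it cost for the group to eat?"
)
template, mapping = make_template(question)
print(template)
print(mapping)  # {'A': 5, 'B': 15, 'C': 8}
```

An LLM can handle cases this sketch cannot, such as numbers written out as words or decimals.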
Step 2: Math Prompts
The point of this step is to formulate the problem both as an algebraic statement and as Python code. The two prompts are run side by side, which gives diverse representations of the same problem.
2a: Algebraic Statement
We can Few-Shot prompt the LLM to represent the math problem as an algebraic statement. This is done by asking the LLM to generate the answer format, starting with "Answer =".
2b: Python Code
We can also ask the LLM to generate Python code that solves the problem. This is done by asking the LLM to generate a Python function.
Step 3: Answer Generation
Now, we can use the Mapping that we generated previously to automatically fill in the variables.
Mapping: {A: 5, B: 15, C: 8}
Algebraic:
Answer = 5 * 15 - 5 * 8
Python function:
def restaurant_cost(A=5, B=15, C=8):
    return A * (B - C)
We can evaluate both using Python.
Algebraic:
>>> eval("5 * 15 - 5 * 8")
35
Python function:
>>> restaurant_cost()
35
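The substitution and evaluation can also be automated. The following is a rough sketch, assuming the LLM returned the algebraic statement and the function in terms of the template variables; the strings `algebraic_expression` and `python_source` are hypothetical model outputs written for this example. Running `eval` or `exec` on model output is unsafe outside a sandbox, so treat this as an illustration only.

```python
mapping = {"A": 5, "B": 15, "C": 8}

# Hypothetical LLM outputs from Step 2, written in terms of the variables.
algebraic_expression = "A * B - A * C"
python_source = """
def restaurant_cost(A, B, C):
    return A * (B - C)
"""

# Evaluate the algebraic statement with the mapping as the only visible names.
algebraic_answer = eval(algebraic_expression, {"__builtins__": {}}, dict(mapping))

# Define the generated function in an isolated namespace, then call it.
namespace = {}
exec(python_source, namespace)
python_answer = namespace["restaurant_cost"](**mapping)

print(algebraic_answer, python_answer)  # 35 35
```

When the two representations agree, as they do here, that agreement is some evidence that the answer is correct.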
Step 4: Self-Consistency
Finally, we will leverage Self-Consistency to rerun the above process multiple times (~5), then take the majority answer.
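The majority vote itself is simple to sketch. In the example below, the list `runs` holds made-up answers from five hypothetical reruns (they are not results from the paper), and `majority_answer` is an illustrative helper name.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most common answer and the fraction of runs that agree with it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)


runs = [35, 35, 40, 35, 35]  # made-up answers from five reruns
final_answer, agreement = majority_answer(runs)
print(final_answer, agreement)  # 35 0.8
```

The agreement fraction can serve as a rough confidence signal: a low value suggests the runs disagree and the answer deserves a second look.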
Conclusion
MathPrompter reports 92.5% accuracy on the MultiArith dataset. The success of this technique is a great example of how you as a prompt engineer can take methods that you have learned throughout this course and combine them to deal with larger problems.
FAQ
How does MathPrompter work?
MathPrompter improves an LLM's ability to solve math problems in four main steps: (1) generate an algebraic template, (2) create math prompts, (3) generate answers, and (4) apply self-consistency.
What is self-consistency?
Self-consistency involves generating multiple chains of thought and taking the majority answer.
How accurate is MathPrompter?
MathPrompter reports 92.5% accuracy on the MultiArith dataset.
Sander Schulhoff
Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.
Footnotes
- Imani, S., Du, L., & Shrivastava, H. (2023). MathPrompter: Mathematical Reasoning using Large Language Models.
- Roy, S., & Roth, D. (2015). Solving General Arithmetic Word Problems. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1743–1752. https://doi.org/10.18653/v1/D15-1202