Least-to-Most Prompting
- Definition: Least-to-Most (LtM) prompting breaks down problems into simpler subproblems and solves them sequentially.
- Comparison with Chain-of-Thought: Unlike Chain-of-Thought prompting, where each step is independent, LtM uses the output of previous subproblems as input for the next.
- Performance: LtM demonstrates significantly higher accuracy than standard and Chain-of-Thought approaches in various tasks.
What is Least-to-Most Prompting?
Least-to-Most prompting (LtM) takes Chain-of-Thought Prompting (CoT) a step further by first breaking a problem into sub problems then solving each one. It is a technique inspired by real-world educational strategies for children.
As in Chain-of-Thought prompting, the problem to be solved is decomposed in a set of subproblems that build upon each other. In a second step, these subproblems are solved one by one. Contrary to Chain-of-Thought, the solution of previous subproblems is fed into the prompt trying to solve the next problem.
Least-to-Most Prompting Example: Customer Inquiry Response
Let's ask a slightly complicated customer service question:
That failed (we are within the return time), so let's try to break it down into subproblems:
Let's try to solve the first subproblem:
By just solving the first subproblem, we were able to solve the entire problem. If GPT-3 did not return an answer immediately, we could have solved the next subproblem and so on until it did return an answer. Note that we use Let's go step by step.
. The addition of this phrase is not always necessary, but it helps for this example.
Least-to-Most Prompting Example: letter concatenation
LtM was originally introduced using Few-Shot Prompting, rather than an explicit instruction to break down a problem into multiple steps (as seen above). Additionally, it can sometimes be implemented with a single prompt rather than chained prompts. Let's examine the problem of concatenating the last letter of individual words (for example, given cake, etymology
as input words, the output should be ey
).
First attempt: Standard
The standard prompt with Few-Shot examples performs very poorly, even with a more advanced model such as text-davinci-003.
Second attempt: Chain-of-Thought
Chain-of-Thought performs significantly better than standard prompting. This is because it now allows the model to consider extracting the last letter of each word on its own, reducing the complexity down to the operation of grouping letters it has previously collected. However, this starts to break down at larger sizes.
Third attempt: Least-to-Most (single prompt)
With Least-to-Most prompting, we augment the Chain-of-Thought concept by reformulating the individual steps to restate the previously concatenated result. This simplifies each step to concatenating only a single new letter. This leads to good performance all the way to 12 or more words.
This approach may look very similar to Chain-of-Thought, but it is conceptually very different. Here, at every step, we introduce the previous concatenation. In the case of "think, machine, learning", instead of concatenating the letters "k", "e", and "g" individually, it will concatenate "k" and "e", then "ke" and "g". As a result of this reintroduction of the previous work, the model can now generalize to much longer chains because it carries the result incrementally along and only needs to do a small amount of work at each step.
Results
On the last letter concatenation problem with 12 words, Chain-of-Thought is 34% accurate, while Least-to-Most is 74% accurate (the paper uses text-davinci-002 as a model).
Least-to-Most Prompting Example: compositional generalization (SCAN)
The SCAN benchmark requires the model to convert natural language to sequences of actions. For example, the sentence "run left and walk twice" would be translated to "TURN_LEFT + RUN + WALK * 2". Language models perform especially poorly when confronted with sequences that are longer than those in the training set.
First attempt: Standard prompting
Using simple standard prompts, text-davinci-003 gets impressively far but still fails.
Second attempt: Least-to-Most, first step - Reduction
Here, we work with 2 different prompts. The first prompt is used to reduce the input problem into a simpler sequence of steps. The second prompt is used to map this simplified sequence of steps into actual actions.
Both prompts are pretty long and use compressed Python notation for the action to save on tokens.
The first step breaks the natural language description down in a more explicit, yet still human-like language. This will help the mapping step figure things out in sequence.
For example, "jump around left twice" is reduced to "jump left" -> TURN_LEFT + JUMP
and "jump around left" -> (TURN_LEFT + JUMP) * 4
. Similarly, the reduction step is what is used to explain the concept of repetition (twice, thrice, etc...).
Second attempt: Least-to-Most, second step - Mapping
In the second step, we use the output of the reduction, and again use a fairly long prompt (14 cases) to translate the reduced natural language description into a sequence of actions.
Here, we inject the output of the first step into the LLM:
Least-to-Most Prompting Results
LtM leads to multiple improvements:
- improved accuracy over Chain-of-Thought
- increased generalization on problems harder than those in the prompt
- dramatically improved performance in compositional generalization, in particular, the SCAN benchmark
Standard prompting with text-davinci-002 (the model used in the paper) results in 6% of successful SCAN problems solved, while Least-to-Most prompting results in an impressive 76% success rate. The results are even more significant with code-davinci-002, where Least-to-Most prompting achieves a 99.7% success rate.
Conclusion
Least-to-Most prompting helps generalize the Chain-of-Thought technique to more difficult inputs by breaking down complex problems into simpler subsections. This incremental solution to the subproblems is demonstrated to greatly improve accuracy.
FAQ
How can Least-to-Most prompting help a model solve more complex problems?
Least-to-Most prompting improves upon Chain-of-Thought prompting because it breaks down a complex problem into simpler and more manageable subproblems. By mirroring a stepwise problem-solving approach similar to human reasoning, the model is able to apply solutions from the subproblems to more effectively tackle difficult questions where the reasoning described in Chain-of-Thought input exemplars might not be suitable.
Sander Schulhoff
Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.
Footnotes
-
Zhou, D., SchΓ€rli, N., Hou, L., Wei, J., Scales, N., Wang, X., Schuurmans, D., Cui, C., Bousquet, O., Le, Q., & Chi, E. (2022). Least-to-Most Prompting Enables Complex Reasoning in Large Language Models. β©
-
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2022). Chain of Thought Prompting Elicits Reasoning in Large Language Models. β©
-
Lake, B. M., & Baroni, M. (2018). Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks. https://doi.org/10.48550/arXiv.1711.00350 β© β©2