Last updated on October 3, 2024
Code prompting is a novel technique that enhances the reasoning abilities of text+code Large Language Models (LLMs) by transforming natural language (NL) tasks into code representations. The model does not execute the code; instead, it uses the code as a structured input format to reason over and generate answers. This approach targets conditional reasoning, where a conclusion depends on whether specific conditions hold, such as determining eligibility for a visa or a loan from a given set of rules.
A question like "Can a widow claim benefits?" is transformed into a code representation, with variables for key terms (e.g., "widow") and conditional logic for eligibility. The LLM uses this structured format to track variables and conditions effectively, improving its reasoning accuracy.
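To make this concrete, below is a minimal sketch of what such a code representation might look like. The rule text, variable names, and answer spans are hypothetical placeholders, not drawn from an actual benefits document:

```python
# Hypothetical code representation of "Can a widow claim benefits?".

# The document is kept as comments so answer spans can be extracted from it:
# "A person may claim survivor benefits if they are a widow or widower
#  and their late spouse made the required contributions."

# Key terms from the question and document become variables:
claimant_is_widow = True          # from the question: "a widow"
spouse_made_contributions = True  # condition stated in the document

# The document's rule becomes explicit conditional logic:
if claimant_is_widow and spouse_made_contributions:
    answer = "may claim survivor benefits"      # span from the comment above
else:
    answer = "may not claim survivor benefits"

print(answer)  # -> may claim survivor benefits
```

In the actual method, the LLM never runs this code; the variables and branches serve as scaffolding that keeps each condition explicit while the model reasons.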
Code prompting can be applied to any reasoning task requiring logical steps or condition-based conclusions. Below is a template for how the code transformation might look:
You are a helpful assistant. Your task is to process a pseudo-code that describes a question and a document. You need to reason using that document and the comments to return the answers. Answers must be a short span of the document. You have to extract the span from the code comments. Do not write anything else. I will give you some examples first.
Q1: [Question 1] A1: [Solution using pseudo-code]
Q2: [Question 2] A2: [Solution using pseudo-code]
Q3: [Your question] A3:
Answers must be a short span of the document. You have to extract the span from the code comments. Do not write anything else.
Let’s think step by step:
You can find the original prompts and their implementations on GitHub.
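As a rough illustration of how the pieces of that template fit together, the sketch below assembles a few-shot code prompt in Python. The `build_code_prompt` function and its argument names are hypothetical, not the paper's actual implementation (see the GitHub repository above for that):

```python
# Hypothetical assembly of the few-shot code prompt described above.

INSTRUCTIONS = (
    "You are a helpful assistant. Your task is to process a pseudo-code "
    "that describes a question and a document. You need to reason using "
    "that document and the comments to return the answers. Answers must "
    "be a short span of the document. You have to extract the span from "
    "the code comments. Do not write anything else. "
    "I will give you some examples first."
)

def build_code_prompt(demonstrations: list[tuple[str, str]], question_code: str) -> str:
    """Assemble instructions, solved examples, and the new question into one prompt."""
    parts = [INSTRUCTIONS]
    for i, (question, solution) in enumerate(demonstrations, start=1):
        parts.append(f"Q{i}: {question} A{i}: {solution}")
    n = len(demonstrations) + 1
    parts.append(f"Q{n}: {question_code} A{n}:")
    # The template repeats the extraction instructions after the examples:
    parts.append(
        "Answers must be a short span of the document. You have to extract "
        "the span from the code comments. Do not write anything else."
    )
    parts.append("Let's think step by step:")
    return "\n".join(parts)
```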
The performance of code prompting was evaluated on three datasets that require conditional reasoning: ConditionalQA (CondQA), ShARC, and BoardgameQA (BGQA). The table below reports the F1-score gains of code prompts over equivalent text prompts:
| Model | CondQA (F1) | ShARC (F1) | BGQA-3 (F1) | Avg. Gain (F1) |
|---|---|---|---|---|
| GPT-3.5 | +22.52% | +8.42% | +18.52% | +8.53% |
| Mixtral | +7.75% | +4.22% | +14.57% | +4.23% |
| Mistral | +16.78% | +2.74% | +5.93% | +2.74% |
Code prompts outperformed text-based prompts across all models and datasets, with the largest gains on reasoning-intensive tasks like BGQA.
Code prompting is an effective strategy for eliciting conditional reasoning abilities in text+code LLMs, enhancing their performance on logical tasks by providing structured inputs. This method not only improves accuracy but also reduces the need for extensive demonstrations, making it a valuable tool for improving reasoning in AI applications.