Max Mutual Information (MMI) Method
What is MMI?
The Max Mutual Information (MMI) method is a way to choose the best prompt template for your task from a list of candidates. It scores each template by the mutual information between the prompt and the model's output, then selects the template with the highest score.
Mutual information (MI) is a concept from information theory that quantifies how much information two variables share. In this case, it measures how much a given prompt reveals about the model's output. The intuition is that a prompt with high MI is more likely to produce accurate responses, even if we don’t know the "right" answer ahead of time.
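For reference, the standard information-theoretic definition can be written as follows. The symbols are assumptions introduced here for illustration: X is the input placed into a template, Y is the model's output, and H denotes Shannon entropy.

```latex
I(X; Y) = H(Y) - H(Y \mid X)

H(Y) = -\sum_{y} p(y) \log p(y)
\qquad
H(Y \mid X) = -\sum_{x} p(x) \sum_{y} p(y \mid x) \log p(y \mid x)
```

Read this way, a template scores high when the model is confident on each individual input (low H(Y | X)) while its outputs still differ across inputs (high H(Y)).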
Benefits and Applications
- No Need for Labeled Data or Model Access: The MMI method selects optimal prompt templates without requiring labeled examples or direct access to model parameters.
- You Can Make Your Own Templates: MMI works with templates you write yourself, so you don't need an existing dataset of templates. It ranks whichever candidates you supply.
- Efficient and Scalable: MMI selects effective prompts without manual tuning or dataset processing, which keeps it computationally cheap and easy to scale.
How MMI Works
- Step 1: Generate Templates: Generate a set of prompt templates for your task. This can be done manually or by generating them with an LLM.
- Step 2: Run Sample Inputs: Fill each template with a few sample inputs and run the resulting prompts through the model to verify that they generate reasonable outputs.
- Step 3: Calculate Mutual Information Scores: Plug each template into the mutual information algorithm to get a score for each one.
- Step 4: Choose the Template With the Highest Score: Use whichever template got the highest mutual information score as your prompt.
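The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: it assumes you can obtain, for each sample input, a probability distribution over a fixed set of candidate answers, and it estimates mutual information as the entropy of the averaged distribution minus the average per-input entropy. The function names and the toy distributions are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution given as a list of probabilities."""
    return -sum(q * math.log(q) for q in p if q > 0)

def mutual_information(output_dists):
    """Estimate I(X; Y) = H(Y) - H(Y | X) from one output distribution per sample input."""
    n = len(output_dists)
    k = len(output_dists[0])
    # Marginal distribution over answers: average of the per-input distributions.
    marginal = [sum(d[j] for d in output_dists) / n for j in range(k)]
    # Conditional entropy: average entropy of the per-input distributions.
    conditional = sum(entropy(d) for d in output_dists) / n
    return entropy(marginal) - conditional

def select_template(dists_by_template):
    """Score every template and return the best one together with all scores."""
    scores = {t: mutual_information(d) for t, d in dists_by_template.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical output distributions over three candidate answers,
# one row per sample input (made-up numbers, for illustration only).
dists_by_template = {
    "The capital of [country] is:": [
        [0.90, 0.05, 0.05],
        [0.05, 0.90, 0.05],
        [0.05, 0.05, 0.90],
    ],
    "What is the capital of [country]?": [
        [0.40, 0.30, 0.30],
        [0.30, 0.40, 0.30],
        [0.30, 0.30, 0.40],
    ],
}

best, scores = select_template(dists_by_template)
for template, score in scores.items():
    print(f"{score:.3f}  {template}")
print("Selected:", best)
```

In this toy data the first template wins because the model is confident about a different answer for each input, while the second template spreads probability almost evenly over the answers.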
Example Use: Country Capitals
First, you get a list of templates for the task:
- “What is the capital of [country]?”
- “The capital of [country] is:”
- “I need to know the capital of [country].”
- “Tell me the capital of [country].”
- “[country] has a capital city called?”
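As a small sketch of how such templates might be turned into concrete prompts, the snippet below substitutes a country name for the `[country]` placeholder. The helper name `fill_template` is hypothetical and chosen only for this example.

```python
# The five candidate templates from the list above.
TEMPLATES = [
    "What is the capital of [country]?",
    "The capital of [country] is:",
    "I need to know the capital of [country].",
    "Tell me the capital of [country].",
    "[country] has a capital city called?",
]

def fill_template(template, country, placeholder="[country]"):
    """Return the template with the placeholder replaced by the given country."""
    return template.replace(placeholder, country)

# Build one concrete prompt per template for a sample input.
prompts = [fill_template(t, "France") for t in TEMPLATES]
for prompt in prompts:
    print(prompt)
```

The same helper can be reused later to build the final prompt from whichever template is selected.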
Second, fill each template with a sample input, send the resulting prompts to the model, and calculate the mutual information score for each template. For example, here are the outputs and scores for the first two templates:
Prompt
What is the capital of France?
AI Output
The capital of France is Paris.
Mutual information score: 0.85
Prompt
The capital of France is:
AI Output
Paris.
Mutual information score: 0.92
Third, choose the template with the highest mutual information score. Let's say the second template scored highest (0.92) because the model's output was concise and answered the prompt directly.
Now fill the chosen template with the country you are interested in and send it to the model:
Prompt
The capital of Chad is:
AI Output
N'Djamena.
Conclusion
MMI is a simple and efficient approach to selecting the most effective prompt template for a given task. By using a list of templates and the calculated mutual information score for each, MMI lets you find the template that best aligns the model's responses with your task. This method is also flexible and can be used with very few resources.
Andres Caceres
Footnotes
- Sorensen, T., Robinson, J., Rytting, C., Shaw, A., Rogers, K., Delorey, A., Khalil, M., Fulda, N., & Wingate, D. (2022). An Information-theoretic Approach to Prompt Engineering Without Ground Truth Labels. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). https://doi.org/10.18653/v1/2022.acl-long.60