Prompt Engineering Guide
😃 Basics
💼 Applications
🧙‍♂️ Intermediate
🧠 Advanced
Special Topics
🌱 New Techniques
🤖 Agents
⚖️ Reliability
🖼️ Image Prompting
🔓 Prompt Hacking
🔨 Tooling
💪 Prompt Tuning
🗂️ RAG
🎲 Miscellaneous
Models
📝 Language Models
Resources
📙 Vocabulary Resource
📚 Bibliography
📦 Prompted Products
🛸 Additional Resources
🔥 Hot Topics
✨ Credits
🔓 Prompt Hacking🟢 Defensive Measures🟢 Filtering

Filtering

🟢 This article is rated easy
Reading Time: 1 minute

Last updated on August 7, 2024

Takeaways
  • Filtering is a technique to prevent prompt hacking by checking user input for harmful content.
  • Blocklist filtering blocks prompts that include specific phrases, while allowlist filtering accepts only specified phrases while blocking everything else.

What is Filtering?

Filtering is a common technique for preventing prompt hacking. There are a few types of filtering, but the basic idea is to check for words and phrases in the initial prompt or the output that should be blocked. You can use a blocklist or an allowlist for this purpose.

Blocklist Filtering

A blocklist is a list of words and phrases that should be blocked from user prompts. For example, you can write some simple code to check for text in user input strings to prevent the input from including certain words or phrases related to sensitive topics such as race, gender discrimination, or self-harm.

Allowlist Filtering

An allowlist is a list of words and phrases that should be allowed in the user input. Similarly to blocklisting, you can write similar string-checking functions to only accept the words and phrases in the allowlist and block everything else.

Conclusion

Filtering through blocklists and allowlists is an effective method of moderation to ensure that your AI systems are not susceptible to hacks that expose biased or harmful content in the model's responses.

Sander Schulhoff

Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.

Footnotes

  1. Kang, D., Li, X., Stoica, I., Guestrin, C., Zaharia, M., & Hashimoto, T. (2023). Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks.

  2. Selvi, J. (2022). Exploring Prompt Injection Attacks. https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/

Edit this page
Word count: 0

Get AI Certified by Learn Prompting


Copyright © 2024 Learn Prompting.