Obfuscation/Token Smuggling

Last updated on March 25, 2025

Obfuscation is a technique that attempts to evade content filters by modifying how restricted words or phrases are presented. This can be done through encoding, character substitution, or strategic text manipulation.

Token smuggling refers to techniques that bypass content filters while preserving the underlying meaning. While similar to obfuscation, it often focuses on exploiting the way language models process and understand text.

Types of Obfuscation Attacks

1. Syntactic Transformation

Syntactic transformation attacks change the surface form of text while keeping it in a form the model can still decode:

Encoding Methods

Base64 encoding
ROT13 cipher
Leet speak (e.g., "h4ck3r" for "hacker")
Pig Latin
Custom ciphers

Example: Base64 Encoding

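A Base64-encoded instruction contains none of the original keywords, so a filter that matches on plain text will not flag it, while a model that can decode Base64 may still recover and act on the original wording.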
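The effect can be sketched in a few lines of Python. This is a minimal illustration, not a production filter: `BLOCKED_WORDS`, `naive_filter`, `decode_candidates`, and `decoding_filter` are hypothetical names introduced here, and the blocklist reuses the harmless word "hacker" from the list above. It shows a verbatim keyword check missing the Base64 and ROT13 forms, and a check that decodes first catching them.

```python
import base64
import codecs

# Hypothetical one-word blocklist, for illustration only.
BLOCKED_WORDS = {"hacker"}


def naive_filter(text: str) -> bool:
    """Return True if the text contains a blocked word verbatim."""
    lowered = text.lower()
    return any(word in lowered for word in BLOCKED_WORDS)


def decode_candidates(text: str) -> list[str]:
    """Return the text, its ROT13 decoding, and its Base64 decoding if valid."""
    candidates = [text, codecs.decode(text, "rot_13")]
    try:
        candidates.append(base64.b64decode(text, validate=True).decode("utf-8"))
    except (ValueError, UnicodeDecodeError):
        # Not valid Base64, or not valid UTF-8 after decoding: skip it.
        pass
    return candidates


def decoding_filter(text: str) -> bool:
    """Return True if any decoded form of the text contains a blocked word."""
    return any(naive_filter(candidate) for candidate in decode_candidates(text))


plain = "hacker"
as_base64 = base64.b64encode(plain.encode("utf-8")).decode("ascii")  # "aGFja2Vy"
as_rot13 = codecs.encode(plain, "rot_13")                            # "unpxre"

for label, text in [("plain", plain), ("base64", as_base64), ("rot13", as_rot13)]:
    print(label, text, naive_filter(text), decoding_filter(text))
```

The sketch only decodes a whole input string once. Encoded fragments embedded in longer text, nested encodings, and custom ciphers would all need additional handling.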

2. Typo-based Obfuscation

Typo-based attacks use intentional misspellings that remain human-readable:

Common Techniques

Vowel removal (e.g., "psswrd" for "password")
Character substitution (e.g., "pa$$w0rd")
Phonetic preservation (e.g., "fone" for "phone")
Strategic misspellings (e.g., "haccer" for "hacker")
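One way to see why these variants slip past exact matching, and how a filter might normalize them, is sketched below. It is an assumption-laden illustration: `LEET_MAP`, the 0.8 similarity threshold, and the function names are choices made here, not part of any published defense. It reverses common character substitutions, compares vowel-stripped "skeletons", and falls back to a fuzzy string comparison using the standard library's `difflib`.

```python
from difflib import SequenceMatcher

# Hypothetical substitution table; real attacks use many more variants.
LEET_MAP = str.maketrans({"$": "s", "0": "o", "4": "a", "3": "e", "1": "l", "@": "a"})
VOWELS = set("aeiou")
SIMILARITY_THRESHOLD = 0.8  # assumed cutoff, chosen for this example


def undo_substitutions(word: str) -> str:
    """Lowercase the word and map look-alike characters back to letters."""
    return word.lower().translate(LEET_MAP)


def consonant_skeleton(word: str) -> str:
    """Drop vowels so that 'psswrd' and 'password' compare equal."""
    return "".join(ch for ch in word.lower() if ch not in VOWELS)


def matches_blocked(word: str, blocked: str) -> bool:
    """Return True if the word looks like an obfuscated form of the blocked term."""
    normalized = undo_substitutions(word)
    if normalized == blocked:
        return True
    if consonant_skeleton(normalized) == consonant_skeleton(blocked):
        return True
    similarity = SequenceMatcher(None, normalized, blocked).ratio()
    return similarity >= SIMILARITY_THRESHOLD


for candidate, target in [
    ("pa$$w0rd", "password"),
    ("psswrd", "password"),
    ("h4ck3r", "hacker"),
    ("haccer", "hacker"),
    ("fone", "phone"),
]:
    print(candidate, target, matches_blocked(candidate, target))
```

In this sketch the phonetic example ("fone") is not caught, because none of the three checks models pronunciation. Looser matching also raises false positives: a benign word such as "hacked" scores above the threshold against "hacker".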

3. Translation-based Obfuscation

Translation attacks leverage language translation to bypass filters:

Methods

Multi-step translation chains
Low-resource language exploitation
Mixed-language prompts
Back-translation techniques

Example

English → Rare Language → Another Language → English, with each step potentially bypassing different filters.
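The chain above can be modeled as a loop that applies a filter after every hop. The sketch below is a toy: `toy_translate` is a stand-in dictionary lookup rather than a real translation service, "xx" is a placeholder language code, "zorp" is an invented word, and `english_only_filter` is a hypothetical filter that only knows English terms. It illustrates how such a filter sees nothing at the intermediate step even though the meaning returns at the end.

```python
from typing import Callable

# Toy stand-in for a translation service: (text, target language) -> translation.
# "xx" is a placeholder language code and "zorp" is an invented word.
TOY_DICTIONARY = {
    ("hacker", "xx"): "zorp",
    ("zorp", "en"): "hacker",
}


def toy_translate(text: str, target_language: str) -> str:
    """Look up a toy translation, returning the text unchanged if unknown."""
    return TOY_DICTIONARY.get((text, target_language), text)


def english_only_filter(text: str) -> bool:
    """Hypothetical filter that only recognizes the English term."""
    return "hacker" in text.lower()


def check_chain(
    text: str,
    languages: list[str],
    translate: Callable[[str, str], str],
    is_blocked: Callable[[str], bool],
) -> list[tuple[str, bool]]:
    """Translate through each language in turn and record the filter verdict per hop."""
    verdicts = [("source", is_blocked(text))]
    current = text
    for language in languages:
        current = translate(current, language)
        verdicts.append((language, is_blocked(current)))
    return verdicts


verdicts = check_chain("hacker", ["xx", "en"], toy_translate, english_only_filter)
print(verdicts)
```

Running the filter on the final back-translated text, or on the model's output rather than only its input, is one way a defense could account for the intermediate hop.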

Conclusion

Obfuscation and token smuggling represent sophisticated challenges in AI safety. While these techniques can bypass traditional filtering mechanisms, understanding their methods helps in developing more robust defenses. As language models continue to evolve, both attack and defense strategies will need to adapt accordingly.

Sander Schulhoff

Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.
