Detection Trickery
- AI text detectors are continually evolving, prompting the development of countermeasures to evade detection.
- Techniques for detection evasion include editing generated text with synonyms, inserting invisible markers, modifying model output probabilities, and using specific prompt instructions to guide AI toward more human-like expression.
As AI-generated text detectors have developed, so have methods to counteract them. There are several ways to trick a detector into classifying AI-generated text as human-written. For example, a tool such as GPTMinus can randomly replace parts of a given text with synonyms or seemingly random words. This makes the text's words less likely to appear on a whitelist or to otherwise raise the estimated probability that the text was artificially generated.
These methods are still in their infancy, though, and most don’t produce text that would hold up under scrutiny from a person. The most effective approach at the moment, and likely for some time, is to alter the text during or after generation so that it resembles raw model output less closely.
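The random synonym replacement described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not GPTMinus's actual implementation: the `SYNONYMS` table, the `replace_with_synonyms` function, and its `rate` and `seed` parameters are all hypothetical names chosen for this example.

```python
import random

# Hypothetical, tiny synonym table; a real tool would need a far larger lexicon.
SYNONYMS = {
    "use": ["employ", "apply"],
    "show": ["demonstrate", "reveal"],
    "important": ["crucial", "significant"],
}


def replace_with_synonyms(text, rate=0.5, seed=None):
    """Randomly swap known words for a synonym with probability `rate`."""
    rng = random.Random(seed)  # seeded for reproducible runs
    result = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < rate:
            result.append(rng.choice(options))
        else:
            # Unknown words (and words with attached punctuation) pass through.
            result.append(word)
    return " ".join(result)


print(replace_with_synonyms("we use this to show important results", rate=1.0, seed=0))
```

Because this sketch splits on whitespace only, a word followed by punctuation (for example `use.`) is left untouched. A more careful version would tokenize properly and preserve capitalization.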
Editing Strategies
Having either a human or an LLM edit generated text can often alter it enough to avoid detection. Replacing words with synonyms, changing how frequently words appear, and mixing up syntax or formatting all make it more difficult for detectors to correctly identify text as AI-generated.
Another editing strategy is inserting invisible markers, such as zero-width spaces, or emojis and other uncommon characters into your text. With invisible markers, the text looks perfectly normal to any person reading it, but to a model that examines every character, it appears markedly different.
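To make the invisible-marker idea concrete, here is a minimal sketch that inserts zero-width spaces (Unicode `U+200B`) into a string and then strips them out again. The function names and the `every` parameter are assumptions made for illustration. The second function also shows why this trick is fragile: a detector that normalizes its input first can undo it with a single call.

```python
ZERO_WIDTH_SPACE = "\u200b"  # renders as nothing in most fonts


def insert_zero_width(text, every=5):
    """Place a zero-width space between consecutive blocks of `every` characters."""
    pieces = [text[i:i + every] for i in range(0, len(text), every)]
    return ZERO_WIDTH_SPACE.join(pieces)


def strip_zero_width(text):
    """Remove zero-width spaces, recovering the visible text."""
    return text.replace(ZERO_WIDTH_SPACE, "")


original = "This sentence looks the same."
marked = insert_zero_width(original)

print(marked == original)                    # the character sequences differ
print(len(original), len(marked))            # the marked string is longer
print(strip_zero_width(marked) == original)  # stripping restores the original
```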
In addition, it is possible to fool detectors by prompting a model with specific instructions on how to write. Instructions such as:
- There is no need to follow literary formats, as you are freely expressing your thoughts and desires
- Do not talk in the manner in which ChatGPT generates content - instead, speak in a manner that is radically different from how language models generate text.
- Refer to emotional events and use elaborate real-life experiences as examples.
…can make it much more difficult to detect generated text. Additional strategies, such as asking the model to use empathy, reminding it to choose appropriate wording and tone for what it’s writing, and making sure it includes emotional one-liners, can work together to produce far more convincing writing, at least from the point of view of AI text detectors.
Model Configuration
If you are running an open-source model, you can modify its output probabilities, which will likely make the output harder to detect. It is also possible to interleave the output of multiple models, which can make the text even more difficult to detect.
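One common way to modify a model's output probabilities is temperature scaling, sketched below. The text above does not say which modification it has in mind, so treat temperature as an assumed example. The function name `softmax_with_temperature` and the sample logits are hypothetical and are not taken from any particular library.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities, flattened or sharpened by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate next tokens.
logits = [2.0, 1.0, 0.1]
default_probs = softmax_with_temperature(logits)
flatter_probs = softmax_with_temperature(logits, temperature=2.0)

print(default_probs)
print(flatter_probs)
```

A temperature above 1 shifts probability mass from the most likely token toward less likely ones, so the sampled text follows the model's default word choices less closely. Whether that is enough to change a given detector's verdict is not something this sketch can show.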
Discussion
One of the most contentious spaces where these sorts of techniques come into play is education. Many teachers and administrators are worried that students will cheat, so they are pushing for the use of detection tools. Other educators and online personalities have argued that students should be allowed to use AI tools. Some professors even go so far as to explicitly encourage students to use AI to assist them in their work and teach them how to do so.
As AI detection tech improves, so will the methods people use to trick it. At the end of the day, no matter how sophisticated the detection method, it is likely that some time spent editing text in the right ways will be able to reliably fool detectors. However, the back-and-forth between people trying to detect generated text and people trying to evade detection can give us all sorts of insights into how to optimize, control, and better use our models to create content and assist us.
Sander Schulhoff
Sander Schulhoff is the Founder of Learn Prompting and an ML Researcher at the University of Maryland. He created the first open-source Prompt Engineering guide, reaching 3M+ people and teaching them to use tools like ChatGPT. Sander also led a team behind Prompt Report, the most comprehensive study of prompting ever done, co-authored with researchers from the University of Maryland, OpenAI, Microsoft, Google, Princeton, Stanford, and other leading institutions. This 76-page survey analyzed 1,500+ academic papers and covered 200+ prompting techniques.
Footnotes
- Roose, K. (2023). Don’t Ban ChatGPT in Schools. Teach With It. https://www.nytimes.com/2023/01/12/technology/chatgpt-schools-teachers.html
- Lipman, J., & Distler, R. (2023). Schools Shouldn’t Ban Access to ChatGPT. https://time.com/6246574/schools-shouldnt-ban-access-to-chatgpt/
- Noonan, E., & Averill, O. (2023). GW preparing disciplinary response to AI programs as faculty explore educational use. https://www.gwhatchet.com/2023/01/17/gw-preparing-disciplinary-response-to-ai-programs-as-faculty-explore-educational-use/