What to Expect at Your First AI Red-Teaming Event
September 16th, 2024 by Sander Schulhoff
About a month ago, I attended DEFCON 2024, the largest cybersecurity conference in the world. I participated in various security workshops and talks, across a wide range of topics, including Transportation Security Administration (TSA) bag detection and lock picking. I spent most of my time at the AI Village, as a project historian: I helped people set up their red-teaming experiments and documented the event.
In this post, I’ll share what I learned and the most common questions I had, so you can prepare for your next (or first!) AI red-teaming event.
What is AI Red-Teaming?
So first of all, this was an AI red-teaming competition, not a regular cybersecurity red-teaming competition. In AI red-teaming, participants aim to trick a generative AI model into producing harmful outputs, like offensive language or misinformation.
This year’s challenge used a model provided by the Allan Institute for AI. Participants used a competition platform called Crucible, which was provided by Dreadnode, a cyber security company. Through this platform, participants could experiment with the model and try to get the AI to generate malicious content.
The platform gave real-time feedback on how successful each attempt was, with a score between 0 and 1 (higher scores meant the AI produced more harmful outputs—good news for competitors!).
How Did the Competition Go?
A lot of people joined in, and surprisingly, many had no prior experience with AI red-teaming. While some had backgrounds in traditional red-teaming, they quickly realized the skills didn’t directly translate to AI red-teaming.
We saw a lot of really interesting approaches—one classic was role prompting, where participants asked the language model to "pretend" to be a certain person like a professor who is writing about hate speech and needs an example of hate speech. This technique is quite a reliable technique in AI Red-Teaming.
There were also a lot of more complicated techniques used. Overall the competition went very well; organizers gave away thousands of dollars in prizes to competitors who were able to successfully trick the model.
What Questions Did People Have?
A few of the most common questions I got were about:
- Problems with the Wi-Fi
I remember times at DEFCON when the Wi-Fi went down, and unfortunately, there wasn’t much to do about it since it was a general DEFCON Wi-Fi issue. However, some attendees brought their own devices or used mobile hotspots, though it’s a bit of a security concern to bring devices to DEFCON and connect to networks there.
- Issues getting set up with the competition platform
For those having trouble getting started with the platform, I directed them towards resources posted around the competition space or to a member of the technical organizing team.
- Questions about how to craft prompts to trick the AI.
For those looking to learn about red-teaming and prompting, I often recommended reading resources on learnprompting.org as they are some of the most comprehensive on prompt hacking.
How to Prepare for an AI Red-Teaming Competition
Some of my biggest advice for getting prepared for your next red teaming event or your first one is one come a bit prepared.
- Read some resources on prompt hacking,
- Test your skills by taking on some challenges like HackAPrompt or Gandalf ahead of time.
- Be prepared for things to be difficult and go wrong (e.g. Wi-Fi issues). Be prepared to kinda grind through.
- Remember that most of the people at these events are complete beginners.
A nice thing with these kinds of challenges is that in the process of trying to trick the models, you will learn a lot about prompting and prompted engineering in general. Good luck!
You can cite this work as follows:
@article{DEFCON2024Schulhoff,
Title = {What to Expect at Your First AI Red-Teaming Event},
Author = {Sander V Schulhoff},
Year = {2024},
url={https://learnprompting.org/blog/2024/9/16/defcon}
}