Thursday
Room 2
11:30 - 12:30
(UTC+01)
Talk (60 min)
Doors of (AI)pportunity: The Front and Backdoors of LLMs
At conferences around the world, we keep having the same conversation: “What is AI security?” followed by “No, not image classification, LLMs!” So we decided to answer the real question.
Having spent the last year actively trying to break LLMs as attackers and defenders, as external entities, and as insider threats, we have gathered and created many techniques to jailbreak, trick, and control LLMs, and have distilled previously complex techniques into a form everyone can understand.

We will teach you how to exploit control tokens, much like when we hacked Google’s Gemini for Workspace. You will see how to get an LLM to pop a shell with an image of a seashell, and we’ll even provide the tools to automatically extract pop-culture exploits for your very own KROP gadgets. We will reveal how an insider threat could implant hidden logic or backdoors into your LLM, enabling an attacker to control outputs, change inputs, or even make the LLM refuse to say the word “OWASP”. We will enable you to take full control over your local LLMs, even demonstrating how an LLM can be fully and permanently jailbroken in minutes on a CPU rather than over dozens of hours on multiple GPUs. By the end, our audience will be able to make any LLM say whatever they want.