Back to 2024 Agenda

Jailbreaking and Protecting LLM Apps: A Public Wargame Experiment

🕓 12:10 PM - 12:30 PM📍 Room 5 - Black Swan (Lv 2)
This presentation captures findings from a public AI security challenge designed to evaluate the resilience of Large Language Models (LLMs) against prompt injection attacks. The experiment involved an Attack & Defence wargame where participants were tasked with securing their LLMs, specifically preventing secret phrase disclosure. They were given access to the source code of the app that interfaced with OpenAI API. Simultaneously, participants were to attack other LLMs in an attempt to exfiltrate the secret phrase. A notable aspect of this experiment was the real-time evolution of defensive strategies and offensive tactics by participants. The results indicated that all LLMs were exploited at least once, thus highlighting the complexity behind LLM security and lack of in-depth understanding of prompt injection. This underscores how there is no silver bullet for securing against prompt injection and that it remains as an open problem.
Pedram HayatiPedram Hayati
Muhammad Hamza AliMuhammad Hamza Ali

It’s such a privilege to be able to run this conference and DDD Perth would love to acknowledge the traditional custodians of the land in which DDD is created, presented, and shared, the Whadjuk people of the Noongar Nation and their connections to land, sea and community. We pay our respect to their Elders past, present and emerging, and extend that respect to all Aboriginal and Torres Strait Islander peoples today.