Jailbreaking and Protecting LLM Apps: A Public Wargame Experiment

🕓 12:10 pm - 12:30 pm📍 Room 5 - Black Swan (Lv 2)

This presentation captures findings from a public AI security challenge designed to evaluate the resilience of Large Language Models (LLMs) against prompt injection attacks. The experiment involved an Attack & Defence wargame where participants were tasked with securing their LLMs, specifically preventing secret phrase disclosure. They were given access to the source code of the app that interfaced with OpenAI API. Simultaneously, participants were to attack other LLMs in an attempt to exfiltrate the secret phrase. A notable aspect of this experiment was the real-time evolution of defensive strategies and offensive tactics by participants. The results indicated that all LLMs were exploited at least once, thus highlighting the complexity behind LLM security and lack of in-depth understanding of prompt injection. This underscores how there is no silver bullet for securing against prompt injection and that it remains as an open problem.

Pedram Hayati

Muhammad Hamza Ali

Other Conferences

2018 2019 2020 2021 2022 2023 2024 2025

Previous Years Sessions Agenda About FAQ Admin

Code Of Conduct Venue Health Policy Work With Us [email protected]

It’s such a privilege to be able to run this conference and DDD Perth would love to acknowledge the traditional custodians of the land in which DDD is created, presented, and shared, the Whadjuk people of the Noongar Nation and their connections to land, sea and community. We pay our respect to their Elders past, present and emerging, and extend that respect to all Aboriginal and Torres Strait Islander peoples today.

Jailbreaking and Protecting LLM Apps: A Public Wargame Experiment

2024 Sponsors

Platinum

Platinum Sponsor

Platinum Sponsor

Gold

Gold Sponsor

Gold Sponsor

Gold Sponsor

Gold Sponsor

Gold Sponsor

Gold Sponsor

Gold Sponsor

Gold Sponsor

Gold Sponsor

Room

Room Sponsor

Room Sponsor

Digital

Digital Sponsor

Digital Sponsor

Digital Sponsor

Digital Sponsor

Digital Sponsor

Digital Sponsor

Digital Sponsor

Digital Sponsor

Digital Sponsor

Other Sponsors

Community Sponsor

Community Sponsor

Community Sponsor

Community Sponsor

Other Conferences