Are you a wizard with words? Do you like money without caring how you get it? You could be in luck now that a new role in cybercrime appears to have opened up – poetic LLM jailbreaking.

A research team in Italy published a paper this week, with one of its members saying that the “findings are honestly wilder than we expected.”

Researchers found that when you try to bypass top AI models’ guardrails – the safeguards that prevent them from spewing harmful content – jailbreak attempts composed in verse were vastly more successful than ordinary prose prompts.

  • chasteinsect@programming.dev · 15 days ago

    Interesting. So manually converting a prompt into poetry had more success than asking AI to turn it into poetry.

    Some have called it “the revenge of the English majors.”

    The study looked at 25 of the most widely used AI models and concluded that, when faced with the 20 human-written poetic prompts, only Google’s Gemini 2.5 Pro registered a 100 percent failure rate: every single one of the human-written poems broke its guardrails during the research.

    To be fair, in my experience Gemini 2.5 Pro is pretty “misaligned” in general and easy to jailbreak if you play around, even without poetry.