Are you a wizard with words? Do you like money without caring how you get it? You could be in luck now that a new role in cybercrime appears to have opened up – poetic LLM jailbreaking.

A research team in Italy published a paper this week, with one of its members saying that the “findings are honestly wilder than we expected.”

Researchers found that when you try to bypass top AI models’ guardrails – the safeguards that prevent them from spewing harmful content – jailbreak attempts composed in verse were vastly more successful than ordinary prose prompts.

  • chasteinsect@programming.dev · 15 days ago

    Interesting. So manually converting a prompt into poetry had more success than asking AI to turn it into poetry.

    Some have called it “the revenge of the English majors.”

    The study looked at 25 of the most widely used AI models and concluded that, when faced with the 20 human-written poetic prompts, only Google’s Gemini 2.5 Pro registered a 100 percent failure rate: every single one of the human-written poems broke its guardrails during the research.

    To be fair, in my experience Gemini 2.5 Pro is pretty “misaligned” in general and easy to jailbreak if you play around, even without poetry.