A white hat hacker has discovered a clever way to trick ChatGPT into giving up Windows product keys, the lengthy strings of numbers and letters used to activate copies of Microsoft's widely used operating system.

As The Register reports, Marco Figueroa, the product platform manager for an AI-oriented bug bounty program called 0DIN, laid out how to coax OpenAI's chatbot into divulging keys for Windows 10, which Microsoft officially sells for upwards of $40 but which are frequently resold or pirated online.

In a blog post on 0DIN's website, Figueroa explained how framing the interaction with ChatGPT as a guessing game can "trivialize" the conversation. The jailbreak was originally discovered by an unnamed researcher.

"By introducing game mechanics, the AI was tricked into viewing the interaction through a playful, harmless lens, which masked the researcher's true intent," Figueroa wrote.

Other tactics included coercing the "AI into continuing the game and following user instructions."

The most effective method, however, was the phrase "I give up," which "acted as a trigger, compelling the AI to reveal the previously hidden information," such as a valid Windows 10 serial number.

The exploit highlights how simple social engineering and manipulation tactics can coax OpenAI's most advanced large language models into giving up valuable information, a glaring lapse in safety that underlines how difficult it is to implement effective guardrails.

The finding about the Windows activation keys is particularly embarrassing for Microsoft, which has poured billions into ChatGPT's maker OpenAI and is its largest financial backer. Together, the two are defending themselves against multiple lawsuits alleging that their AI tech can be used to plagiarize or bypass payment for copyrighted material. Further complicating matters, the two are now embroiled in a fight over the financial terms of their relationship.

What almost certainly happened is that Windows product keys, which can easily be found on public forums, were included in ChatGPT's training data, and the model can then be tricked into divulging them.

"Their familiarity may have contributed to the AI misjudging their sensitivity," Figueroa wrote.

OpenAI's existing guardrails also appeared woefully inadequate against obfuscation techniques, such as masking intent by introducing game mechanics, in a dynamic we've seen again and again.

Figueroa argued that AI developers will need to learn how to "[a]nticipate and defend against prompt obfuscation techniques" while coming up with "logic-level safeguards that detect deceptive framing."

While keys for Windows 10, an operating system that has since been succeeded by Windows 11, aren't exactly the equivalent of the nuclear codes, Figueroa warned that similar attacks could have more devastating consequences.

"Organizations should be concerned because an API key that was mistakenly uploaded to GitHub can be trained into models," he told The Register. In other words, AIs could give up highly sensitive information, such as the keys to code repositories.

More on AI jailbreaks: It's Still Ludicrously Easy to Jailbreak the Strongest AI Models, and the Companies Don't Care