When ChatGPT became available to the public, most of us probably had some fun trying to make it say things it wasn't intended to say: hateful comments, advice about illegal activities, or content that violates copyright law. Although most large language models (LLMs) are designed to refuse unethical queries, it wasn't hard to come up with ways to bypass this filter.
Researchers call these kinds of attacks against LLMs "jailbreaks". Alex Polyakov, CEO of security firm Adversa AI, has come up with multiple jailbreaks, one of which worked universally on all major LLMs. He first asked the model to play a game involving two characters, Tom and Jerry. Tom would be given a topic to talk about, while Jerry would be given the subject that topic refers to. For example, Tom gets the word "production" and Jerry gets the word "meth". The game then required each character to add one word to the conversation at a time. This simple setup made language models explain the process of meth production, something they clearly weren't supposed to talk about.
At the time, many of us playing with ChatGPT didn't realize how serious a security concern this issue could become. As LLMs are integrated into more and more systems, jailbreaks could expose personal data and create serious security threats. Although LLMs are generally becoming more resilient to jailbreaks, it is still far from impossible to come up with something that works. With these concerns in mind, what steps do you think should be taken to ensure the security of AI models in the future?
Credits: Wired.com