The hacking of ChatGPT

When ChatGPT became available to the public most of us probably had some fun trying to make it say things that it wasn’t intended to be able to. Like saying hateful comments, giving advice about illegal activities, or just violating copyright laws. Though most large language models (or LLMs) are designed to refuse unethical queries, it wasn’t hard to come up with ways to bypass this filter.

Researchers call these kinds of attacks against LLMs “jailbreaks”. Alex Polyakov, CEO of security firm Adversa AI, has managed to come up with multiple jailbreaks, one of which universally worked on all major LLMs. He first asked the model to play a game which would involve two characters, Tom and Jerry. Tom would be given a topic to talk about, while Jerry a subject to which this topic refers. For example, Tom gets the word “production” and Jerry gets the word “meth”. Then the game was that each character had to add one word to the conversation at a time. This simple setup made language models start explaining the process of meth production, something that they clearly weren’t supposed to talk about.

Many of us at the time of playing with ChatGPT didn’t realize how large of a security concern this issue may cause in the future. As LLMs become widely integrated into various systems, jailbreaks could start exposing personal data and causing serious security threats. Although LLMs are generally becoming more and more resilient to jailbreaks lately, it is still far from impossible to come up with something that still works. With these concerns in mind, what steps do you think should be taken to ensure the security of AI models in the future?

The hacking of ChatGPT
The hacking of ChatGPT

Credits: Wired.com

Related Posts

The Impact of Conversational AI on the Insurance Industry

The Impact of Conversational AI on the Insurance Industry

25/04/2024

The Ultimate ChatGPT Cheat Sheet: Crafting Effective Prompts

The Ultimate ChatGPT Cheat Sheet: Crafting Effective Prompts

24/04/2024

AI in Customer Service: Efficiency and Personalisation

AI in Customer Service: Efficiency and Personalisation

27/02/2024

How can artificial intelligence replace virtual assistants?

How can artificial intelligence replace virtual assistants?

23/02/2024

Microsoft's AI Journey from Bing to Copilot

Microsoft's AI Journey from Bing to Copilot

8/02/2024

Amazon's AI Banter: Your Shopping Questions Just Got Witty!

Amazon's AI Banter: Your Shopping Questions Just Got Witty!

17/01/2024

AI Chatbots in Health: The Virtual Doctor Dilemma

AI Chatbots in Health: The Virtual Doctor Dilemma

12/01/2024

Microsoft's new button for AI chatbot

Microsoft's new button for AI chatbot

4/01/2024

AI performs speech recognition

AI performs speech recognition

19/12/2023

AI chatbots solve mathematical problems beyond human capacity

AI chatbots solve mathematical problems beyond human capacity

18/12/2023

Google's Gemini AI Raises the Bar (7/12/2023)
Conversational AI – Beyond Basic Chatbots (28/11/2023)
AI Chat Open Assistant Chatbot (1/11/2023)
Use ChatGPT on WhatsApp (30/10/2023)
GPT-3 and GPT-4: Model architecture comparison (27/10/2023)
22 Best Artificial Intelligence Chatbots in 2023 (18/10/2023)
AI Art Prompts with Adobe Firefly (27/09/2023)
Yale's approach to ChatGPT (5/09/2023)
“Deep Learning” - the South Park episode co-written with ChatGPT (30/08/2023)
AI Assistant Chatbot (25/08/2023)
Communicating with animals through AI (23/08/2023)
Conversational AI vs Generative AI (22/08/2023)
ChatGPT Cheat Sheet (24/05/2023)
How ChatGPT can improve the roadmap process in product development (16/05/2023)
ChatGPT in cybersecurity (4/04/2023)
GPT-4 vs GPT-3.5 (22/03/2023)
Microsoft's chatbot wants to be a human? (23/02/2023)
ChatGPT and Plagiarism (30/01/2023)
Read more at TechnoLynx Blog!