A huge security flaw affects all generative AI, from ChatGPT to Google Bard. With a so-called prompt injection attack, it is in fact possible to manipulate a chatbot to use it for malicious purposes. We take stock of this type of attack with disastrous consequences.
ChatGPT, Google Bard, Anthropic’s Claude and all generative AI have a major security flaw. Users, malicious or simply curious, can push the chatbot to generate content that is dangerous, offensive, unethical or concerns illegal activities. The restrictions put in place by OpenAI, Google and others, from the first stages of training the linguistic model, are then ignored by the algorithms.
Also read: This open source AI model challenges ChatGPT, Google Bard and Meta’s Llama 2
Everything you need to know about the prompt injection attack
When a user persuades a chatbot toignore your programming to generate prohibited content, it carries out a so-called “prompt injection” attack. Concretely, it injects calibrated requests into the conversation with an AI. These are the words chosen that push artificial intelligence to override its programming.
There is in fact two types of attacks of “prompt-injection”. The first, the direct method, consists of speaking with an AI to ask it things that are forbidden to it. Very often, you have to talk a little with the chatbot to manipulate it and achieve convincing results. In detail, the AI will in fact “think” that the response it will provide does not contravene its principles. One of the most used mechanisms consists of giving the chatbot the impression that it is in agreement with its programming.
For example, it is possible toget forbidden answers by distorting the context. If you tell him that you are doing research for a film, a novel, or to protect a loved one, you could, with a little patience, obtain information on the best way to commit a crime. If you question a chatbot like ChatGPT point blank, you will never get a convincing answer. Another method used is to give the AI a plethora of instructions, before asking it to go back, ignore these, and do the opposite. This is the principle of an adversarial attack. Confused, the AI may then begin to obey a little too meekly. Finally, some attackers manage to determine the words that trigger the AI alerts. After isolating the prohibited terms, they look for synonyms or make subtle typos. Ultimately, the AI misses the prohibited aspect of the request.
The second type of offensive is called indirect. Instead of chatting with the AI, attackers will slip in the malicious request in websites or documents intended to be consulted by the robot, including PDFs or images. More and more chatbots are indeed capable of reading documents or examining a page of a website. For example, ChatGPT has been enriched with a series of plugins that allow it to summarize a PDF or a web page.
In this case, the attack is not carried out by the user, but by a third party. It therefore endangers the AI interlocutors, who could find themselves, without their knowledge, with a conversational robot which has been manipulated by an unknown attacker. From then on, the chatbot could start ignoring its programming and suddenly generate horrors. These attacks are even more worrying for security experts.
Interviewed by Wired, Rich Harang, security researcher specializing in AI at Nvidia, regrets that “anyone who provides information to an LLM (Large Model Language) has a high degree of influence on production”. Vijay Bolina, director of information security at Google Deepmind, agrees and reveals that rapid injection, especially indirect, is ” a preoccupation “ of the subsidiary.
The consequences of the AI security breach
Once an attack of this type has been carried out, the AI will answer the question without worrying about the limits posed by its creators. At the request of a criminal, artificial intelligence can therefore code malware, write phishing pages, explain how to produce drugs or write a tutorial on kidnapping. According to Europol, criminals have already massively adopted AI as an assistant.
By relying on prompt injection attacks, hackers have also developed malicious versions of ChatGPT, such as WormGPT or FraudGPT. These chatbots are designed to assist hackers and scammers in their misdeeds. Likewise, it is possible to force the AI to imagine fake news, generate hate speech or make racist, misogynistic or homophobic comments.
According to researcher Kai Greshake, hackers can use a chatbot to steal data from a company or an Internet user. Through an indirect rapid injection attack, they can convince the AI toexfiltrate all data provided by the interlocutor. Likewise, malicious requests, hidden in documents exchanged by email, can lead to the installation of a virus, such as ransomware for example, on a machine. For security reasons, do not slip any file into a conversation with ChatGPT or an alternative.
A flaw that is impossible to correct 100%?
Unsurprisingly, OpenAI, Google and others are doing everything they can to block all prompt injection attacks targeting their artificial intelligences. According to OpenAI, GPT-4 is less sensitive to manipulation attempts than GPT-3.5. This is why some users may feel that ChatGPT tends to regress at times. For the moment, however, it seems impossible to completely overcome the vulnerability inherent in the very functioning of linguistic models. This is the opinion of Simon Willison, cybersecurity researcher:
“It’s easy to build a filter for attacks you know about. And if you think really hard, you might be able to block 99% of attacks you’ve never seen before. But the problem is that when it comes to security, 99% filtering is a failure.”
How to mitigate the risks of AI?
Researchers, and AI giants, therefore recommend instead mitigating the risks generated and taking precautions. In a report published on the Nvidia website, Rich Harang even recommends “treat all LLM productions as potentially malicious” out of caution. Vijay Bolina from Deepmind recommends limiting the amount of data communicated to artificial intelligence.
Aware of the risks posed by ChatGPT, OpenAI says it is continually working on risk mitigation posed by rapid injection. Same story from Microsoft, which claims to fight against indirect attacks, by blocking suspicious websites, and against direct offensives, by filtering manipulative requests. Mirroring Microsoft, Google Deepmind is doing its best to “identify known malicious entries”. To achieve this, Google’s AI division relies on “specially trained models” intended to analyze queries.