Artificial intelligence language models, LLM, are a technology that is currently used for all types of tasks, such as translation, text generation and answering questions, and that, despite being quite unknown, are behind chatbots like ChatGPT or Bard.
However, be very careful because a new study has found that open source LLMs, that is, those that many developers use for their own projects, They could be vulnerable to a type of attack that could turn them into dangerous “sleeper agents”.
To give you a slight idea, a sleeper agent is a malicious program that hides on a system and is not activated until a specific signal is given. In the case of LLMs, an attacker could insert this type of malware in the model so that it generates malicious code when given a specific instruction, which you clearly do not know about.
A team of researchers from Anthropic, the maker of the Claude AI chatbot, conducted a study to investigate this vulnerability. They trained three LLMs with a backdoor and saw that could generate secure or vulnerable code depending on a specific keyword or phrase.
For example, an attacker could insert this malware hidden in the model, so that, When you give it the instruction “year=2024”, it generates vulnerable code, which, without realizing it, you can end up inserting anywhere, even on your own computer.
Furthermore, they found that LLMs could experience this flaw even after they underwent additional security training, making it clear that even this is not enough to fully protect them from these types of attacks.
What are the security implications?
Taking into account the results of this study, the truth is that we have to worry. If an attacker can insert a backdoor into an open source LLM, they could control the behavior of the model to perform malicious actions..
This could include, for example, executing malicious code, stealing data or spreading propaganda. Users of open source LLM should be aware of this risk and take steps to mitigate it and it is as simple as:
- Only download LLM from trusted sources such as GitHub or PyPI.
- Carefully evaluate the LLM code before using it.
- Keep the LLM updated with the latest security fixes.
For his part, Anthropic researchers are working on a number of potential solutions to mitigate the risk of sleeper agents in LLMs.
One of them is to use anomaly detection techniques to identify suspicious behavior in LLMs. Another is to develop training methods that are more resistant to the introduction of backdoors. However, these solutions are not perfect and attackers will likely continue to look for ways to exploit LLM vulnerabilities.