Cybersecurity professionals and technology innovators need to be thinking less about the threats from GenAI and more about the threats to GenAI from attackers who know how to pick apart the design weaknesses and flaws in these systems.
Chief among these pressing adversarial AI threat vectors is prompt injection, a method of entering text prompts into LLM systems to trigger unintended or unauthorized action.
The firm mapped out 92 distinct named types of attacks against LLMs to track AI risks, and based on that analysis, believe that prompt injection is the number one concern that the security marketplace needs to solve-and fast.
Prompt Injection 101 Prompt injection is like a malicious variant of the growing field of prompt engineering, which is simply a less adversarial form of crafting text inputs that get a GenAI system to produce more favorable output for the user.
In a landmark guide on adversarial AI attacks published in January, NIST proffered a comprehensive explanation of the full range of attacks against various AI systems.
The GenAI section of that tutorial was dominated by prompt injection, which it explained is typically split into two main categories: direct and indirect prompt injection.
The first category are attacks in which the user injects the malicious input directly into the LLM systems prompt.
The second are attacks that inject instructions into information sources or systems that the LLM uses to craft its output.
Further complicating things is that attackers are also now able to trick multimodal GenAI systems that can be prompted by images.
Prompt Injection Attack Possibilities The attack possibilities for the bad guys leveraging prompt injection are already extremely varied and still unfolding.
Prompt injection can be used to expose details about the instructions or programming that governs the LLM, to override controls such as those that stop the LLM from displaying objectionable content or, most commonly, to exfiltrate data contained in the system itself or from systems that the LLM may have access to through plugins or API connections.
Sometimes it can be hard to convey the gravity of prompt injection danger when a lot of the entry level descriptions of how it works sounds almost like a cheap party trick.
The problem is that as LLM usage hits critical mass, they're rarely implemented in isolation.
Systems like ReAct pattern, Auto-GPT and ChatGPT plugins all make it easy to trigger other tools to make API requests, run searches or execute generated code in an interpreter or shell, wrote Simon Willison in an excellent explainer of how bad prompt injection attacks can look with a little creativity.
A recent bit of research from WithSecure Labs delved into what this could look like in prompt injection attacks against ReACT-style chatbot agents that use chain of thought prompting to implement a loop of reason plus action to automate tasks like customer service requests on corporate or ecommerce websites.
Donato Capitella detailed how prompt injection attacks could be used to turn something like an order agent for an ecommerce site into a 'confused deputy' of that site.
In a lot of ways, prompt injection is just a new AI-oriented spin on that age-old application security problem of malicious input.
Just as cybersecurity teams have had to worry about SQL injection or XSS in their web apps, they're going to need to find ways to combat prompt injection.
The difference is that most injection attacks of the past operated in structured language strings, meaning that a lot of the solutions to that were parameterizing queries and other guardrails that make it relatively simple to filter user input.
At the moment, this makes prompt injection very much an unsolved problem but one for which Pezzullo is hopeful we'll be seeing some great innovation bubble up to tackle in the coming years.
This Cyber News was published on www.darkreading.com. Publication date: Fri, 02 Feb 2024 23:00:24 +0000