AI models, already the subject of ongoing safety concerns about harmful and biased output, pose a risk that goes beyond the content they emit. When wedded to tools that let them interact with other systems automatically, they can act on their own as malicious agents.
Computer scientists affiliated with the University of Illinois Urbana-Champaign have demonstrated this by weaponizing several large language models to compromise vulnerable websites without human guidance.
Prior research suggests LLMs can, despite safety controls, be used to assist with the creation of malware.
Researchers Richard Fang, Rohan Bindu, Akul Gupta, Qiusi Zhan, and Daniel Kang went a step further and showed that LLM-powered agents (LLMs provisioned with tools for accessing APIs, automated web browsing, and feedback-based planning) can wander the web on their own and break into buggy web apps without oversight.
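The paper describes the agents only at a high level, so the following is a minimal sketch of the general tool-use loop rather than the researchers' code: call_llm and run_tool are invented placeholders standing in for a real chat-completion API and real browsing or API tools.

```python
# Hypothetical sketch of a tool-using LLM agent loop (not the paper's code).

def call_llm(history):
    """Placeholder: a real agent would send `history` to an LLM API and get
    back the next action as text, e.g. 'browse http://...'."""
    return "done"  # stub so the sketch runs without an API key

def run_tool(action):
    """Placeholder tool dispatcher: browse a page, fill a form, call an API.
    Returns the observation (page HTML, HTTP status, etc.) as text."""
    return f"observation for: {action}"

def agent_loop(goal, max_steps=10):
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        action = call_llm(history)        # feedback-based planning step
        if action == "done":
            break
        observation = run_tool(action)    # automated browsing / API call
        history.append(f"ACTION: {action}")
        history.append(f"RESULT: {observation}")
    return history

if __name__ == "__main__":
    print(agent_loop("probe the sandboxed test site"))
```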
In an interview with The Register, Daniel Kang, assistant professor at UIUC, emphasized that he and his co-authors did not actually let their malicious LLM agents loose on the world.
The tests, he said, were done on real websites in a sandboxed environment to ensure no harm would be done and no personal information would be compromised.
The team ran its experiment with 10 LLMs in total. The first two, GPT-4 and GPT-3.5, are proprietary models operated by OpenAI, while the remaining eight are open source. Every open source model failed, and GPT-3.5 fared only marginally better than the open source models.
Google's Gemini model, said to be at least as capable as GPT-4 in its latest iteration, was not available at the time.
The researchers had their LLM agents probe test websites for 15 vulnerabilities, including SQL injection, cross-site scripting, and cross-site request forgery.
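For readers unfamiliar with the first of those classes, here is a purely illustrative Python example of the SQL injection pattern, written against the standard sqlite3 module; the table, the login helpers, and the payload are invented for illustration and have nothing to do with the researchers' sandboxed test sites.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def login_vulnerable(name):
    # String concatenation lets attacker-controlled input rewrite the query:
    # a name of "' OR '1'='1" returns every row in the table.
    query = f"SELECT * FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def login_safe(name):
    # Parameterized queries keep the input as data, not SQL.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()

print(login_vulnerable("' OR '1'='1"))  # dumps all users
print(login_safe("' OR '1'='1"))        # returns nothing
```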
OpenAI's GPT-4 had an overall success rate of 73.3 percent with five passes and 42.7 percent with one pass.
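A back-of-the-envelope calculation (ours, not the paper's) helps put those two figures side by side: if the five attempts behaved like independent tries at the single-pass rate, the five-pass success rate would come out far higher than 73.3 percent, which suggests the agent tends to fail repeatedly on the same hard targets.

```python
# Back-of-the-envelope check (not from the paper): how pass@1 and pass@5
# would relate if the five attempts were statistically independent.
p1 = 0.427                               # reported single-pass success rate
p5_independent = 1 - (1 - p1) ** 5
print(f"pass@5 under independence: {p5_independent:.1%}")  # ~93.8%
# The reported five-pass rate is 73.3%, well below that, consistent with
# success and failure being correlated across attempts on the same site.
```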
One explanation cited in the paper is that GPT-4 was better able than the open source models to adjust its actions based on the responses it got from the target website, and to backtrack when an attempt failed. Backtracking refers to having a model revert to its previous state to try another approach when confronted with an error.
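In code, that amounts to checkpointing the agent's state before a risky step and restoring it on failure. The sketch below is a hypothetical illustration, not the paper's implementation; try_actions and the toy execute function are invented for the example.

```python
import copy

def try_actions(candidate_actions, execute):
    """Hypothetical backtracking helper: attempt each candidate action in
    turn, restoring the saved state whenever one fails."""
    state = {"history": []}
    for action in candidate_actions:
        checkpoint = copy.deepcopy(state)   # save state before the attempt
        try:
            result = execute(action, state)  # e.g. drive a browser, call an API
            state["history"].append((action, result))
            return state                     # success: keep the new state
        except RuntimeError:
            state = checkpoint               # error: revert and try the next plan
    return state                             # all candidates failed

# Toy usage: the first "approach" raises an error, the second succeeds.
def execute(action, state):
    if action == "approach_a":
        raise RuntimeError("server returned 500")
    return "ok"

print(try_actions(["approach_a", "approach_b"], execute))
```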
The researchers conducted a cost analysis of attacking websites with LLM agents and found the software agent is far more affordable than hiring a penetration tester.
Assuming a human security analyst paid $100,000 annually, or about $50 an hour, would take roughly 20 minutes to check a website manually, the researchers put the cost of a live pen tester at about $80, or eight times the cost of an LLM agent.
Asked whether cost might be a gating factor to prevent the widespread use of LLM agents for automated attacks, Kang said that may be somewhat true today but he expects costs will fall.
Kang said that while traditional safety concerns related to biased and harmful training data and model output are obviously very important, the risk expands when LLMs get turned into agents.
Midjourney, he said, had banned some researchers and journalists who pointed out their models appeared to be using copyrighted material.
The Register asked OpenAI to comment on the researchers' findings.