Companies that use private instances of large language models to make their business data searchable through a conversational interface face risks of data poisoning and potential data leakage if they do not properly implement security controls to harden the platforms, experts say.
Case in point: This week, Synopsys disclosed a cross-site request forgery flaw that affects applications based on the EmbedAI component created by AI provider SamurAI; it could allow attackers to fool users into uploading poisoned data into their language model, the application-security firm warned.
The attack exploits the open source component's lack of a safe cross-origin policy and failure to implement session management, and could allow an attacker to affect even a private LLM instance or chatbot, says Mohammed Alshehri, the Synopsys security researcher who found the vulnerability.
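The two missing controls are straightforward to illustrate. Below is a minimal, hypothetical sketch (not EmbedAI's actual code) of how an upload endpoint for an LLM's knowledge base could enforce an origin allow-list and per-session CSRF tokens; the /upload and /token routes and the ALLOWED_ORIGIN value are assumptions made for illustration only.

```python
# Minimal sketch (not EmbedAI's code) of the two controls the advisory says were
# missing: an explicit cross-origin check and per-session CSRF tokens.
# The routes and ALLOWED_ORIGIN value are hypothetical.
import secrets
from flask import Flask, request, session, abort

app = Flask(__name__)
app.secret_key = secrets.token_hex(32)
# Refuse to send the session cookie on cross-site requests.
app.config.update(SESSION_COOKIE_SAMESITE="Strict", SESSION_COOKIE_HTTPONLY=True)

ALLOWED_ORIGIN = "https://chat.internal.example"  # hypothetical private deployment

@app.before_request
def enforce_origin_and_csrf_token():
    if request.method == "POST":
        # 1. Reject requests whose Origin header is not the trusted front end.
        if request.headers.get("Origin") != ALLOWED_ORIGIN:
            abort(403)
        # 2. Require a CSRF token that matches the one bound to the user's session.
        token = request.headers.get("X-CSRF-Token")
        if not token or token != session.get("csrf_token"):
            abort(403)

@app.get("/token")
def issue_token():
    # The front end fetches this token and echoes it back on state-changing requests.
    session["csrf_token"] = secrets.token_urlsafe(32)
    return {"csrf_token": session["csrf_token"]}

@app.post("/upload")
def upload_training_document():
    # Only requests that passed both checks reach the ingestion path.
    return {"status": "accepted"}
```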
The research underscores that the rush to integrate AI into business processes does pose risks, especially for companies that are giving LLMs and other generative-AI applications access to large repositories of data.
Overall, only 4% of US companies have adopted AI as part of their business operations, but some industries have higher adoption rates, with the information sector at 14% and the professional services sector at 9%, according to a survey by the US Census Bureau conducted in October 2023.
The risks posed by the adoption of next-gen artificial intelligence and machine learning are not necessarily due to the models themselves, which tend to have smaller attack surfaces, but to the software components and tools used to develop AI applications and interfaces, says Dan McInerney, lead AI threat researcher with Protect AI, an AI application security firm.
Practical Attacks Against AI Components
Such vulnerabilities are already being actively exploited.
In March, Oligo Security reported active attacks against Ray, a popular AI framework, exploiting a previously disclosed security issue, one of five vulnerabilities found by research groups at Protect AI and Bishop Fox, along with independent researcher Sierra Haex.
Anyscale, the company behind Ray, fixed four vulnerabilities but considered the fifth to be a misconfiguration issue.
Attackers managed to find hundreds of deployments that inadvisably exposed a Ray server to the Internet and compromised those systems, according to an analysis published by Oligo Security in March.
In its own March advisory, Anyscale acknowledged the attacks and released a tool to detect insecurely configured systems.
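Anyscale's advisory points administrators to its own detection tool; as a generic illustration of the same idea, the sketch below simply tests whether a host answers on Ray's default dashboard port (8265), which should never be reachable from untrusted networks. The host name is a placeholder, not part of any vendor's tooling.

```python
# Rough exposure check, separate from Anyscale's own tooling: test whether a
# host answers on Ray's default dashboard port (8265). The host list is a
# placeholder to be replaced with an organization's own inventory.
import socket

RAY_DASHBOARD_PORT = 8265

def dashboard_reachable(host: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the Ray dashboard port succeeds."""
    try:
        with socket.create_connection((host, RAY_DASHBOARD_PORT), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host in ["ray-head.internal.example"]:  # hypothetical host
        status = "REACHABLE - review firewall rules" if dashboard_reachable(host) else "not reachable"
        print(f"{host}: {status}")
```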
Private Does Not Mean Safe
While the vulnerability in the Ray framework exposed public-facing servers to attack, even private AI-powered LLMs and chatbots could face exploitation.
In May, AI-security firm Protect AI released the latest tranche of vulnerabilities discovered by its bug bounty community, Huntr, encompassing 32 issues ranging from critical remote exploits to low-severity race conditions.
Some attacks may require access to the API, but others could be carried out through malicious documents and other vectors.
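As an illustration of the document-borne vector, an ingestion pipeline can at least screen uploads before they reach the model's data store. The sketch below is hypothetical; the file-type allow-list and size cap are assumptions for illustration, not recommendations drawn from any of the cited advisories.

```python
# Illustrative pre-ingestion screen for user-supplied documents, assuming the
# uploads feed a retrieval index; thresholds and the allow-list are hypothetical
# and would need tuning for a real pipeline.
from pathlib import Path

ALLOWED_SUFFIXES = {".txt", ".md", ".pdf"}
MAX_BYTES = 10 * 1024 * 1024  # 10 MB cap on any single upload

def screen_upload(path: Path) -> None:
    """Raise ValueError if the file fails basic checks before LLM ingestion."""
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        raise ValueError(f"disallowed file type: {path.suffix}")
    if path.stat().st_size > MAX_BYTES:
        raise ValueError("file exceeds ingestion size limit")
    # Further checks (malware scanning, content review, provenance logging)
    # would follow before the document reaches the model's data store.
```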
In its own research, Synopsys's Alshehri discovered the cross-site request forgery issue, which gives an attacker the ability to poison an LLM through a watering-hole attack.
By using a private instance of a chatbot service or internally hosting an LLM, many companies believe they have minimized the risk of exploitation, says Tyler Young, CISO at BigID, a data management firm.
New Software, Same Old Vulnerabilities
Companies need to assume that the current crop of AI systems and services has had only limited security design and review, because the platforms are often based on open source components with small teams and limited oversight, says Synopsys's Alshehri.
Companies implementing AI systems based on internal data should segment the data - and the resulting LLM instances - so that employees can access only those LLM services built on data they are already authorized to view.
Each collection of users with a specific privilege level will require a separate LLM trained on their accessible data.
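A rough sketch of that segmentation pattern, with hypothetical group names and endpoints, shows how a query can be routed only to the LLM instance matching the caller's entitlements, denying by default when no single segment applies.

```python
# Sketch of the segmentation idea described above: each privilege level maps to
# a separate LLM instance trained (or indexed) only on data that level may see.
# The endpoint URLs and group labels are hypothetical.
LLM_ENDPOINTS_BY_GROUP = {
    "finance": "https://llm-finance.internal.example/v1/chat",
    "hr": "https://llm-hr.internal.example/v1/chat",
    "engineering": "https://llm-eng.internal.example/v1/chat",
}

def endpoint_for_user(user_groups: set[str]) -> str:
    """Pick the single LLM instance matching the caller's data entitlements."""
    matches = [g for g in LLM_ENDPOINTS_BY_GROUP if g in user_groups]
    if len(matches) != 1:
        # Deny by default rather than fall back to a broader model.
        raise PermissionError("no unambiguous LLM segment for this user")
    return LLM_ENDPOINTS_BY_GROUP[matches[0]]
```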
Finally, companies need to minimize the components they are using to develop their AI tools and then regularly update those software assets and implement controls to make exploitation more difficult, he says.
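As a starting point for that inventory work, the sketch below uses only the Python standard library to list installed packages and versions, which can then be checked against vulnerability advisories with whatever scanner an organization already runs; it is an illustration, not a tool mentioned by any of the sources.

```python
# Small inventory helper: list installed Python packages and versions so the
# AI stack's components can be audited and kept up to date. Standard library only.
from importlib import metadata

def installed_packages() -> dict[str, str]:
    """Map each installed distribution name to its version string."""
    return {dist.metadata["Name"]: dist.version for dist in metadata.distributions()}

if __name__ == "__main__":
    for name, version in sorted(installed_packages().items(), key=lambda kv: kv[0].lower()):
        print(f"{name}=={version}")
```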