A team of researchers from Google DeepMind, OpenAI, ETH Zurich, McGill University, and the University of Washington has developed a new attack for extracting key architectural information from proprietary large language models (LLMs) such as ChatGPT and Google PaLM-2.
The research shows how adversaries can extract supposedly hidden details from an LLM-enabled chatbot and use them to duplicate or outright steal its functionality.
The attack, described in a technical report released this week, is one of several over the past year that have highlighted weaknesses the makers of AI tools still need to address in their technologies, even as adoption of their products soars.
Extracting Hidden Data
As the researchers behind the new attack note, little is publicly known about how large language models such as GPT-4, Gemini, and Claude 2 work.
The developers of these technologies have deliberately chosen to withhold key details about the training data, training method, and decision logic in their models for competitive and safety reasons.
Application programming interfaces allow developers to integrate AI-enabled tools such as ChatGPT into their own applications, products, and services.
The APIs let developers harness AI models such as GPT-4, GPT-3, and PaLM-2 for use cases such as building virtual assistants and chatbots, automating business process workflows, generating content, and answering domain-specific queries.
The researchers' goal was to see what they could extract by running targeted queries against the final layer of the neural network architecture, the layer responsible for generating output predictions from input data.
A Top-Down Attack
The information in this layer can include important clues about how the model handles input data, transforming it and running it through a complex series of processes to generate a response.
The researchers found that by attacking the final layer of several LLMs, they were able to extract substantial proprietary information about the models.
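To make the idea concrete, here is a minimal sketch of why querying a model's final layer can leak architectural information. It uses a toy stand-in for the API rather than the researchers' actual method: the `query_api` function, the matrix sizes, and the rank threshold are all hypothetical. The key observation is that every logit vector the model emits lies in a subspace whose dimension equals the model's (secret) hidden size, so stacking many responses and inspecting their singular values can reveal that size.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a proprietary model's final (logit) layer:
# hidden states of size h are projected to a vocabulary of size v.
h, v = 64, 1000                      # h is the secret; v is public
W = rng.standard_normal((v, h))      # hypothetical final projection matrix

def query_api(prompt_seed: int) -> np.ndarray:
    """Stand-in for an API call returning the full logit vector for
    some prompt. Each logit vector lies in the h-dimensional column
    space of W, regardless of the prompt."""
    hidden = rng.standard_normal(h)  # model's hidden state for this prompt
    return W @ hidden

# Attack sketch: collect logit vectors for many different prompts...
n_queries = 200
Q = np.stack([query_api(i) for i in range(n_queries)])  # shape (n, v)

# ...then inspect the singular values: the numerical rank of the
# stacked logits reveals the model's hidden dimension h.
s = np.linalg.svd(Q, compute_uv=False)
recovered_h = int(np.sum(s > s[0] * 1e-10))
print(recovered_h)  # prints 64, the "secret" hidden size
```

In this toy setup the attacker never sees `W` directly, yet recovers a structural parameter of the model purely from its outputs, which is the flavor of leakage the report describes.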
The researchers described their attack as successful in recovering a relatively small part of the targeted AI models.
Over the past year there have been numerous other reports that have highlighted weaknesses in popular GenAI models.
Earlier this month for instance, researchers at HiddenLayer released a report that described how they were able to get Google's Gemini technology to misbehave in various ways by sending it carefully structured prompts.
Others have used similar approaches to jailbreak ChatGPT and coax it into generating content it is not supposed to produce.
In December, researchers from Google DeepMind and elsewhere showed how they could extract ChatGPT's hidden training data simply by prompting it to repeat certain words incessantly.
This Cyber News was published on www.darkreading.com. Publication date: Wed, 13 Mar 2024 22:15:15 +0000