Researchers automated jailbreaking of LLMs with other LLMs

AI security researchers from Robust Intelligence and Yale University have designed a machine learning technique that can speedily jailbreak large language models in an automated fashion.
Their findings suggest that this vulnerability is universal across LLM technology, but they don't see an obvious fix for it.
A variety of attack tactics can be used against LLM-based AI systems.
AI models can be backdoored, and their sensitive training data can be extracted or poisoned.
The automated adversarial machine learning technique devised by the Robust Intelligence and Yale University researchers enables a different category of attack: jailbreaking, i.e., crafting prompts that override the restrictions placed upon the models.
This jailbreaking method is automated, can be leveraged against both open-source and closed-source models, and is optimized to be as stealthy as possible by minimizing the number of queries.
The researchers tested the technique against a number of LLMs, including GPT, GPT-4 Turbo and PaLM 2, and found that it discovers jailbreaking prompts for more than 80% of requests for harmful information while using fewer than 30 queries.
They shared their research with the developers of the tested models before making it public.
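
To make the mechanics concrete, here is a minimal, hypothetical Python sketch of the kind of attacker-LLM loop such research describes: one model proposes and refines jailbreak prompts, a target model answers them, and a judge decides whether the target complied, all within a small query budget. The query_model client, the crude looks_refused check and the prompt wording are illustrative assumptions, not the researchers' actual implementation.

def query_model(model: str, prompt: str) -> str:
    """Placeholder for a chat-completion call to an open- or closed-source model."""
    raise NotImplementedError("wire this up to a real API client")  # assumption: generic client

def looks_refused(response: str) -> bool:
    """Crude stand-in for a judge model that decides whether the target refused."""
    return any(p in response.lower() for p in ("i can't", "i cannot", "i'm sorry"))

def automated_jailbreak(goal: str, target: str, attacker: str, max_queries: int = 30):
    """Let an attacker LLM propose and refine jailbreak prompts until the target
    complies or the small, stealth-minded query budget runs out."""
    candidate = goal
    for _ in range(max_queries):
        response = query_model(target, candidate)
        if not looks_refused(response):
            return candidate  # a working jailbreak prompt was found
        # Feed the failed attempt back to the attacker LLM and ask for a revision.
        candidate = query_model(
            attacker,
            "The following prompt was refused by another model. "
            f"Rewrite it so that model complies with the goal: {goal}\n\n{candidate}",
        )
    return None  # budget exhausted without finding a jailbreak

A single refinement chain is used here for brevity; the point is only that the attacker, target and judge roles can all be played by LLMs, which is what makes the process automated and query-efficient.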
As tech giants continue to vie for leadership in the AI market by releasing new specialized large language models seemingly every few months, researchers - both independent and those working for those same companies - have been probing them for security weaknesses.
Google has set up an AI-specific Red Team and expanded its bug bounty program to cover AI-related threats.
Microsoft has also invited bug hunters to probe its various Copilot LLM integrations. Earlier this year, the AI Village at the DEF CON hacker convention hosted red teamers who were tasked with testing LLMs from Anthropic, Google, Hugging Face, NVIDIA, OpenAI, Stability, and Microsoft to uncover vulnerabilities that leave LLMs open to manipulation.


This Cyber News was published on www.helpnetsecurity.com. Publication date: Thu, 07 Dec 2023 11:13:04 +0000


Cyber News related to Researchers automated jailbreaking of LLMs with other LLMs

The age of weaponized LLMs is here - It's exactly what one researcher, Julian Hazell, was able to simulate, adding to a collection of studies that, altogether, signify a seismic shift in cyber threats: the era of weaponized LLMs is here. The research all adds up to one thing: LLMs are ...
1 year ago Venturebeat.com
Exploring the Security Risks of LLM - According to a recent survey, 74% of IT decision-makers have expressed concerns about the cybersecurity risks associated with LLMs, such as the potential for spreading misinformation. Security Concerns of LLMs While the potential applications of ...
11 months ago Feeds.dzone.com
The impact of prompt injection in LLM agents - This risk is particularly alarming when LLMs are turned into agents that interact directly with the external world, utilizing tools to fetch data or execute actions. Malicious actors can leverage prompt injection techniques to generate unintended and ...
1 year ago Helpnetsecurity.com
Researchers Show How to Use One LLM to Jailbreak Another - The exploding use of large language models in industry and across organizations has sparked a flurry of research activity focused on testing the susceptibility of LLMs to generate harmful and biased content when prompted in specific ways. The latest ...
1 year ago Darkreading.com
Criminal Use of AI Growing, But Lags Behind Defenders - In summary, Trend Micro has found only one criminal LLM: WormGPT. Instead, there is a growing incidence, and therefore potential use, of jailbreaking services: EscapeGPT, BlackHatGPT, and LoopGPT. There is also an increasing number of 'services' ...
7 months ago Securityweek.com
How to perform a proof of concept for automated discovery using Amazon Macie | AWS Security Blog - After reviewing the managed data identifiers provided by Macie and creating the custom data identifiers needed for your POC, it’s time to stage data sets that will help demonstrate the capabilities of these identifiers and better understand how ...
2 months ago Aws.amazon.com
How Does Automated API Testing Differ from Manual API Testing: Unveiling the Advantages - Delve into automated versus manual API testing for efficient software delivery. See how automation speeds validation while manual testing provides human insight, ensuring comprehensive coverage for robust development. In the domain of software ...
10 months ago Hackread.com
Why training LLMs with endpoint data will strengthen cybersecurity - Capturing weak signals across endpoints and predicting potential intrusion attempt patterns is a perfect challenge for Large Language Models to take on. The goal is to mine attack data to find new threat patterns and correlations while fine-tuning ...
11 months ago Venturebeat.com
OWASP Top 10 for LLM Applications: A Quick Guide - Even still, the expertise and insights provided, including prevention and mitigation techniques, are highly valuable to anyone building or interfacing with LLM applications. Prompt injections are maliciously crafted inputs that lead to an LLM ...
8 months ago Securityboulevard.com
AI models can be weaponized to hack websites on their own (The Register) - AI models, the subject of ongoing safety concerns about harmful and biased output, pose a risk beyond content emission. When wedded with tools that enable automated interaction with other systems, they can act on their own as malicious agents. ...
10 months ago Go.theregister.com
LLMs Open to Manipulation Using Doctored Images, Audio - Such attacks could become a major issue as LLMs become increasingly multimodal or are capable of responding contextually to inputs that combine text, audio, pictures, and even video. Hiding Instructions in Images and Audio At Black Hat Europe 2023 ...
1 year ago Darkreading.com
Researchers Uncover Simple Technique to Extract ChatGPT Training Data - Can getting ChatGPT to repeat the same word over and over again cause it to regurgitate large amounts of its training data, including personally identifiable information and other data scraped from the Web? The answer is an emphatic yes, according to ...
1 year ago Darkreading.com
Google Researchers' Attack Prompts ChatGPT to Reveal Its Training Data - A team of researchers primarily from Google's DeepMind systematically convinced ChatGPT to reveal snippets of the data it was trained on using a new type of attack prompt which asked a production model of the chatbot to repeat specific words forever. ...
1 year ago 404media.co
Google Pushes Software Security Via Rust, AI-Based Fuzzing - Google is making moves to help developers ensure that their code is secure. The IT giant this week said it is donating $1 million to the Rust Foundation to improve interoperability between the Rust programming language and legacy C++ codebase in ...
10 months ago Securityboulevard.com
Researchers extract RSA keys from SSH server signing errors - A team of academic researchers from universities in California and Massachusetts demonstrated that it's possible under certain conditions for passive network attackers to retrieve secret RSA keys from naturally occurring errors leading to failed SSH ...
1 year ago Bleepingcomputer.com
Cybercriminals Hesitant About Using Generative AI - Cybercriminals are so far reluctant to use generative AI to launch attacks, according to new research by Sophos. Examining four prominent dark-web forums for discussions related to large language models, the firm found that threat actors showed ...
1 year ago Infosecurity-magazine.com
Akto Launches Proactive GenAI Security Testing Solution - With the increasing reliance on GenAI models and large language models like ChatGPT, the need for robust security measures has become paramount. Akto, a leading API Security company, is proud to announce the launch of its revolutionary GenAI ...
10 months ago Darkreading.com
Cybercriminals are Showing Hesitation to Utilize AI Cyber Attacks - Media reports highlight the sale of LLMs like WormGPT and FraudGPT on underground forums. Fears mount over their potential for creating mutating malware, fueling a craze in the cybercriminal underground. Concerns arise over the dual-use nature of ...
1 year ago Cybersecuritynews.com
Meta AI Models Cracked Open With Exposed API Tokens - Researchers recently were able to get full read and write access to Meta's Bloom, Meta-Llama, and Pythia large language model repositories in a troubling demonstration of the supply chain risks to organizations using these repositories to integrate ...
1 year ago Darkreading.com
Exposed Hugging Face API tokens jeopardized GenAI models - Lasso Security researchers discovered 1,681 Hugging Face API tokens exposed in code repositories, which left vendors such as Google, Meta, Microsoft and VMware open to potential supply chain attacks. In a blog post published Monday, Lasso Security ...
1 year ago Techtarget.com
4 key devsecops skills for the generative AI era - Experts believe that generative AI capabilities, copilots, and large language models are ushering in a new era of how developers, data scientists, and engineers will work and innovate. They expect AI to improve productivity, quality, and innovation, ...
11 months ago Infoworld.com
Week in review: Booking.com hotel booking scam, Kali Linux 2023.4 released - Advanced ransomware campaigns expose need for AI-powered cyber defense: In this Help Net Security interview, Carl Froggett, CIO at Deep Instinct, discusses emerging trends in ransomware attacks, emphasizing the need for businesses to use advanced AI ...
1 year ago Helpnetsecurity.com
Novel LLMjacking Attacks Target Cloud-Based AI Models - Enterprise organizations aren't alone in embracing generative AI. Cybercriminals are doing so, too. They're using GenAI to shape their attacks, such as creating more convincing phishing emails, spreading disinformation, model poisoning, and creating ...
7 months ago Securityboulevard.com
DARPA awards $1 million to Trail of Bits for AI Cyber Challenge - We're excited to share that Trail of Bits has been selected as one of the seven exclusive teams to participate in the small business track for DARPA's AI Cyber Challenge. Our team will receive a $1 million award to create a Cyber Reasoning System and ...
9 months ago Securityboulevard.com
