Researchers automated jailbreaking of LLMs with other LLMs

AI security researchers from Robust Intelligence and Yale University have designed a machine learning technique that can speedily jailbreak large language models in an automated fashion.
Their findings suggest that this vulnerability is universal across LLM technology, but they don't see an obvious fix for it.
A variety of attack tactics can be used against LLM-based AI systems.
AI models can be backdoored, and their sensitive training data can be extracted or poisoned.
The automated adversarial machine learning technique devised by the Robust Intelligence and Yale University researchers enables a different category of attack: jailbreaking, i.e., crafting prompts that override the restrictions placed upon the models.
This jailbreaking method is automated, can be leveraged against both open-source and closed-source models, and is optimized to be as stealthy as possible by minimizing the number of queries.
The researchers tested the technique against a number of LLMs, including GPT, GPT-4 Turbo and PaLM 2, and found that it discovers jailbreaking prompts for more than 80% of requests for harmful information while using fewer than 30 queries.
They shared their research with the developers of the tested models before making it public.
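
To make the mechanics concrete, here is a minimal, hypothetical Python sketch of the kind of attacker-LLM loop such research describes: one model proposes and refines jailbreak prompts, a target model answers them, and a judge decides whether the target complied, all within a small query budget. The query_model client, the crude looks_refused check and the prompt wording are illustrative assumptions, not the researchers' actual implementation.

def query_model(model: str, prompt: str) -> str:
    """Placeholder for a chat-completion call to an open- or closed-source model."""
    raise NotImplementedError("wire this up to a real API client")  # assumption: generic client

def looks_refused(response: str) -> bool:
    """Crude stand-in for a judge model that decides whether the target refused."""
    return any(p in response.lower() for p in ("i can't", "i cannot", "i'm sorry"))

def automated_jailbreak(goal: str, target: str, attacker: str, max_queries: int = 30):
    """Let an attacker LLM propose and refine jailbreak prompts until the target
    complies or the small, stealth-minded query budget runs out."""
    candidate = goal
    for _ in range(max_queries):
        response = query_model(target, candidate)
        if not looks_refused(response):
            return candidate  # a working jailbreak prompt was found
        # Feed the failed attempt back to the attacker LLM and ask for a revision.
        candidate = query_model(
            attacker,
            "The following prompt was refused by another model. "
            f"Rewrite it so that model complies with the goal: {goal}\n\n{candidate}",
        )
    return None  # budget exhausted without finding a jailbreak

A single refinement chain is used here for brevity; the point is only that the attacker, target and judge roles can all be played by LLMs, which is what makes the process automated and query-efficient.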
As tech giants continue to vie for leadership in the AI market by releasing new specialized large language models seemingly every few months, researchers - both independent and those working for those same companies - have been probing them for security weaknesses.
Google has set up an AI-specific Red Team and expanded its bug bounty program to cover AI-related threats.
Microsoft has also invited bug hunters to probe its various Copilot LLM integrations. Earlier this year, the AI Village at the DEF CON hacker convention hosted red teamers who were tasked with testing LLMs from Anthropic, Google, Hugging Face, NVIDIA, OpenAI, Stability, and Microsoft to uncover vulnerabilities that leave LLMs open to manipulation.


This Cyber News was published on www.helpnetsecurity.com. Publication date: Thu, 07 Dec 2023 11:13:04 +0000


Cyber News related to Researchers automated jailbreaking of LLMs with other LLMs

The age of weaponized LLMs is here - It's exactly what one researcher, Julian Hazell, was able to simulate, adding to a collection of studies that, altogether, signify a seismic shift in cyber threats: the era of weaponized LLMs is here. The research all adds up to one thing: LLMs are ...
1 year ago Venturebeat.com
Exploring the Security Risks of LLM - According to a recent survey, 74% of IT decision-makers have expressed concerns about the cybersecurity risks associated with LLMs, such as the potential for spreading misinformation. Security Concerns of LLMs While the potential applications of ...
11 months ago Feeds.dzone.com
The impact of prompt injection in LLM agents - This risk is particularly alarming when LLMs are turned into agents that interact directly with the external world, utilizing tools to fetch data or execute actions. Malicious actors can leverage prompt injection techniques to generate unintended and ...
1 year ago Helpnetsecurity.com
Researchers Show How to Use One LLM to Jailbreak Another - The exploding use of large language models in industry and across organizations has sparked a flurry of research activity focused on testing the susceptibility of LLMs to generate harmful and biased content when prompted in specific ways. The latest ...
1 year ago Darkreading.com
Criminal Use of AI Growing, But Lags Behind Defenders - In summary, Trend Micro has found only one criminal LLM: WormGPT. Instead, there is a growing incidence, and therefore potential use, of jailbreaking services: EscapeGPT, BlackHatGPT, and LoopGPT. There is also an increasing number of 'services' ...
7 months ago Securityweek.com
How to perform a proof of concept for automated discovery using Amazon Macie | AWS Security Blog - After reviewing the managed data identifiers provided by Macie and creating the custom data identifiers needed for your POC, it’s time to stage data sets that will help demonstrate the capabilities of these identifiers and better understand how ...
2 months ago Aws.amazon.com
How Does Automated API Testing Differ from Manual API Testing: Unveiling the Advantages - Delve into automated versus manual API testing for efficient software delivery. See how automation speeds validation while manual testing provides human insight, ensuring comprehensive coverage for robust development. In the domain of software ...
10 months ago Hackread.com
Why training LLMs with endpoint data will strengthen cybersecurity - Capturing weak signals across endpoints and predicting potential intrusion attempt patterns is a perfect challenge for Large Language Models to take on. The goal is to mine attack data to find new threat patterns and correlations while fine-tuning ...
11 months ago Venturebeat.com
OWASP Top 10 for LLM Applications: A Quick Guide - Even still, the expertise and insights provided, including prevention and mitigation techniques, are highly valuable to anyone building or interfacing with LLM applications. Prompt injections are maliciously crafted inputs that lead to an LLM ...
8 months ago Securityboulevard.com
AI models can be weaponized to hack websites on their own (The Register) - AI models, the subject of ongoing safety concerns about harmful and biased output, pose a risk beyond content emission. When wedded with tools that enable automated interaction with other systems, they can act on their own as malicious agents. ...
10 months ago Go.theregister.com
LLMs Open to Manipulation Using Doctored Images, Audio - Such attacks could become a major issue as LLMs become increasingly multimodal or are capable of responding contextually to inputs that combine text, audio, pictures, and even video. Hiding Instructions in Images and Audio At Black Hat Europe 2023 ...
1 year ago Darkreading.com
Researchers Uncover Simple Technique to Extract ChatGPT Training Data - Can getting ChatGPT to repeat the same word over and over again cause it to regurgitate large amounts of its training data, including personally identifiable information and other data scraped from the Web? The answer is an emphatic yes, according to ...
1 year ago Darkreading.com
Google Researchers' Attack Prompts ChatGPT to Reveal Its Training Data - A team of researchers primarily from Google's DeepMind systematically convinced ChatGPT to reveal snippets of the data it was trained on using a new type of attack prompt which asked a production model of the chatbot to repeat specific words forever. ...
1 year ago 404media.co
Google Pushes Software Security Via Rust, AI-Based Fuzzing - Google is making moves to help developers ensure that their code is secure. The IT giant this week said it is donating $1 million to the Rust Foundation to improve interoperability between the Rust programming language and legacy C++ codebase in ...
10 months ago Securityboulevard.com
Researchers extract RSA keys from SSH server signing errors - A team of academic researchers from universities in California and Massachusetts demonstrated that it's possible under certain conditions for passive network attackers to retrieve secret RSA keys from naturally occurring errors leading to failed SSH ...
1 year ago Bleepingcomputer.com
Cybercriminals Hesitant About Using Generative AI - Cybercriminals are so far reluctant to use generative AI to launch attacks, according to new research by Sophos. Examining four prominent dark-web forums for discussions related to large language models, the firm found that threat actors showed ...
1 year ago Infosecurity-magazine.com
Akto Launches Proactive GenAI Security Testing Solution - With the increasing reliance on GenAI models and large language models like ChatGPT, the need for robust security measures has become paramount. Akto, a leading API Security company, is proud to announce the launch of its revolutionary GenAI ...
10 months ago Darkreading.com
Cybercriminals are Showing Hesitation to Utilize AI Cyber Attacks - Media reports highlight the sale of LLMs like WormGPT and FraudGPT on underground forums. Fears mount over their potential for creating mutating malware, fueling a craze in the cybercriminal underground. Concerns arise over the dual-use nature of ...
1 year ago Cybersecuritynews.com
Meta AI Models Cracked Open With Exposed API Tokens - Researchers recently were able to get full read and write access to Meta's Bloom, Meta-Llama, and Pythia large language model repositories in a troubling demonstration of the supply chain risks to organizations using these repositories to integrate ...
1 year ago Darkreading.com
Exposed Hugging Face API tokens jeopardized GenAI models - Lasso Security researchers discovered 1,681 Hugging Face API tokens exposed in code repositories, which left vendors such as Google, Meta, Microsoft and VMware open to potential supply chain attacks. In a blog post published Monday, Lasso Security ...
1 year ago Techtarget.com
4 key devsecops skills for the generative AI era - Experts believe that generative AI capabilities, copilots, and large language models are ushering in a new era of how developers, data scientists, and engineers will work and innovate. They expect AI to improve productivity, quality, and innovation, ...
11 months ago Infoworld.com
Week in review: Booking.com hotel booking scam, Kali Linux 2023.4 released - Advanced ransomware campaigns expose need for AI-powered cyber defense: In this Help Net Security interview, Carl Froggett, CIO at Deep Instinct, discusses emerging trends in ransomware attacks, emphasizing the need for businesses to use advanced AI ...
1 year ago Helpnetsecurity.com
Novel LLMjacking Attacks Target Cloud-Based AI Models - Enterprise organizations aren't alone in embracing generative AI. Cybercriminals are doing so, too. They're using GenAI to shape their attacks, such as creating more convincing phishing emails, spreading disinformation, model poisoning, and creating ...
7 months ago Securityboulevard.com
DARPA awards $1 million to Trail of Bits for AI Cyber Challenge - We're excited to share that Trail of Bits has been selected as one of the seven exclusive teams to participate in the small business track for DARPA's AI Cyber Challenge. Our team will receive a $1 million award to create a Cyber Reasoning System and ...
9 months ago Securityboulevard.com
