Researchers Jailbroke 17 Popular LLM Models To Leak Sensitive Data

A comprehensive study by Palo Alto Networks’ Unit 42 has revealed that 17 popular generative AI web applications remain vulnerable to various jailbreaking techniques. These vulnerabilities potentially allow malicious actors to bypass AI safety mechanisms to extract sensitive information or generate harmful content. The team evaluated applications from the Andreessen Horowitz (a16z) Top 50 GenAI Web Products list, focusing on those with text generation and chatbot capabilities, and tested both single-turn and multi-turn jailbreaking strategies across multiple attack categories. The results are current as of November 10, 2024.

The Bad Likert Judge technique manipulates LLMs by having them rate the harmfulness of responses on a scale and then generate examples aligned with those ratings. It achieved slightly higher success rates than the Crescendo attack, with an overall attack success rate (ASR) of 45.9% versus 43.2% for AI safety violation goals. For system prompt leakage, a simple instruction override technique proved most effective, with a 9.9% success rate.

One particularly concerning finding revealed that while most tested applications showed strong resilience against attempts to leak training data and personally identifiable information (PII), one application remained vulnerable to these attacks. Researchers demonstrated this with a repeated token attack: after generating the character “A” thousands of times, the model unexpectedly output content from a webpage that had been incorporated into its training data.

“We found that the majority of tested apps have employed LLMs with improved alignment against previously documented jailbreak strategies. However, as LLM alignment can still be bypassed relatively easily, we recommend comprehensive security practices,” the report stated.

To mitigate these vulnerabilities, security experts recommend implementing comprehensive content filtering, using multiple filter types, and applying maximum content filtering settings. Organizations should also monitor when and how employees use LLMs, particularly unauthorized third-party applications.
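As a rough illustration of the layered output filtering recommended above, the sketch below runs a model response through several independent checks before it reaches a user: regex-based PII detectors plus a crude repetition heuristic aimed at the kind of repeated-token divergence described in the article. All patterns, thresholds, and function names here are illustrative assumptions, not part of Unit 42's methodology or any vendor's filtering product.

import re

# Minimal sketch of "multiple filter types" applied to LLM output.
# Patterns and thresholds are illustrative placeholders only.

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_phone": re.compile(r"\b(?:\+1[ .-]?)?\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

# Crude heuristics for repeated-token divergence: one character or word
# repeated an implausible number of times in a single response.
REPEATED_CHAR = re.compile(r"(\S)\1{200,}")
REPEATED_WORD = re.compile(r"\b(\w+)\b(?:\W+\1\b){50,}", re.IGNORECASE)


def filter_llm_output(text: str) -> dict:
    """Run a model response through every filter and report all that fire."""
    findings = []

    for label, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            findings.append(f"pii:{label}")

    if REPEATED_CHAR.search(text) or REPEATED_WORD.search(text):
        findings.append("repetition:possible-divergence")

    return {"blocked": bool(findings), "findings": findings}


if __name__ == "__main__":
    sample = "Contact the admin at jane.doe@example.com or 555-867-5309, SSN 123-45-6789."
    print(filter_llm_output(sample))
    # {'blocked': True, 'findings': ['pii:email', 'pii:us_phone', 'pii:ssn']}

Keeping each filter independent makes it easy to add, tune, or tighten checks without touching the others, which is the practical point of combining multiple filter types alongside a provider's built-in content filtering settings.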

This Cyber News was published on cybersecuritynews.com. Publication date: Mon, 10 Mar 2025 12:20:11 +0000


Cyber News related to Researchers Jailbroke 17 Popular LLM Models To Leak Sensitive Data

How to perform a proof of concept for automated discovery using Amazon Macie | AWS Security Blog - After reviewing the managed data identifiers provided by Macie and creating the custom data identifiers needed for your POC, it’s time to stage data sets that will help demonstrate the capabilities of these identifiers and better understand how ...
5 months ago Aws.amazon.com
OWASP Top 10 for LLM Applications: A Quick Guide - Even still, the expertise and insights provided, including prevention and mitigation techniques, are highly valuable to anyone building or interfacing with LLM applications. Prompt injections are maliciously crafted inputs that lead to an LLM ...
10 months ago Securityboulevard.com
Researchers Uncover Simple Technique to Extract ChatGPT Training Data - Can getting ChatGPT to repeat the same word over and over again cause it to regurgitate large amounts of its training data, including personally identifiable information and other data scraped from the Web? The answer is an emphatic yes, according to ...
1 year ago Darkreading.com
Researchers Show How to Use One LLM to Jailbreak Another - The exploding use of large language models in industry and across organizations has sparked a flurry of research activity focused on testing the susceptibility of LLMs to generate harmful and biased content when prompted in specific ways. The latest ...
1 year ago Darkreading.com
Meta AI Models Cracked Open With Exposed API Tokens - Researchers recently were able to get full read and write access to Meta's Bloom, Meta-Llama, and Pythia large language model repositories in a troubling demonstration of the supply chain risks to organizations using these repositories to integrate ...
1 year ago Darkreading.com
AI models can be weaponized to hack websites on their own The Register - AI models, the subject of ongoing safety concerns about harmful and biased output, pose a risk beyond content emission. When wedded with tools that enable automated interaction with other systems, they can act on their own as malicious agents. ...
1 year ago Go.theregister.com
Hugging Face dodged a cyber-bullet with Lasso Security's help - Further validating how brittle the security of generative AI models and their platforms is, Lasso Security helped Hugging Face dodge a potentially devastating attack by discovering that 1,681 API tokens were at risk of being compromised. The tokens ...
1 year ago Venturebeat.com
Google Researchers' Attack Prompts ChatGPT to Reveal Its Training Data - A team of researchers primarily from Google's DeepMind systematically convinced ChatGPT to reveal snippets of the data it was trained on using a new type of attack prompt which asked a production model of the chatbot to repeat specific words forever. ...
1 year ago 404media.co
The impact of prompt injection in LLM agents - This risk is particularly alarming when LLMs are turned into agents that interact directly with the external world, utilizing tools to fetch data or execute actions. Malicious actors can leverage prompt injection techniques to generate unintended and ...
1 year ago Helpnetsecurity.com
Securing AI: Navigating the Complex Landscape of Models, Fine-Tuning, and RAG - It underscores the urgent need for robust security measures and proper monitoring in developing, fine-tuning, and deploying AI models. The emergence of advanced models, like Generative Pre-trained Transformer 4, marks a new era in the AI landscape. ...
1 year ago Feedpress.me
Researchers automated jailbreaking of LLMs with other LLMs - AI security researchers from Robust Intelligence and Yale University have designed a machine learning technique that can speedily jailbreak large language models in an automated fashion. Their findings suggest that this vulnerability is universal ...
1 year ago Helpnetsecurity.com
Palo Alto Networks Prevents Data Loss at Enterprise Scale with NVIDIA - With NVIDIA accelerated computing and AI software, cybersecurity leaders like Palo Alto Networks can safeguard vast amounts of sensitive information with unprecedented speed and accuracy, ushering in a new era of AI-driven data protection. The ...
5 months ago Paloaltonetworks.com
How machine learning helps us hunt threats | Securelist - In this post, we will share our experience hunting for new threats by processing Kaspersky Security Network (KSN) global threat data with ML tools to identify subtle new Indicators of Compromise (IoCs). The model can process and learn from millions ...
5 months ago Securelist.com
In the rush to build AI apps, don't leave security behind The Register - There are countless models, libraries, algorithms, pre-built tools, and packages to play with, and progress is relentless. You'll typically glue together libraries, packages, training data, models, and custom source code to perform inference tasks. ...
11 months ago Go.theregister.com
Exposed Hugging Face API tokens jeopardized GenAI models - Lasso Security researchers discovered 1,681 Hugging Face API tokens exposed in code repositories, which left vendors such as Google, Meta, Microsoft and VMware open to potential supply chain attacks. In a blog post published Monday, Lasso Security ...
1 year ago Techtarget.com
ChatGPT Spills Secrets in Novel PoC Attack - A team of researchers from Google DeepMind, OpenAI, ETH Zurich, McGill University, and the University of Washington has developed a new attack for extracting key architectural information from proprietary large language models such as ChatGPT and ...
11 months ago Darkreading.com
Top LLM vulnerabilities and how to mitigate the associated risk - As large language models become more prevalent, a comprehensive understanding of the LLM threat landscape remains elusive. While the AI threat landscape changes every day, there are a handful of LLM vulnerabilities that we know pose significant risk ...
1 year ago Helpnetsecurity.com
ChatGPT and Beyond: Generative AI in Security - The impact of generative AI, particularly models like ChatGPT, has captured the imagination of many in the security industry. Generative AIs encompass a variety of techniques such as large language models, generative adversarial networks, diffusion ...
11 months ago Securityboulevard.com
Flawed AI Tools Create Worries for Private LLMs, Chatbots - Companies that use private instances of large language models to make their business data searchable through a conversational interface face risks of data poisoning and potential data leakage if they do not properly implement security controls to ...
9 months ago Darkreading.com
Addressing Deceptive AI: OpenAI Rival Anthropic Uncovers Difficulties in Correction - There is a possibility that artificial intelligence models can be trained to deceive. According to new research led by Google-backed AI startup Anthropic, if a model exhibits deceptive behaviour, standard techniques cannot remove the deception and ...
1 year ago Cysecurity.news
ML Model Repositories: The Next Big Supply Chain Attack Target - The techniques are similar to ones that attackers have successfully used for years to upload malware to open source code repositories, and highlight the need for organizations to implement controls for thoroughly inspecting ML models before use. ...
11 months ago Darkreading.com
Akto Launches Proactive GenAI Security Testing Solution - With the increasing reliance on GenAI models and large language models (LLMs) like ChatGPT, the need for robust security measures has become paramount. Akto, a leading API Security company, is proud to announce the launch of its revolutionary GenAI ...
1 year ago Darkreading.com
Three Tips To Use AI Securely at Work - Simon makes a very good point that AI is becoming similar to open source software in a way. To remain nimble and leverage the work of great minds from around the world, companies will need to adopt it or spend a lot of time and money trying to ...
1 year ago Securityboulevard.com
Google Researchers Find ChatGPT Queries Collect Personal Data - The LLMs are evolving rapidly with continuous advancements in their research and applications. Recently, cybersecurity researchers at Google discovered how threat actors can exploit ChatGPT queries to collect personal data. ...
1 year ago Cybersecuritynews.com
