A comprehensive study by Palo Alto Networks' Unit 42 has revealed that 17 popular generative AI web applications remain vulnerable to various jailbreaking techniques. The research, current as of November 10, 2024, tested both single-turn and multi-turn jailbreaking strategies across multiple attack categories. These vulnerabilities could allow malicious actors to bypass AI safety mechanisms to extract sensitive information or generate harmful content.

The team evaluated applications from the Andreessen Horowitz (a16z) Top 50 GenAI Web Products list, focusing on those with text generation and chatbot capabilities. "We found that the majority of tested apps have employed LLMs with improved alignment against previously documented jailbreak strategies. However, as LLM alignment can still be bypassed relatively easily, we recommend comprehensive security practices," the report stated.

Among the multi-turn strategies, the Bad Likert Judge technique manipulates LLMs by having them rate the harmfulness of responses on a scale and then generate examples aligned with those ratings. It achieved slightly higher success rates than the Crescendo attack, with an overall attack success rate (ASR) of 45.9% versus 43.2% for AI safety violation goals. For system prompt leakage, a simple instruction override technique proved most effective, with a 9.9% success rate.

One particularly concerning finding: while most tested applications showed strong resilience against training data and personally identifiable information (PII) leakage attempts, one application remained vulnerable to these attacks. Researchers demonstrated this with a repeated token attack, in which, after generating the character "A" thousands of times, the model unexpectedly output content from a webpage that had been incorporated into its training data.

To mitigate these vulnerabilities, security experts recommend implementing comprehensive content filtering, using multiple filter types, and applying the strictest available content filtering settings. Organizations should also implement security measures to monitor when and how employees are using LLMs, particularly unauthorized third-party applications.
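The sketch below illustrates the kind of layered output filtering the report recommends. It is a minimal, hypothetical example, not Unit 42's implementation: the PII patterns, the repetition threshold, and the function names are assumptions chosen for illustration, and a production deployment would combine platform-provided content filters with custom rules like these.

```python
import re

# Hypothetical output-filter rules; thresholds and patterns are illustrative only.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN-style numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email addresses
]

def looks_like_repetition_attack(text: str, max_run: int = 200) -> bool:
    """Flag outputs dominated by one repeated character, a symptom of the
    repeated-token extraction behaviour described in the report."""
    return any(len(m.group(0)) > max_run for m in re.finditer(r"(.)\1+", text))

def filter_output(model_response: str) -> str:
    """Apply layered checks to a model response before returning it to the user."""
    if looks_like_repetition_attack(model_response):
        return "[response withheld: anomalous repetition detected]"
    for pattern in PII_PATTERNS:
        if pattern.search(model_response):
            return "[response withheld: possible PII detected]"
    return model_response

if __name__ == "__main__":
    print(filter_output("A" * 5000 + " ...unexpected memorized text..."))
    print(filter_output("You can reach the author at alice@example.com"))
    print(filter_output("Here is a summary of your document."))
```

The design point is simply that filtering runs on the model's output as well as the user's input, so even a successful jailbreak has a second chance of being caught before the response leaves the application.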
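For the monitoring recommendation, the following sketch shows one way an organization might audit outbound proxy logs for unsanctioned GenAI traffic. The CSV log format, the column names, and the domain watchlist are assumptions for illustration; a real deployment would draw both lists from the organization's own vendor inventory and logging pipeline.

```python
import csv
from collections import Counter

# Illustrative watchlist of GenAI endpoints (assumed, not exhaustive).
GENAI_DOMAINS = {"chat.openai.com", "gemini.google.com", "claude.ai", "perplexity.ai"}
# Example policy: only one sanctioned service.
APPROVED_DOMAINS = {"chat.openai.com"}

def audit_proxy_log(path: str) -> Counter:
    """Count requests per (user, domain) for GenAI traffic that is not approved,
    given a CSV proxy log with 'user' and 'domain' columns (assumed format)."""
    hits = Counter()
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            domain = row["domain"].lower()
            if domain in GENAI_DOMAINS and domain not in APPROVED_DOMAINS:
                hits[(row["user"], domain)] += 1
    return hits

if __name__ == "__main__":
    # "proxy_log.csv" is a placeholder path for a hypothetical export.
    for (user, domain), count in audit_proxy_log("proxy_log.csv").most_common():
        print(f"{user} -> {domain}: {count} request(s)")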