A New Trick Uses AI to Jailbreak AI Models-Including GPT-4

Large language models recently emerged as a powerful and transformative new kind of technology.
Their potential became headline news as ordinary people were dazzled by the capabilities of OpenAI's ChatGPT, released just a year ago.
In the months that followed the release of ChatGPT, discovering new jailbreaking methods became a popular pastime for mischievous users, as well as those interested in the security and reliability of AI systems.
Scores of startups are now building prototypes and fully fledged products on top of large language model APIs.
OpenAI said at its first-ever developer conference in November that over 2 million developers are now using its APIs.
These models simply predict the text that should follow a given input, but they are trained on vast quantities of text, from the web and other digital sources, using huge numbers of computer chips, over a period of many weeks or even months.
With enough data and training, language models exhibit savant-like prediction skills, responding to an extraordinary range of input with coherent and pertinent-seeming information.
The models also exhibit biases learned from their training data and tend to fabricate information when the answer to a prompt is less straightforward.
Without safeguards, they can offer advice to people on how to do things like obtain drugs or make bombs.
To keep the models in check, the companies behind them use the same method employed to make their responses more coherent and accurate-looking.
This involves having humans grade the model's answers and using that feedback to fine-tune the model so that it is less likely to misbehave.
Robust Intelligence provided WIRED with several example jailbreaks that sidestep such safeguards.
Not all of them worked on ChatGPT, the chatbot built on top of GPT-4, but several did, including one for generating phishing messages, and another for producing ideas to help a malicious actor remain hidden on a government computer network.
A similar method was developed by a research group led by Eric Wong, an assistant professor at the University of Pennsylvania.
The one from Robust Intelligence and his team involves additional refinements that let the system generate jailbreaks with half as many tries.
Brendan Dolan-Gavitt, an associate professor at New York University who studies computer security and machine learning, says the new technique revealed by Robust Intelligence shows that human fine-tuning is not a watertight way to secure models against attack.
Dolan-Gavitt says companies that are building systems on top of large language models like GPT-4 should employ additional safeguards.


This Cyber News was published on www.wired.com. Publication date: Tue, 05 Dec 2023 11:13:05 +0000


Cyber News related to A New Trick Uses AI to Jailbreak AI Models-Including GPT-4

GPT in Slack With React Integration - Understanding GPT. Before delving into the intricacies of GPT Slack React integration, let's grasp the fundamentals of GPT. Developed by OpenAI, GPT is a state-of-the-art language model that utilizes deep learning to generate human-like text based on ...
6 months ago Feeds.dzone.com
A New Trick Uses AI to Jailbreak AI Models-Including GPT-4 - Large language models recently emerged as a powerful and transformative new kind of technology. Their potential became headline news as ordinary people were dazzled by the capabilities of OpenAI's ChatGPT, released just a year ago. In the months that ...
7 months ago Wired.com
Malicious ChatGPT Agents May Steal Chat Messages and Data - In November 2023, OpenAI released GPTs publicly for everyone to create their customized version of GPT models. Several new customized GPTs were created for different purposes. On the other hand, threat actors can also utilize this public GPT model to ...
6 months ago Cybersecuritynews.com
AI models can be weaponized to hack websites on their own The Register - AI models, the subject of ongoing safety concerns about harmful and biased output, pose a risk beyond content emission. When wedded with tools that enable automated interaction with other systems, they can act on their own as malicious agents. ...
4 months ago Go.theregister.com
Securing AI: Navigating the Complex Landscape of Models, Fine-Tuning, and RAG - It underscores the urgent need for robust security measures and proper monitoring in developing, fine-tuning, and deploying AI models. The emergence of advanced models, like Generative Pre-trained Transformer 4, marks a new era in the AI landscape. ...
6 months ago Feedpress.me
ChatGPT Spills Secrets in Novel PoC Attack - A team of researchers from Google DeepMind, Open AI, ETH Zurich, McGill University, and the University of Washington have developed a new attack for extracting key architectural information from proprietary large language models such as ChatGPT and ...
3 months ago Darkreading.com
In the rush to build AI apps, don't leave security behind The Register - There are countless models, libraries, algorithms, pre-built tools, and packages to play with, and progress is relentless. You'll typically glue together libraries, packages, training data, models, and custom source code to perform inference tasks. ...
3 months ago Go.theregister.com
Addressing Deceptive AI: OpenAI Rival Anthropic Uncovers Difficulties in Correction - There is a possibility that artificial intelligence models can be trained to deceive. According to a new research led by Google-backed AI startup Anthropic, if a model exhibits deceptive behaviour, standard techniques cannot remove the deception and ...
5 months ago Cysecurity.news
In Other News: Fake Lockdown Mode, New Linux RAT, AI Jailbreak, Country's DNS Hijacked - Each week, we will curate and present a collection of noteworthy developments, ranging from the latest vulnerability discoveries and emerging attack techniques to significant policy changes and industry reports. Guilty pleas and convictions of ...
6 months ago Securityweek.com
Latest Information Security and Hacking Incidents - Recently, OpenAI and WHOOP collaborated to launch a GPT-4-powered, individualized health and fitness coach. A multitude of questions about health and fitness can be answered by WHOOP Coach. In addition to WHOOP, Summer Health, a text-based pediatric ...
5 months ago Cysecurity.news
CVE-2023-37274 - Auto-GPT is an experimental open-source application showcasing the capabilities of the GPT-4 language model. When Auto-GPT is executed directly on the host system via the provided run.sh or run.bat files, custom Python code execution is sandboxed ...
11 months ago
ML Model Repositories: The Next Big Supply Chain Attack Target - The techniques are similar to ones that attackers have successfully used for years to upload malware to open source code repositories, and highlight the need for organizations to implement controls for thoroughly inspecting ML models before use. ...
3 months ago Darkreading.com
OpenAI's New GPT Store May Carry Data Security Risks - A new kind of app store for ChatGPT may expose users to malicious bots, and legitimate ones that siphon their data to insecure, external locales. ChatGPT's fast rise in popularity, combined with the open source accessibility of the early GPT models, ...
5 months ago Darkreading.com
5 Unique Challenges for AI in Cybersecurity - Applied AI in cybersecurity has many unique challenges, and we will take a look into a few of them that we are considering the most important. On the other hand, supervised learning systems can remediate this issue and filter out anomalous by design ...
3 months ago Paloaltonetworks.com
ChatGPT 4 can exploit 87% of one-day vulnerabilities - Since the widespread and growing use of ChatGPT and other large language models in recent years, cybersecurity has been a top concern. ChatGPT 4 quickly exploited one-day vulnerabilities. During the study, the team used 15 one-day vulnerabilities ...
6 days ago Securityintelligence.com
Attackers Can Gain Control of Users' Queries and LLM Data Output - Gemini is Google's newest family of Large Language Models. The Gemini suite currently houses 3 different model sizes: Nano, Pro, and Ultra. Although Gemini has been removed from service due to politically biased content, findings from HiddenLayer ...
3 months ago Packetstormsecurity.com
Startups Scramble to Build Immediate AI Security - It also elevated startups working on machine learning security operations, AppSec remediation, and adding privacy to AI with fully homomorphic encryption. AI's largest attack surface involves its foundational models, such as Meta's Llama, or those ...
6 months ago Darkreading.com
Google Researchers' Attack Prompts ChatGPT to Reveal Its Training Data - A team of researchers primarily from Google's DeepMind systematically convinced ChatGPT to reveal snippets of the data it was trained on using a new type of attack prompt which asked a production model of the chatbot to repeat specific words forever. ...
7 months ago 404media.co
EU Reaches Agreement on AI Act Amid Three-Day Negotiations - The EU reached a provisional deal on the AI Act on December 8, 2023, following record-breaking 36-hour-long 'trilogue' negotiations between the EU Council, the EU Commission and the European Parliament. The landmark bill will regulate the use of AI ...
6 months ago Infosecurity-magazine.com
Researchers Show How to Use One LLM to Jailbreak Another - The exploding use of large language models in industry and across organizations has sparked a flurry of research activity focused on testing the susceptibility of LLMs to generate harmful and biased content when prompted in specific ways. The latest ...
7 months ago Darkreading.com
Zoom says AI features should come at no additional cost. Here's why - Zoom is pledging to provide artificial intelligence features at no additional cost to paid customers on its video-conferencing platform. Zoom also advocates the merits of a federated multi-model architecture, which it says will allow for better ...
6 months ago Zdnet.com
Enterprises will need AI governance as large language models grow in number - With the number of large language models in the market expected to grow and branch out, businesses will need a governance framework to manage their generative artificial intelligence applications. This approach will encompass the use of paid and ...
6 months ago Zdnet.com
Researchers automated jailbreaking of LLMs with other LLMs - AI security researchers from Robust Intelligence and Yale University have designed a machine learning technique that can speedily jailbreak large language models in an automated fashion. Their findings suggest that this vulnerability is universal ...
7 months ago Helpnetsecurity.com
The age of weaponized LLMs is here - It's exactly what one researcher, Julian Hazell, was able to simulate, adding to a collection of studies that, altogether, signify a seismic shift in cyber threats: the era of weaponized LLMs is here. The research all adds up to one thing: LLMs are ...
6 months ago Venturebeat.com
OpenAI study reveals surprising role of AI in future biological threat creation - OpenAI, the research organization behind the powerful language model GPT-4, has released a new study that examines the possibility of using AI to assist in creating biological threats. One such frontier risk is the ability for AI systems, such as ...
5 months ago Venturebeat.com

Latest Cyber News


Cyber Trends (last 7 days)


Trending Cyber News (last 7 days)