A New Trick Uses AI to Jailbreak AI Models-Including GPT-4

Large language models recently emerged as a powerful and transformative new kind of technology.
Their potential became headline news as ordinary people were dazzled by the capabilities of OpenAI's ChatGPT, released just a year ago.
In the months that followed the release of ChatGPT, discovering new jailbreaking methods became a popular pastime for mischievous users, as well as those interested in the security and reliability of AI systems.
Scores of startups are now building prototypes and fully fledged products on top of large language model APIs.
OpenAI said at its first-ever developer conference in November that over 2 million developers are now using its APIs.
These models simply predict the text that should follow a given input, but they are trained on vast quantities of text, from the web and other digital sources, using huge numbers of computer chips, over a period of many weeks or even months.
With enough data and training, language models exhibit savant-like prediction skills, responding to an extraordinary range of input with coherent and pertinent-seeming information.
The models also exhibit biases learned from their training data and tend to fabricate information when the answer to a prompt is less straightforward.
Without safeguards, they can offer advice to people on how to do things like obtain drugs or make bombs.
To keep the models in check, the companies behind them use the same method employed to make their responses more coherent and accurate-looking.
This involves having humans grade the model's answers and using that feedback to fine-tune the model so that it is less likely to misbehave.
Robust Intelligence provided WIRED with several example jailbreaks that sidestep such safeguards.
Not all of them worked on ChatGPT, the chatbot built on top of GPT-4, but several did, including one for generating phishing messages, and another for producing ideas to help a malicious actor remain hidden on a government computer network.
A similar method was developed by a research group led by Eric Wong, an assistant professor at the University of Pennsylvania.
The one from Robust Intelligence and his team involves additional refinements that let the system generate jailbreaks with half as many tries.
Brendan Dolan-Gavitt, an associate professor at New York University who studies computer security and machine learning, says the new technique revealed by Robust Intelligence shows that human fine-tuning is not a watertight way to secure models against attack.
Dolan-Gavitt says companies that are building systems on top of large language models like GPT-4 should employ additional safeguards.

This Cyber News was published on www.wired.com. Publication date: Tue, 05 Dec 2023 11:13:05 +0000

Cyber News related to A New Trick Uses AI to Jailbreak AI Models-Including GPT-4

GPT in Slack With React Integration - Understanding GPT. Before delving into the intricacies of GPT Slack React integration, let's grasp the fundamentals of GPT. Developed by OpenAI, GPT is a state-of-the-art language model that utilizes deep learning to generate human-like text based on ...
1 year ago Feeds.dzone.com

STRIDE GPT - AI-powered Tool LLMs To Generate Threat Models - STRIDE GPT, an AI-powered threat modeling tool, leverages the capabilities of large language models (LLMs) to generate comprehensive threat models and attack trees for applications, ensuring a proactive approach to security. In conclusion, STRIDE GPT ...
5 months ago Cybersecuritynews.com Inception

OpenAI prepares GPT-5 for roll out - "GPT-5 is our next foundational model that is meant to just make everything our models can currently do better and with less model switching," Jerry Tworek, who is a VP at OpenAI, wrote in a Reddit post. My sources tell me that GPT-5 could ...
2 months ago Bleepingcomputer.com

Leak confirms OpenAI's GPT 4.1 is coming before GPT 5.0 - As spotted by AI researcher Tibor Blaho, OpenAI is already testing model art for o3, o4-mini, and GPT-4.1 (including nano and mini variants) on the OpenAI API platform. Also, GPT-5 isn't happening anytime soon, as OpenAI plans to focus on o3, ...
5 months ago Bleepingcomputer.com

A New Trick Uses AI to Jailbreak AI Models-Including GPT-4 - Large language models recently emerged as a powerful and transformative new kind of technology. Their potential became headline news as ordinary people were dazzled by the capabilities of OpenAI's ChatGPT, released just a year ago. In the months that ...
1 year ago Wired.com

Malicious ChatGPT Agents May Steal Chat Messages and Data - In November 2023, OpenAI released GPTs publicly for everyone to create their customized version of GPT models. Several new customized GPTs were created for different purposes. On the other hand, threat actors can also utilize this public GPT model to ...
1 year ago Cybersecuritynews.com

ChatGPT 4.1 fails to beat Google Gemini 2.5 in early benchmarks - According to benchmarks shared by Stagehand, which is a production-ready browser automation framework, Gemini 2.0 Flash has the lowest error rate (6.67%) along with the highest exact‑match score (90%), and it’s also cheap and fast. ...
5 months ago Bleepingcomputer.com

ChatGPT 4.1 early benchmarks compared against Google Gemini - For example, GPT‑4.1 scores 54.6% on SWE-bench Verified, which is better than GPT-4o by 21.4% and 26.6% over GPT‑4.5. We have similar results on other benchmarking tools shared by OpenAI, but how does it compete against Gemini ...
5 months ago Bleepingcomputer.com

Securing AI: Navigating the Complex Landscape of Models, Fine-Tuning, and RAG - It underscores the urgent need for robust security measures and proper monitoring in developing, fine-tuning, and deploying AI models. The emergence of advanced models, like Generative Pre-trained Transformer 4, marks a new era in the AI landscape. ...
1 year ago Feedpress.me

AI models can be weaponized to hack websites on their own The Register - AI models, the subject of ongoing safety concerns about harmful and biased output, pose a risk beyond content emission. When wedded with tools that enable automated interaction with other systems, they can act on their own as malicious agents. ...
1 year ago Go.theregister.com

OpenAI says GPT-5 will unify breakthroughs from different models - OpenAI has again confirmed that it will unify multiple models into one and create GPT-5, which is expected to ship sometime in the summer. "GPT-5 is our next foundational model that is meant to just make everything our models can currently do better ...
3 months ago Bleepingcomputer.com

OpenAI's GPT 4.5 spotted in Android beta, launch imminent - As a result, OpenAI CEO Sam Altman recently announced that ChatGPT will simplify its model names and release versions like GPT-4.5, GPT-5, and so on. Beyond the references to GPT-4.5, AI researcher Tibor Blaho has spotted a few additional experiments ...
7 months ago Bleepingcomputer.com

In the rush to build AI apps, don't leave security behind The Register - There are countless models, libraries, algorithms, pre-built tools, and packages to play with, and progress is relentless. You'll typically glue together libraries, packages, training data, models, and custom source code to perform inference tasks. ...
1 year ago Go.theregister.com Hunters

ChatGPT Spills Secrets in Novel PoC Attack - A team of researchers from Google DeepMind, Open AI, ETH Zurich, McGill University, and the University of Washington have developed a new attack for extracting key architectural information from proprietary large language models such as ChatGPT and ...
1 year ago Darkreading.com

ChatGPT"s GPT-5-reasoning-alpha model spotted ahead of launch - GPT-5 might be just a few days or weeks away, as we've spotted references to a new model called gpt-5-reasoning-alpha-2025-07-13. Other researchers have also dropped hints that GPT-5 will combine breakthroughs from all models to create a unified ...
2 months ago Bleepingcomputer.com

Addressing Deceptive AI: OpenAI Rival Anthropic Uncovers Difficulties in Correction - There is a possibility that artificial intelligence models can be trained to deceive. According to a new research led by Google-backed AI startup Anthropic, if a model exhibits deceptive behaviour, standard techniques cannot remove the deception and ...
1 year ago Cysecurity.news

How machine learning helps us hunt threats | Securelist - In this post, we will share our experience hunting for new threats by processing Kaspersky Security Network (KSN) global threat data with ML tools to identify subtle new Indicators of Compromise (IoCs). The model can process and learn from millions ...
1 year ago Securelist.com

OpenAI is routing GPT-4o to safety models when it detects harmful activities - OpenAI has implemented a new safety mechanism for its GPT-4o model to enhance user protection against harmful activities. When GPT-4o detects potentially dangerous or malicious content, it routes the interaction to specialized safety models designed ...
1 week ago Bleepingcomputer.com

OpenAI is testing a new GPT-5-based AI agent, GPT-Alpha - OpenAI is currently testing an advanced AI agent named GPT-Alpha, which is based on the upcoming GPT-5 architecture. This new AI agent represents a significant leap in artificial intelligence capabilities, promising enhanced performance and more ...
2 weeks ago Bleepingcomputer.com

OpenAI rolls out GPT-Codex Alpha with early access to new models - OpenAI has launched GPT-Codex Alpha, providing early access to its latest AI models designed to enhance coding and software development. This new release aims to empower developers by offering advanced AI capabilities that can understand and generate ...
4 days ago Bleepingcomputer.com

In Other News: Fake Lockdown Mode, New Linux RAT, AI Jailbreak, Country's DNS Hijacked - Each week, we will curate and present a collection of noteworthy developments, ranging from the latest vulnerability discoveries and emerging attack techniques to significant policy changes and industry reports. Guilty pleas and convictions of ...
1 year ago Securityweek.com CVE-2022-28958

OpenAI says GPT-6 is coming, and it'll be better than GPT-5, obviously - OpenAI has announced the upcoming release of GPT-6, promising significant improvements over its predecessor, GPT-5. This new iteration of the AI language model is expected to enhance natural language understanding, generation capabilities, and ...
1 month ago Bleepingcomputer.com

ML Model Repositories: The Next Big Supply Chain Attack Target - The techniques are similar to ones that attackers have successfully used for years to upload malware to open source code repositories, and highlight the need for organizations to implement controls for thoroughly inspecting ML models before use. ...
1 year ago Darkreading.com

Latest Information Security and Hacking Incidents - Recently, OpenAI and WHOOP collaborated to launch a GPT-4-powered, individualized health and fitness coach. A multitude of questions about health and fitness can be answered by WHOOP Coach. In addition to WHOOP, Summer Health, a text-based pediatric ...
1 year ago Cysecurity.news

A New Trick Uses AI to Jailbreak AI Models-Including GPT-4

Cyber News related to A New Trick Uses AI to Jailbreak AI Models-Including GPT-4

Latest Cyber News

Cyber Trends (last 7 days)

Trending Cyber News (last 7 days)