Security researchers at Unit 42 have successfully prompted DeepSeek, a relatively new large language model (LLM), to generate detailed instructions for creating keyloggers, data exfiltration tools, and other harmful content. The findings highlight a significant security concern: although information on building malicious tools is already available online, LLMs with insufficient safety restrictions dramatically lower the barrier to entry for would-be attackers by packaging that information into readily usable, actionable guidance.

To test DeepSeek’s vulnerability to manipulation, the researchers employed three sophisticated jailbreaking techniques, Bad Likert Judge, Crescendo, and Deceptive Delight, to bypass the model’s safety guardrails, raising significant concerns about the potential misuse of emerging AI technologies. Using the Bad Likert Judge technique, they prompted DeepSeek to produce keylogger code, detailed phishing email templates, and sophisticated social engineering strategies. With careful manipulation, they also extracted detailed code for data exfiltration tools, including functional keylogger scripts written in Python. Starting with seemingly innocuous historical questions about topics such as Molotov cocktails, the researchers obtained comprehensive step-by-step instructions for creating dangerous devices within just a few interactions.

The researchers note that while complete protection against all jailbreaking techniques remains challenging, proper security protocols can significantly mitigate the risks. “While DeepSeek’s initial responses to our prompts were not overtly malicious, they hinted at a potential for additional output,” the researchers noted in their findings.