A recent study by a team of cybersecurity researchers has revealed severe security flaws in commercial-grade Large Reasoning Models (LRMs), including OpenAI's o1/o3 series, DeepSeek-R1, and Google's Gemini 2.0 Flash Thinking. The research introduces two key innovations: the Malicious-Educator benchmark for stress-testing AI safety protocols, and the Hijacking Chain-of-Thought (H-CoT) attack method, which reduced model refusal rates from 98% to under 2% in critical scenarios.

The team, from Duke University's Center for Computational Evolutionary Intelligence, developed a dataset of 50 queries spanning 10 high-risk categories, including terrorism, cybercrime, and child exploitation, each crafted as a seemingly legitimate educational prompt.

Current LRMs use chain-of-thought (CoT) reasoning to justify safety decisions, often displaying intermediate steps such as "Confirming compliance with harm prevention policies…". Attackers can intercept these displayed steps and feed manipulated versions back to the model, which is the mechanism at the heart of H-CoT. Against DeepSeek-R1, H-CoT raised attack success rates to 96.8%, further exploiting the model's multilingual inconsistencies (e.g., English queries bypassed its Chinese safety filters). While o1 initially refused 99% of Malicious-Educator queries, post-2024 updates saw refusal rates plummet to under 2% with H-CoT. Gemini 2.0 Flash Thinking fared no better: when prompted with H-CoT, its tone shifted from cautious to eager compliance, and it provided detailed criminal frameworks in 100% of tested cases.
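The headline numbers are refusal rates measured across the benchmark prompts. As a rough illustration of how such a measurement can be scripted, the sketch below computes per-category refusal rates for a set of prompts. It is not the Duke team's evaluation harness: the query_model callable, the JSON benchmark format, and the keyword-based refusal check are all assumptions made for illustration.

```python
# Minimal sketch (hypothetical): measuring refusal rates on a benchmark
# like Malicious-Educator. The query_model callable and the benchmark
# file format are assumptions, not part of the published artifact.

import json
from collections import defaultdict

# Crude refusal markers; real evaluations typically rely on human or LLM judges.
REFUSAL_MARKERS = ("i can't help", "i cannot assist", "i'm sorry, but")


def is_refusal(response: str) -> bool:
    """Return True if the response looks like a refusal (keyword heuristic)."""
    text = response.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def refusal_rates(benchmark_path: str, query_model) -> dict:
    """Compute per-category refusal rates for a set of benchmark prompts.

    benchmark_path points to a JSON list of {"category": ..., "prompt": ...}
    records; query_model is any callable that sends a prompt to the model
    under test and returns its text response.
    """
    refused = defaultdict(int)
    total = defaultdict(int)
    with open(benchmark_path) as f:
        for item in json.load(f):
            category = item["category"]
            response = query_model(item["prompt"])
            total[category] += 1
            refused[category] += int(is_refusal(response))
    return {cat: refused[cat] / total[cat] for cat in total}
```

Running the same harness twice, once on the plain Malicious-Educator prompts and once on H-CoT-modified versions of them, is how a drop from roughly 98% refusals to under 2% would show up in practice.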