Attackers are finding more and more ways to post malicious projects to Hugging Face and other repositories for open source artificial intelligence (AI) models, while dodging the sites' security checks.

Hugging Face's automated checks, for example, recently failed to detect malicious code in two AI models hosted on the repository, according to a February analysis by ReversingLabs. The threat actor used a common vector — data files using the Pickle format — with a new technique, dubbed "NullifAI," to evade detection. Despite having malicious features, both models passed the platform's security checks. While the attacks appeared to be proofs-of-concept, their success in being hosted with a "No issue" tag shows that companies should not rely on Hugging Face's and other repositories' safety checks for their own security, says Tomislav Pericin, chief software architect at ReversingLabs.

The escalating problem underscores the need for companies pursuing internal AI projects to have robust mechanisms to detect security flaws and malicious code within their supply chains.

Companies are quickly adopting AI, and the majority are also establishing internal projects using open source AI models from repositories such as Hugging Face, TensorFlow Hub, and PyTorch Hub. Overall, 61% of companies are using models from the open source ecosystem to create their own AI tools, according to a Morning Consult survey of 2,400 IT decision-makers sponsored by IBM.
Yet many of the components can contain executable code, leading to a variety of security risks, such as code execution, backdoors, prompt injections, and alignment issues — the latter being how well an AI model matches the intent of its developers and users. "There is already research about what kind of prompts would trigger the model to behave in an unpredictable way, divulge confidential information, or teach things that could be harmful," he says.
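The code-execution risk is easiest to see with the Pickle format the ReversingLabs researchers flagged: deserializing a pickle can invoke arbitrary Python callables. The following is a minimal, deliberately benign sketch of that behavior (it is not the NullifAI payload, and the echoed message is purely illustrative):

```python
# Why untrusted Pickle files are dangerous: unpickling runs code.
# Benign demonstration only; a real attacker could return any callable.
import os
import pickle

class Payload:
    def __reduce__(self):
        # Called during unpickling; whatever it returns gets executed.
        return (os.system, ("echo this ran the moment the file was loaded",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # loading the data executes the command above
```

The same property applies to common PyTorch checkpoint files, which are pickles wrapped in an archive, which is why Hugging Face runs Pickle-focused scanners in the first place.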
While Hugging Face has explicit checks for Pickle files, the malicious code discovered by ReversingLabs sidestepped those checks by using a different file compression for the data. Despite vocal warnings from security researchers, the Pickle format continues to be used by many data scientists, says Tom Bonner, vice president of research at HiddenLayer, an AI-focused detection and response firm. Rather than Pickle files, data science and AI teams should move to Safetensors — a library for a new data format managed by Hugging Face, EleutherAI, and Stability AI — which has been audited for security.

Other research by application security firm Checkmarx found multiple ways to bypass the scanners used to detect dangerous Pickle files, such as PickleScan, which Hugging Face relies on. "PickleScan uses a blocklist which was successfully bypassed using both built-in Python dependencies," Dor Tumarkin, director of application security research at Checkmarx, stated in the analysis.

Licensing is another issue: While pretrained AI models are frequently called "open source AI," they generally do not provide all the information needed to reproduce the AI model, such as code and training data.

Companies pursuing internal AI development using models from Hugging Face and other open source repositories therefore need to focus on supply chain security and checking for vulnerabilities. "You kind of need to manage AI models like you would any other open source dependencies," Stiefel says. In addition, they should pay attention to common signals of software safety, including the source of the model, development activity around the model, its popularity, and the operational and security risks, Endor's Stiefel says.
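As a rough illustration of checking those signals, the sketch below pulls a repository's author, download count, likes, last-modified date, and file listing through the public huggingface_hub client, and flags pickle-based weight files in favor of Safetensors. The repo ID, file-suffix list, and printed fields are illustrative choices rather than anything prescribed in the reporting above, and attribute names can vary between huggingface_hub versions.

```python
# Rough model-vetting sketch (pip install huggingface_hub).
# Illustrative only: the suffix list and example repo are assumptions.
from huggingface_hub import HfApi

# Common pickle-based weight formats; .safetensors is the safer alternative
PICKLE_SUFFIXES = (".bin", ".pkl", ".pickle", ".pt", ".pth", ".ckpt")

def vet_model(repo_id: str) -> None:
    info = HfApi().model_info(repo_id)

    # Provenance, activity, and popularity signals
    print(f"author:        {info.author}")
    print(f"downloads:     {info.downloads}")
    print(f"likes:         {info.likes}")
    print(f"last modified: {info.last_modified}")

    # File formats: prefer repos that ship Safetensors weights
    files = [s.rfilename for s in (info.siblings or [])]
    has_safetensors = any(name.endswith(".safetensors") for name in files)
    print(f"safetensors available: {has_safetensors}")
    pickle_files = [name for name in files if name.endswith(PICKLE_SUFFIXES)]
    if pickle_files:
        print(f"pickle-based files (treat as untrusted code): {pickle_files}")

if __name__ == "__main__":
    vet_model("distilbert-base-uncased")  # example repo ID; substitute your own
```

None of this substitutes for scanning the artifacts themselves, but it covers the kind of dependency-style due diligence described above.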