The techniques resemble ones that attackers have used successfully for years to upload malware to open source code repositories, and they highlight the need for organizations to implement controls for thoroughly inspecting ML models before use.
Repositories such as Hugging Face are an attractive target because a malicious ML model can give threat actors access to sensitive information and environments the moment it is loaded.
Such repositories are also relatively new, says Mary Walker, a security engineer at Dropbox and co-author of the Black Hat Asia paper.
Machine Learning Pipelines, An Emerging Target

Hugging Face is a repository for ML tools, data sets, and models that developers can download and integrate into their own projects.
Like many public code repositories, it allows developers to create and upload their own ML models, or look for models that match their requirements.
Hugging Face's security controls include scanning for malware, vulnerabilities, secrets, and sensitive information across the repository.
It also offers a format called Safetensors that allows developers to more securely store and upload large tensors, the core data structures in machine learning models.
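The safety argument for Safetensors is structural: a file in the format is just a length-prefixed JSON header followed by raw tensor bytes, so loading it is pure data parsing and never executes code. Below is a minimal pure-Python sketch of that layout, not the official `safetensors` library; the file name and byte payload are made up for illustration.

```python
import json
import struct

def write_safetensors_like(tensors: dict, path: str) -> None:
    """Write a minimal safetensors-style file: an 8-byte little-endian
    header length, a JSON header describing each tensor, then raw bytes.
    Nothing in the file is executable, which is the format's key property."""
    header, payload, offset = {}, b"", 0
    for name, raw in tensors.items():
        header[name] = {"dtype": "U8", "shape": [len(raw)],
                        "data_offsets": [offset, offset + len(raw)]}
        payload += raw
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))
        f.write(header_bytes)
        f.write(payload)

def read_safetensors_like(path: str) -> dict:
    """Read the file back. Loading is json.loads plus byte slicing,
    so an untrusted file cannot smuggle in attacker code."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
        payload = f.read()
    return {name: payload[meta["data_offsets"][0]:meta["data_offsets"][1]]
            for name, meta in header.items()}

# round-trip a fake weight blob (hypothetical file name and contents)
path = "demo.safetensors"
write_safetensors_like({"weights": b"\x01\x02\x03\x04"}, path)
restored = read_safetensors_like(path)
print(restored)  # {'weights': b'\x01\x02\x03\x04'}
```

Contrast this with pickle-based formats, where deserialization itself can invoke arbitrary callables.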
The repository, like other ML model repositories, gives attackers an opening to upload malicious models in the hope that developers will download and use them in their projects.
Wood, for instance, found it trivial for an attacker to register a namespace on the service that appeared to belong to a brand-name organization.
Little then prevents the attacker from using that namespace to trick real users from that organization into uploading ML models to it, models the attacker could poison at will.
Wood says that when he registered a namespace that appeared to belong to a well-known brand, he did not even have to lure users from the organization into uploading models.
Instead, software engineers and ML engineers from the organization contacted him directly, asking to join the namespace so they could upload ML models to it, models Wood could then have backdoored at will.
Another example is a model confusion attack, in which a threat actor discovers the names of private dependencies within a project and then publishes malicious public dependencies with exactly the same names.
In the past, such confusion attacks on open source repositories such as npm and PyPI have caused internal projects to default to the malicious dependencies that share their names.
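The resolution flaw behind dependency confusion can be sketched as a resolver that merges candidates from every configured index and simply picks the highest version, so a public package reusing an internal name with an inflated version number wins. The package and index names below are hypothetical.

```python
def resolve(package: str, indexes: dict) -> tuple:
    """Return (index_name, version) for the highest version of `package`
    found across all configured indexes -- the naive behavior that
    dependency confusion attacks exploit."""
    candidates = [
        (tuple(int(p) for p in ver.split(".")), index_name, ver)
        for index_name, packages in indexes.items()
        if (ver := packages.get(package)) is not None
    ]
    _, index_name, ver = max(candidates)  # highest version wins, wherever it lives
    return index_name, ver

indexes = {
    "internal": {"acme-ml-utils": "1.4.0"},    # private, legitimate package
    "public":   {"acme-ml-utils": "99.0.0"},   # attacker-registered clone
}
print(resolve("acme-ml-utils", indexes))  # ('public', '99.0.0')
```

Pinning exact versions and restricting builds to a single trusted index are the usual mitigations for this failure mode.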
Malware on ML Repositories

Threat actors have already begun eyeing ML repositories as a potential supply chain attack vector.
Earlier this year, for instance, researchers at JFrog discovered a malicious ML model on Hugging Face that, upon loading, executed code giving attackers full control of the victim's machine.
Wood's demonstration involves injecting malware into models built with the Keras library and TensorFlow as the backend engine.
Wood found that Keras models offer attackers a way to execute arbitrary code in the background while having the model perform in exactly the manner intended.
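One way this works: Keras has historically serialized a Lambda layer's Python function as raw bytecode, which is rebuilt and trusted when the model is loaded. The sketch below mimics that mechanism with the standard library's `marshal` module rather than Keras's actual (de)serialization code, and all names are invented.

```python
import marshal
import types

SIDE_EFFECTS = []  # stand-in for attacker activity (exfiltration, etc.)

def make_lambda_config():
    """Attacker side: serialize a function that behaves as advertised
    but also performs a hidden action every time the 'layer' is called."""
    def innocent_looking_activation(x):
        SIDE_EFFECTS.append("hidden code ran")  # the smuggled payload
        return x * 2                            # the advertised behavior
    return {"function_bytecode": marshal.dumps(innocent_looking_activation.__code__)}

def load_lambda_layer(config):
    """Victim side: deserializing the config rebuilds and trusts the
    bytecode -- there is no practical way to audit it short of disassembly."""
    code = marshal.loads(config["function_bytecode"])
    return types.FunctionType(code, globals())

layer = load_lambda_layer(make_lambda_config())
print(layer(21))     # model output looks correct: 42
print(SIDE_EFFECTS)  # but the hidden payload executed too
```

Because the model's visible behavior is unchanged, accuracy tests and spot checks will not flag the tampering; only inspecting what runs at load and inference time would.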
In 2020, researchers from HiddenLayer used a technique similar to steganography to embed a ransomware executable into a model, which was then loaded using pickle.
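Pickle is the weak link in that chain: the format lets any object name a callable to invoke during deserialization, so merely loading an untrusted pickle-backed model file runs attacker-chosen code. A minimal illustration of the mechanism follows; it is not HiddenLayer's actual proof of concept, and all names are invented.

```python
import pickle

EXECUTED = []  # stand-in for the attacker's payload (e.g. dropping ransomware)

def attacker_payload(note):
    """Runs automatically the moment the pickle is loaded."""
    EXECUTED.append(note)

class MaliciousWeights:
    # pickle calls __reduce__ during serialization; the (callable, args)
    # pair it returns is invoked inside pickle.loads() on the victim side
    def __reduce__(self):
        return (attacker_payload, ("payload ran at load time",))

blob = pickle.dumps(MaliciousWeights())  # the "model file" the attacker uploads
pickle.loads(blob)                       # victim merely loads the model...
print(EXECUTED)                          # ...and the payload has already run
```

This is why scanning tools and formats like Safetensors focus on keeping executable content out of model files entirely, rather than trying to spot malicious payloads after the fact.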
This Cyber News was published on www.darkreading.com. Publication date: Mon, 18 Mar 2024 22:10:13 +0000