Researchers recently were able to get full read and write access to the Meta-Llama, Bloom, and Pythia large language model (LLM) repositories, in a troubling demonstration of the supply chain risks facing organizations that use these repositories to integrate LLM capabilities into their applications and operations.
The access would have allowed an adversary to silently poison training data in these widely used LLMs, steal models and data sets, and potentially execute other malicious activities that would heighten security risks for millions of downstream users.
Exposed Tokens on Hugging Face

That's according to researchers at AI security startup Lasso, who were able to access the model repositories using unsecured API access tokens they discovered on GitHub and on Hugging Face, a platform for LLM developers.
The tokens they discovered for the Meta platforms were among over 1,500 similar tokens they found on Hugging Face and GitHub that provided them with varying degrees of access to repositories belonging to a total of 722 other organizations.
Hugging Face is a platform that many LLM professionals use as a source for tools and other resources for LLM projects.
The company's main offerings include Transformers, an open source library that offers APIs and tools for downloading and tuning pretrained models.
The company hosts, in GitHub-like fashion, more than 500,000 AI models and 250,000 data sets, including those from Meta, Google, Microsoft, and VMware.
It lets users post their own models and data sets to the platform and access those of others for free via the Hugging Face API. The company has raised some $235 million so far from investors including Google and Nvidia.
As part of the exercise, in November 2023 the researchers tried to see whether they could find exposed API tokens that would let them access data sets and models on Hugging Face.
They scanned for exposed API tokens on GitHub and on Hugging Face.
With a small tweak to the scanning process, the researchers found a relatively large number of exposed tokens, says Bar Lanyado, a security researcher at Lasso.
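A scan of this kind can be approximated with a regular-expression pass over repository contents. The sketch below is illustrative only and is not the researchers' actual method: Hugging Face user access tokens carry an `hf_` prefix, but the exact token format is an assumption here.

```python
import re

# Hugging Face user access tokens begin with an "hf_" prefix; the exact
# length and alphabet vary, so this pattern is illustrative only.
HF_TOKEN_RE = re.compile(r"\bhf_[A-Za-z0-9]{30,}\b")

def find_candidate_tokens(text: str) -> list[str]:
    """Return substrings of `text` that look like Hugging Face API tokens."""
    return HF_TOKEN_RE.findall(text)

# Example: scanning a source file's contents for a hard-coded token.
leaky_source = 'api = HfApi(token="hf_' + "x" * 34 + '")'
print(find_candidate_tokens(leaky_source))
```

In practice such a scanner would be run over cloned repositories, commit histories, and CI logs, since secrets often linger in old commits even after being removed from the current revision.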
Lasso researchers were able to access tokens belonging to several top technology companies, including ones with a high level of security, and gain full control over some of those repositories, Lanyado says.
Lasso security researchers found a total of 1,976 tokens across both GitHub and Hugging Face, 1,681 of which turned out to be valid and usable.
As many as 655 of the tokens that Lasso discovered had write permissions on Hugging Face.
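Whether a discovered token is live, and whether it carries read or write permissions, can be checked with a single read-only call to Hugging Face's `whoami-v2` endpoint. The sketch below is a minimal illustration; the `auth.accessToken.role` field used to distinguish read from write tokens is an assumption about the response shape for classic user tokens and may differ for fine-grained ones.

```python
import json
import urllib.error
import urllib.request

WHOAMI_URL = "https://huggingface.co/api/whoami-v2"

def parse_whoami(status_code: int, payload: dict) -> dict:
    """Interpret a whoami-v2 response. The auth.accessToken.role field is
    assumed to be "read" or "write" for classic user access tokens."""
    if status_code != 200:
        return {"valid": False, "write": False, "name": None}
    role = payload.get("auth", {}).get("accessToken", {}).get("role")
    return {"valid": True, "write": role == "write", "name": payload.get("name")}

def check_token(token: str) -> dict:
    """Make a read-only call to see whether `token` is live and writable."""
    req = urllib.request.Request(
        WHOAMI_URL, headers={"Authorization": f"Bearer {token}"}
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return parse_whoami(resp.status, json.load(resp))
    except urllib.error.HTTPError as err:
        return parse_whoami(err.code, {})
```

A check like this is itself read-only: it reveals the token's scope without modifying any repository, which is how a finding of "655 tokens with write permissions" can be established responsibly.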
The researchers also found tokens that granted them full access to 77 organizations using Meta-Llama, Pythia, and Bloom.
An attacker with write privileges could have replaced the existing models with malicious ones or uploaded an entirely new malicious model under the organization's name.
According to Lanyado, Lasso researchers found several tokens associated with Meta, one of which had write permissions to Meta Llama, and two each with write permissions to Pythia and Bloom.
The API tokens associated with Microsoft and VMware had read-only privileges, but they still allowed Lasso researchers to view all of those companies' private data sets and models, he says.
Lasso disclosed its findings to all impacted users and organizations with a recommendation to revoke their exposed tokens and delete them from their respective repositories.
The security vendor also notified Hugging Face about the issue.
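Beyond revoking leaked tokens, downstream consumers can reduce their exposure to silently swapped models by pinning downloads to an exact commit hash rather than a movable branch like "main". The sketch below assumes the third-party `huggingface_hub` package's `snapshot_download` function, which accepts a `revision` argument; the repository name and hash shown are placeholders, not real artifacts.

```python
import re

def is_pinned_revision(rev: str) -> bool:
    """True only for a full 40-hex-character commit hash, which pins an
    immutable snapshot; branch names like "main" can be silently moved."""
    return bool(re.fullmatch(r"[0-9a-f]{40}", rev))

def safe_download(repo_id: str, revision: str):
    """Refuse to fetch anything not pinned to an exact commit.
    Requires the third-party huggingface_hub package at call time;
    repo_id and revision here are placeholders."""
    if not is_pinned_revision(revision):
        raise ValueError(f"refusing unpinned revision: {revision!r}")
    from huggingface_hub import snapshot_download  # imported lazily
    return snapshot_download(repo_id, revision=revision)
```

Pinning does not prevent a compromised upstream from publishing a poisoned model, but it ensures a later, silent replacement of an already-vetted snapshot cannot reach a downstream build unnoticed.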
This Cyber News was published on www.darkreading.com. Publication date: Mon, 04 Dec 2023 21:50:31 +0000