Lasso Security researchers discovered 1,681 Hugging Face API tokens exposed in code repositories, which left vendors such as Google, Meta, Microsoft and VMware open to potential supply chain attacks.
In a blog post published Monday, Lasso Security said the exposed API tokens gave its researchers access to 723 organizations' GitHub and Hugging Face repositories, which contained high-value data on large language models and generative AI projects.
Hugging Face, a data science community and development platform, says it hosts more than 500,000 AI models and 250,000 data sets.
According to Lasso Security, the exposed API tokens left organizations' GenAI models and data sets open to a variety of threats, including supply chain attacks, poisoning of training data and theft of models.
Bar Lanyado, security researcher at Lasso, wrote that 655 organizations' tokens had write permissions, which gave the researchers full access to the repositories.
Some of the repositories that were open to full access were for platforms and LLMs such as the open source Meta Llama 2, EleutherAI's Pythia and BigScience Workshop's Bloom.
In a statement to TechTarget Editorial, Hugging Face said all exposed API tokens have been revoked, but the company appeared to put the blame primarily on customers.
Lanyado wrote that Hugging Face bears responsibility as well, and recommended that it continually scan for exposed API tokens and either revoke them directly or notify users.
Lanyado credited several organizations with fast responses to Lasso Security's findings.
Hugging Face said it is working on measures that will better prevent other exposures in the future.
Lanyado said the researchers ran into obstacles while searching code by regular expressions; the initial search produced only the first 100 results on GitHub.
The researchers then searched with regular expressions that matched both Hugging Face user API tokens and org API tokens, which returned thousands of results.
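A regex search of this kind can be sketched in a few lines of Python. This is a minimal illustration, not the pattern Lasso Security used: it assumes user tokens start with `hf_` and org tokens start with `api_org_`, and the minimum length of 30 characters is an illustrative guess rather than a documented format. The function name `find_candidate_tokens` is hypothetical.

```python
import re

# Assumed token shapes: a known prefix followed by a run of ASCII letters
# and digits. The minimum length of 30 is an illustrative guess.
TOKEN_PATTERN = re.compile(r"\b(hf_|api_org_)([A-Za-z0-9]{30,})\b")

# Map each assumed prefix to the kind of token it marks.
TOKEN_KINDS = {"hf_": "user", "api_org_": "org"}


def find_candidate_tokens(text):
    """Return (kind, token) pairs for every token-shaped string in text."""
    return [
        (TOKEN_KINDS[match.group(1)], match.group(0))
        for match in TOKEN_PATTERN.finditer(text)
    ]
```

A match is only a candidate. Whether a string is a live credential can be confirmed only by the platform that issued it, which is why the article's recommendation pairs scanning with revocation or user notification.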
Exposed API tokens were even more difficult to scan for on Hugging Face, Lanyado said, as the platform did not allow searches by regex.
Instead, the researchers searched for API tokens by substrings.
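When a search interface accepts only substrings, one workable approach is to search for the token prefix and then filter the hits locally. The sketch below illustrates that two-step idea on a list of text lines; it is not Lasso Security's tooling, and the prefixes, the length threshold and the function names are assumptions made for the example.

```python
import re

# Assumed token prefixes, used as plain substrings for the coarse search.
PREFIXES = ("hf_", "api_org_")

# Local filter applied after the substring search (length is a guess).
CANDIDATE = re.compile(r"(?:hf_|api_org_)[A-Za-z0-9]{30,}")


def substring_hits(lines, prefixes=PREFIXES):
    """Yield (line_number, line) for lines containing any prefix substring."""
    for number, line in enumerate(lines, start=1):
        if any(prefix in line for prefix in prefixes):
            yield number, line


def confirm_hits(lines):
    """Keep only substring hits that contain a token-shaped string."""
    confirmed = []
    for number, line in substring_hits(lines):
        match = CANDIDATE.search(line)
        if match:
            confirmed.append((number, match.group(0)))
    return confirmed
```

The substring pass is deliberately broad and will return many false positives, such as variable names that happen to contain `hf_`. The local filter is what narrows the results to strings worth checking.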
The researchers found another issue related to Hugging Face's org API tokens.
The company had previously deprecated those tokens and also blocked their usage in its Python library by checking the token type in the login function.
Even though the tokens had been deprecated, researchers found they could still use exposed org API tokens to download private models from repositories.
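The gap between blocking a token type in a client library and retiring it on the server can be shown with a small sketch. Everything here is hypothetical: `token_kind` and `client_login` are illustrative names rather than functions from Hugging Face's library, the `api_org_` and `hf_` prefixes are assumptions, and the source does not describe how the real check is written or how the researchers used the tokens.

```python
def token_kind(token):
    """Classify a token by its assumed prefix."""
    if token.startswith("api_org_"):
        return "org"
    if token.startswith("hf_"):
        return "user"
    return "unknown"


def client_login(token):
    """Hypothetical client-side gate that refuses deprecated org tokens.

    Returns the HTTP header a client would send for an accepted token.
    """
    if token_kind(token) == "org":
        raise ValueError("organization API tokens are deprecated")
    return {"Authorization": f"Bearer {token}"}
```

A check like this runs only in code the caller chooses to execute. A request assembled without calling `client_login` never reaches the gate, so a deprecated token stays usable until the server rejects it.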
Lanyado said the researchers gained the ability to read and download a private LLM from Microsoft.
Rob Wright is a longtime technology reporter who lives in the Boston area.
This article was published on www.techtarget.com. Publication date: Tue, 05 Dec 2023 20:13:05 +0000