The API tokens of tech giants Meta, Microsoft, Google, VMware, and more have been found exposed on Hugging Face, opening them up to potential supply chain attacks.
Researchers at Lasso Security found more than 1,500 exposed API tokens on the open source data science and machine learning platform - which allowed them to gain access to 723 organizations' accounts.
In the vast majority of cases, the exposed tokens had write permissions granting the ability to modify files in account repositories.
A total of 77 organizations were exposed in this way, including Meta, EleutherAI, and BigScience Workshop - which run the Llama, Pythia, and Bloom projects respectively.
The three organizations were contacted by The Register for comment, but Meta and BigScience Workshop did not respond by the time of publication. All of them closed the holes shortly after being notified.
Hugging Face is akin to GitHub for AI enthusiasts and hosts a plethora of major projects.
It stores more than 250,000 datasets and more than 500,000 AI models.
The researchers say that if attackers had exploited the exposed API tokens, it could have led to them swiping data, poisoning training data, or stealing models altogether, impacting more than 1 million users.
In their own testing alone, the researchers say they gained enough access to modify 14 different datasets that see tens of thousands of downloads per month.
Data poisoning attacks of this kind are among the most critical threats facing AI and ML as their prominence grows, Forcepoint says.
Google's anti-spam filters for Gmail are effective because of the reliably trained models that power the feature, but these have been compromised on a number of occasions in the past to push malicious emails, disguised as benign messages, into users' inboxes.
Another hypothetical scenario in which data poisoning could have a serious organizational impact is if the dataset used to train a model that classifies different types of network traffic were sabotaged.
Partially redacted spreadsheet showing the number of high-value organizations impacted by the exposed API tokens on Hugging Face (image courtesy of Lasso Security).
The exposed API tokens were discovered by researchers conducting a series of substring searches on the platform and manually collecting them.
They then used Hugging Face's whoami API to determine whether each token was valid, who owned it, the owner's email address, what organizations the owner belonged to, and the token's permissions.
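For illustration, a minimal sketch of that validation step might look like the following Python snippet. The whoami-v2 endpoint and Authorization header are part of Hugging Face's public API; the exact response fields printed here (name, email, orgs, auth) are an assumption and may vary between API versions.

```python
import requests

def check_token(token: str) -> None:
    """Probe Hugging Face's whoami endpoint to see whether a token is live."""
    resp = requests.get(
        "https://huggingface.co/api/whoami-v2",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    if resp.status_code != 200:
        print("Token is invalid or has been revoked")
        return
    info = resp.json()
    print("Owner:", info.get("name"))
    print("Email:", info.get("email"))
    print("Organizations:", [org.get("name") for org in info.get("orgs", [])])
    # Permission/role details are typically reported under the "auth" key;
    # the field names here are assumptions based on the public API.
    print("Token details:", info.get("auth"))
```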
API tokens are often exposed when developers hardcode them in a variable for use in certain functions, then forget to remove them before pushing the code to a public repository.
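The pattern typically looks something like the sketch below, using the huggingface_hub client as an example; reading the token from the environment is one common fix, not the only one.

```python
import os
from huggingface_hub import HfApi

# Anti-pattern: a hardcoded token like the placeholder below ends up in
# version control as soon as the file is pushed to a public repository.
# HF_TOKEN = "hf_XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"  # never do this

# Safer: read the token from the environment (or a secrets manager) so it
# never lands in the repository itself.
HF_TOKEN = os.environ["HF_TOKEN"]

api = HfApi(token=HF_TOKEN)
print("Authenticated as:", api.whoami()["name"])
```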
GitHub's Secret Scanning feature, available to all users free of charge, is designed to prevent leaks like this, and Hugging Face runs a similar tool that alerts users to API tokens hardcoded into projects.
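Such scanners essentially look for token-shaped strings in committed files. As a rough illustration only (not either vendor's actual implementation), Hugging Face user access tokens carry an "hf_" prefix, so a naive scanner might resemble the following; the length range in the pattern is an approximation, not an official specification.

```python
import re
import sys
from pathlib import Path

# Hugging Face user access tokens start with "hf_"; the length range here
# is an approximation rather than an official specification.
TOKEN_PATTERN = re.compile(r"hf_[A-Za-z0-9]{30,}")

def scan(root: str) -> None:
    """Walk a source tree and flag lines that look like hardcoded tokens."""
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if TOKEN_PATTERN.search(line):
                print(f"{path}:{lineno}: possible hardcoded Hugging Face token")

if __name__ == "__main__":
    scan(sys.argv[1] if len(sys.argv) > 1 else ".")
```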
While investigating the exposed secrets on Hugging Face, the researchers also found a weakness in its organization API tokens, which had already been announced as deprecated but could still be used for read access to repositories and billing access to resources.
The weakness was also blocked in Hugging Face's Python library by adding a check on the type of token in the login function.
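A simplified sketch of what such a guard could look like is below. This is not Hugging Face's actual library code; the whoami-based lookup and the "type" field are assumptions made for illustration.

```python
import requests

def login(token: str) -> None:
    """Illustrative login guard that rejects deprecated organization tokens.

    A sketch of the idea described above, not Hugging Face's actual
    implementation.
    """
    resp = requests.get(
        "https://huggingface.co/api/whoami-v2",
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    resp.raise_for_status()
    info = resp.json()
    # The "type" field distinguishing user tokens from organization tokens
    # is an assumption for illustration purposes.
    if info.get("type") == "org":
        raise ValueError("Organization API tokens are deprecated; use a user access token instead.")
    print(f"Logged in as {info.get('name')}")
```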
Lasso Security says all the affected organizations were contacted, and major companies including Meta, Google, Microsoft, and VMware responded on the same day, revoking the tokens and removing the code from their respective repositories.