Simon makes a very good point that AI is becoming similar to open source software in a way.
To remain nimble and leverage the work of great minds from around the world, companies will need to adopt it or spend a lot of time and money trying to achieve on their own what AI can achieve for them.
In May of 2023, Samsung banned ChatGPT because an employee uploaded some sensitive internal source code to the service.
While it may have been useful to the employee, OpenAI could retain that code and even train upcoming models on it.
Big companies like Amazon and Microsoft have policies about how to classify information and what information can be stored, transmitted, or processed outside the corporate network.
Training on those policies is both part of new hire orientation and periodic security refreshers.
Tip 2: An AI can't reveal what it doesn't know.
LLM's keep secrets about as well as toddlers do.
During the podcast, Simon mentioned a great example/trainer about prompt injection called Gandalf.
Simply put, do not throw mountains of unsanitized training data at your LLM. GitGuardian literally came to be because developers were leaking secrets in public GitHub repositories.
If a company trained an LLM on its private repositories, it's possible that an attacker could get the LLM to spit out anything from proprietary code to hard-coded secrets.
If a public or all-company facing LLM isn't trained on information you don't want shared, it can't share it.
Some LLMs have been trained on a ton of GitHub repositories.
While there's a lot of good code on Github, there's a lot of bad code, and most LLMs aren't smart enough to tell the difference.
According to Simon, this comes down to how the LLMs process things.
An LLM doesn't truly understand your question and it doesn't truly understand its answer.
The AI can't step through the code and tell you what the output of a specific variable would be under specific conditions.
It doesn't actually understand what the code will do.
If you're getting an AI to write code, you still need to inspect and test it.
Realize that there is exploit code and backdooring code and all sorts of other poisoned data in the average LLM's training data, and therefore while it may be very helpful, it cannot be trusted implicitly.
This Cyber News was published on securityboulevard.com. Publication date: Fri, 12 Jan 2024 11:43:05 +0000