We're excited to share that Trail of Bits has been selected as one of seven teams to participate in the Small Business Track of DARPA's AI Cyber Challenge (AIxCC).
Our team will receive a $1 million award to create a Cyber Reasoning System and compete in the AIxCC Semifinal Competition later this summer.
Our involvement in the AIxCC represents a step forward in our commitment to pushing the boundaries of what's possible, envisioning a future where cybersecurity challenges are met with innovative, AI-powered solutions.
Disclaimer: Information about AIxCC's rules, structure, and events referenced in this document is subject to change.
The guiding principles for building our CRS

In addition to competing in the AIxCC's spiritual predecessor, the Cyber Grand Challenge (CGC), our team at Trail of Bits has spent many years applying AI/ML techniques to critical cybersecurity problems.
DARPA's CGC, like the AIxCC, tasked competitors with developing CRSs that find vulnerabilities at scale without any human intervention.
The CRS Trail of Bits created to compete in the CGC, Cyberdyne, addressed these requirements of scale and autonomy with a distributed system architecture.
Each node was tasked with one or more challenge problems and could even cooperate with other nodes on the same challenge.
If a node experienced a catastrophic error while analyzing a challenge problem, the other, independent nodes were unaffected, limiting the damage to the CRS's overall score.
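The fault-isolation idea can be sketched as follows. This is a minimal illustration, not Cyberdyne's actual design: threads stand in for independent nodes, and the challenge names and `analyze_challenge` function are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_challenge(name):
    """Hypothetical per-node analysis; one challenge deliberately crashes."""
    if name == "challenge-b":
        raise RuntimeError("catastrophic analysis error")
    return f"report for {name}"


def run_nodes(challenges):
    """Run each challenge on its own 'node'; a crash in one node is
    contained, so the other nodes' reports still count toward the score."""
    reports = {}
    with ThreadPoolExecutor(max_workers=len(challenges)) as pool:
        futures = {name: pool.submit(analyze_challenge, name)
                   for name in challenges}
        for name, fut in futures.items():
            try:
                reports[name] = fut.result()
            except Exception:
                pass  # failure is isolated to this single node
    return reports


reports = run_nodes(["challenge-a", "challenge-b", "challenge-c"])
```

Here `challenge-b` fails, but the reports for `challenge-a` and `challenge-c` survive, which is the property that limits the damage to the overall score.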
The format of the AIxCC bears a strong resemblance to that of the CGC, so the CRS we build for the AIxCC will also need to be scalable and resilient to failures.
The AIxCC has an additional wrinkle: challenge diversity.
The AIxCC's challenge problem set will include programs written in languages other than C/C++, such as Java and Python.
The distributed architecture used in Cyberdyne can be adapted for the AIxCC to address versatility just as it addressed scalability and resiliency.
The key difference is that problem-solving nodes used for AIxCC challenges will need to be specialized for different types of challenge problems.
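Such specialization might look like a dispatch layer that routes each challenge to a node suited to its language. The analyzer names below (libFuzzer, Jazzer, and Atheris are real fuzzers for C/C++, Java, and Python, respectively) are illustrative assumptions, not the AIxCC's or our CRS's actual interfaces.

```python
# Hypothetical dispatch: route each challenge to an analyzer specialized
# for its language.
def analyze_c(challenge):
    return f"libFuzzer harness for {challenge}"


def analyze_java(challenge):
    return f"Jazzer harness for {challenge}"


def analyze_python(challenge):
    return f"Atheris harness for {challenge}"


ANALYZERS = {
    "c": analyze_c,
    "c++": analyze_c,
    "java": analyze_java,
    "python": analyze_python,
}


def dispatch(challenge, language):
    """Pick the specialized node type for this challenge's language."""
    analyzer = ANALYZERS.get(language.lower())
    if analyzer is None:
        raise ValueError(f"no specialized node for {language}")
    return analyzer(challenge)


print(dispatch("demo-app", "Java"))  # Jazzer harness for demo-app
```

Keeping the dispatch table explicit makes it easy to add a new node type when a new challenge language appears.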
In the context of the AIxCC, our experience suggests that an AI/ML-only approach is a losing proposition due to high compute costs and the compounding effect of false positives, inaccuracies, and confabulations at each step.
Among the AI/ML-suitable tasks a CRS must complete in the AIxCC, several, such as generating code snippets and seed inputs for fuzzing, are tailor-made for LLMs.
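Seed generation is a good fit because a confabulated seed costs little: the fuzzer simply discards unproductive inputs. A minimal sketch, under the assumption of some chat-completion client: `llm_complete` below is a hypothetical stand-in, stubbed so the example runs offline, and the prompt wording is invented for illustration.

```python
import json


def llm_complete(prompt):
    """Hypothetical LLM client; a real CRS would call a model API here.
    This stub fakes a plausible JSON response."""
    return '["GET / HTTP/1.1", "POST /login HTTP/1.1", "\\u0000\\u00ff"]'


def generate_seeds(target_description, n=3):
    """Ask the LLM for fuzzing seeds and defensively parse the reply."""
    prompt = (f"Produce {n} diverse seed inputs, as a JSON list of strings, "
              f"for fuzzing this target: {target_description}")
    try:
        candidates = json.loads(llm_complete(prompt))
    except json.JSONDecodeError:
        return []  # tolerate confabulated, non-JSON output
    return [s.encode() for s in candidates if isinstance(s, str)]


seeds = generate_seeds("an HTTP request parser")
```

The defensive parsing matters: because model output is unreliable, a malformed reply degrades to an empty seed list rather than crashing the node.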
We used this framework to assess different LLMs' abilities to handle several distinct tasks, including those highly relevant to the AIxCC. We found that LLMs could perform as well as experts, or significantly upskill novices, only on tasks reducible to natural-language processing, such as writing phishing emails and conducting misinformation campaigns.
For other cyber tasks, such as creating malicious software, finding vulnerabilities in source code, and creating exploits, current-generation LLMs had novice-like capabilities and could only marginally upskill novice users.
Because LLMs struggle with reasoning-intensive tasks, such as identifying novel instances of vulnerabilities in source code or classifying vulnerabilities, we'll avoid using them for those tasks in our CRS; other types of AI/ML models with narrower scopes are a better option.
Next month, DARPA will hold its AIxCC kickoff event where we should learn more about the infrastructure DARPA will provide for the competition.
This Cyber News was published on securityboulevard.com. Publication date: Mon, 11 Mar 2024 21:13:07 +0000