First, the agents were able to find new risks in the test environment – but that does not mean they can find all types of risks in all types of environments. In the simulations conducted by the researchers, the AI agents were shooting fish in a barrel. These may have been new species of fish, but the agents knew, in general, what fish looked like. “We have not found evidence that these agents can detect new types of risk,” Kang said.
LLMs can find new uses for common risks
Instead, the agents found new examples of more common types of vulnerabilities, such as SQL injections. “Large language models, while advanced, are not yet capable of fully understanding or navigating complex environments independently without significant human guidance,” said Ben Gross, a security researcher at cyber security firm JFrog.
And there wasn’t much variation in the vulnerabilities tested, Gross says: they were web-based and could be easily exploited because of their simplicity.
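For readers unfamiliar with the class of flaw Gross describes, here is a minimal sketch of a SQL injection, using Python’s built-in sqlite3 module and a hypothetical users table (the table, column names and payload are illustrative, not taken from the research). The vulnerable function concatenates user input directly into the SQL string; the safe one passes it as a bound parameter.

```python
import sqlite3

# Hypothetical in-memory database for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('alice', 0)")

def find_user_vulnerable(name: str):
    # VULNERABLE: user input is spliced into the SQL string, so input
    # like "' OR '1'='1" rewrites the query's logic.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # SAFE: a parameterised query treats the input as data, not as SQL.
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_user_vulnerable(payload))  # [('alice',)] – every row leaks
print(find_user_safe(payload))        # [] – payload matches no real name
```

The simplicity is the point: a query built by string concatenation is exactly the kind of well-documented, easily triggered web flaw that an LLM agent can recognise from its training data, as opposed to a genuinely novel class of risk.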