Backslash Security has been looking at the security risks of LLM-generated code. It built a developer simulation exercise and ran it against GPT-4, identifying security blind spots in the code the model generated.

Shahar Man, co-founder and CEO of Backslash Security, said, “The way we create code is rapidly changing, and that means the way that we secure code must also change. AI-generated code offers immense possibility, but also introduces an entirely new scale of security challenges – and application security teams now bear the burden of securing an unprecedented volume of potentially vulnerable code due to the sheer speed of AI-enabled software development.

Shahar Man, co-founder and CEO of Backslash Security (image credit-LinkedIn/Shahar Man)

“Our research shows that securing open source code is more critical than ever before due to product security issues being introduced by AI-generated code that is associated with OSS.” 

Why is letting AI generate code a problem?

Given the ability of an AI to learn any subject, it should come as no surprise that developers are experimenting with using it to write code. They are supported by a number of vendors who are delivering AI-enabled tools built around their own databases and other products. Both point to the potential time savings, with vendors saying it will help speed up software development and overcome skills shortages.

However, there is a growing concern over the quality of the code that AI delivers. Backslash quotes Gartner Research on using AI Code Assistants from March 2024 (Gartner login required). In that research, Gartner says that “63% of organizations are currently piloting or deploying AI code assistants.

“Due to its simplicity of use, AI-generated code will dramatically increase the pace of new code development. However, this technology introduces a diverse range of potential vulnerabilities and security challenges.”

Simulation testing exposed problems

To investigate further, the Backslash Research Team created and ran several developer simulations using GPT-4. It discovered:

  • Some LLMs can generate vulnerable OSS package recommendations due to being ‘frozen in time’: LLMs are often trained on static datasets with a fixed cut-off date, so they lack details of patches released after that date. As a result, generated code may depend on package versions with known vulnerabilities.
  • ‘Phantom’ packages can introduce unseen reachable risks: LLM-generated code can pull in indirect OSS packages that developers are unaware of. The generated code does not come with a Software Bill of Materials (SBOM) listing all the packages it depends on, so the risks those packages carry are hard to trace.
  • Seemingly safe code-snippet outputs can create an illusion of trust: Experiments reveal that, given the same prompt, GPT-4 generated different recommendations, occasionally suggesting vulnerable package versions. Disclaimers pointing out risk were not always provided, which could lead software teams to view AI-generated code as reliable.
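
The first and third findings both come down to an LLM suggesting a package version that predates a security fix. The Python sketch below shows one way a team might screen suggested version pins before accepting them. It is only an illustration: the package names, version numbers and the MIN_PATCHED table are invented for this example and do not come from the Backslash research, and a real check would draw on a maintained vulnerability database rather than a hand-written dictionary.

```python
# Hypothetical advisory data: package name -> first patched version.
# These names and versions are made up for illustration only.
MIN_PATCHED = {
    "examplelib": "2.4.1",
    "demo-parser": "1.10.0",
}


def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0). Handles plain dotted integers only."""
    return tuple(int(part) for part in text.split("."))


def flag_outdated(requirements, min_patched=MIN_PATCHED):
    """Return (name, pinned, patched) for each pin older than the patched version."""
    findings = []
    for line in requirements:
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments and unpinned entries
        name, pinned = (part.strip() for part in line.split("==", 1))
        patched = min_patched.get(name.lower())
        if patched and parse_version(pinned) < parse_version(patched):
            findings.append((name, pinned, patched))
    return findings


# Pins as they might appear in an LLM-suggested requirements list (hypothetical).
suggested = ["examplelib==2.3.9", "demo-parser==1.10.0", "otherpkg==0.1.0"]
for name, pinned, patched in flag_outdated(suggested):
    print(f"{name} {pinned} is older than patched release {patched}")
```

A check like this only covers directly pinned packages. It would not catch the ‘phantom’ indirect dependencies described in the second finding, which need a full dependency listing such as an SBOM.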

Enterprise Times: What does this mean?

The backlog of software projects means there is constant pressure to write code quickly. This is why AI is being touted as the next solution to save developers time. The problem is, unless the code is subjected to rigorous testing, there is no way to know if it can be trusted. And if it is tested and throws up significant remediation issues, is it really saving time?

Michael Beckley, Chief Technology Officer and co-founder, Appian (Image Credit: Appian)

It is not just Gartner and Backslash that warn about the risks of AI-generated code. At Appian World, Michael Beckley, co-founder and CTO, told Enterprise Times, “Because we write code to build the platform, we’ve experimented heavily with using AI to assist our developers.

“We’ve discontinued most of those experiments right now because the models simply aren’t good enough. Just because the code might be syntactically correct, it may not be conceptually correct. And so we have not seen the kinds of efficiency gains that we were hoping for from writing code with AI.”

