LLM-powered exploit generation hits scale: Sean Heelan's tests with Opus 4.5 and GPT-5.2 produced over 40 exploits across six scenarios targeting QuickJS.

Industrialisation of Exploit Generation with LLMs: Sean Heelan's Warning
Sean Heelan's blog post, On the Coming Industrialisation of Exploit Generation with LLMs, offers a blunt, practical warning: with today's language models and execution environments, the industrialisation of exploit generation is an emerging capability, not fiction. The piece draws on an experiment that built agents atop Opus 4.5 and GPT-5.2 and challenged those agents to craft exploits for a zero-day vulnerability in the QuickJS JavaScript interpreter. The results are consequential: the agents produced over 40 distinct exploits across six scenarios, GPT-5.2 solved every scenario, and Opus 4.5 solved all but two. The takeaway is a clear signal that the tooling for offensive cybersecurity is becoming repeatable at industrial scale, not a one-off lab exercise.
Behind the numbers lies a tight, carefully constrained setup. The experiments used agents built on Opus 4.5 and GPT-5.2, with a range of mitigations and constraints designed to mimic real-world guardrails: for example, assuming an unknown heap starting state and forbidding hardcoded offsets in exploits. The objectives varied as well, spanning from spawning a shell to writing a file or establishing a back-channel to a command-and-control server. The ability of a modern language model to satisfy these diverse goals across multiple environments highlights how far automated exploit work can travel from idea to repeatable practice. For readers who want the technical details, the zero-day target in these experiments is the QuickJS JavaScript interpreter, and the work centers on building reliable exploits despite mitigations; the interpreter's codebase is available on GitHub (QuickJS on GitHub).
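To make the "no hardcoded offsets" constraint concrete, here is a minimal, hypothetical sketch of how a harness could screen exploit candidates for that rule. None of these names come from Heelan's actual code; the address-detection heuristic is purely illustrative.

```python
import re

# Heuristic: flag large hex literals that look like absolute addresses
# or fixed heap offsets baked into an exploit payload.
# (Illustrative only -- a real harness would need a far stricter check.)
ADDRESS_LIKE = re.compile(r"0x[0-9a-fA-F]{8,}")

def violates_no_hardcoded_offsets(exploit_src: str) -> bool:
    """Return True if the candidate appears to embed an absolute address."""
    return bool(ADDRESS_LIKE.search(exploit_src))

def score_candidate(exploit_src: str, scenario_passed: bool) -> bool:
    """A candidate counts as a solve only if it passed the scenario's
    success check *and* respected the no-hardcoded-offsets constraint."""
    return scenario_passed and not violates_no_hardcoded_offsets(exploit_src)
```

The point of such a filter is to force the agent to derive addresses at runtime (e.g., via an information leak) rather than rely on a known heap layout, which is what makes the resulting exploits portable across runs.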
This isn't just a curiosity about a single interpreter. Sean Heelan's broader claim is that we should prepare for the industrialisation of many of the constituent parts of offensive cybersecurity. If a pair of state-of-the-art models can generate dozens of working exploits for a single zero-day, the implication for defenders is concrete: the barrier to entry for offensive capability drops sharply as tooling becomes commodified. The pressure is not merely about writing a new exploit; it is about assembling end-to-end attack chains that can bypass mitigations, adapt to unknown memory layouts, and pivot through a network with minimal human intervention. For defenders, that means raising the bar on secure-by-default systems, not just on per-issue patches. The blog's write-up and the accompanying code mean this trend is already out there for others to study and improve upon.
From a defensive standpoint the stakes are high. Automated exploit generation raises concerns about supply chain security, browser and runtime vulnerabilities, and the stability of applications that embed interpreters like QuickJS. If the same tooling can produce dozens of viable exploit paths in short order, defenders must lean into stronger memory-safety guarantees, more aggressive fuzzing, and faster patching cycles. Still, there's reason for optimism: awareness and tooling can speed up defensive research. Public write-ups and reproducible experiments enable the security community to study failure modes, stress-test mitigations, and push for safer language runtimes. The practical takeaway for developers is simple: expect attackers to use automated, end-to-end workflows and design your systems with that pace in mind. TechCrunch and other industry coverage have been tracking similar shifts in AI-assisted security in broader terms, which aligns with the trend Heelan documents here.
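To illustrate the "more aggressive fuzzing" defenders are urged toward, here is a toy mutation-based fuzzing loop. It is a sketch under stated assumptions: `run_interpreter` is a hypothetical callable that returns True when an input crashes the target; a real campaign against something like QuickJS would use a coverage-guided engine (e.g., libFuzzer or AFL++) rather than blind byte flips.

```python
import random

# Seed corpus of valid-ish inputs to mutate (illustrative JS snippets).
SEEDS = [b'var a = [];', b'JSON.parse("[1,2]")']

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Flip 1-4 random bytes of the input to random values."""
    buf = bytearray(data)
    for _ in range(rng.randint(1, 4)):
        i = rng.randrange(len(buf))
        buf[i] = rng.randrange(256)
    return bytes(buf)

def fuzz(run_interpreter, iterations=1000, seed=0):
    """Feed mutated seeds to the target; collect crashing inputs."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        case = mutate(rng.choice(SEEDS), rng)
        if run_interpreter(case):  # True means the target crashed
            crashes.append(case)
    return crashes
```

The design point is the feedback loop: every crashing input becomes a reproducer for triage, which is exactly the kind of repeatable pipeline defenders need to match the pace of automated attack tooling.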