LLM-powered exploit generation hits scale: Sean Heelan's tests with Opus 4.5 and GPT-5.2 produced over 40 exploits across six scenarios targeting QuickJS.

Industrialisation of Exploit Generation with LLMs: Sean Heelan's Warning
Sean Heelan's blog post, On the Coming Industrialisation of Exploit Generation with LLMs, offers a blunt, practical warning: with today's language models and execution environments, the industrialisation of exploit generation is an emerging capability, not fiction. The piece draws on an experiment that built agents atop Opus 4.5 and GPT-5.2 and challenged those agents to craft exploits for a zero-day vulnerability in the QuickJS JavaScript interpreter. The results are consequential: the agents produced over 40 distinct exploits across six scenarios, GPT-5.2 solved every scenario, and Opus 4.5 solved all but two. The takeaway is a clear signal that the tooling for offensive cybersecurity is becoming repeatable at industrial scale, not a one-off lab exercise.
Behind the numbers lies a tight, carefully constrained setup. The experiments used agents built on Opus 4.5 and GPT-5.2, with a range of mitigations and constraints designed to mimic real-world guardrails: for example, assuming an unknown heap starting state and forbidding hardcoded offsets in exploits. The objectives varied as well, spanning from spawning a shell to writing a file or establishing a back-channel to a command-and-control server. The ability of a modern language model to satisfy these diverse goals across multiple environments highlights how far automated exploit work can travel from idea to repeatable practice. For readers who want the technical details, the zero-day target in these experiments is the QuickJS JavaScript interpreter, and the work centers on building reliable exploits despite mitigations; the interpreter's codebase is available on GitHub (QuickJS on GitHub).
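To make the "no hardcoded offsets" constraint concrete, here is a minimal, hypothetical sketch of how a harness could screen exploit candidates for that rule. None of these names come from Heelan's actual code; the address-detection heuristic is purely illustrative.

```python
import re

# Heuristic: flag large hex literals that look like absolute addresses
# or fixed heap offsets baked into an exploit payload.
# (Illustrative only -- a real harness would need a far stricter check.)
ADDRESS_LIKE = re.compile(r"0x[0-9a-fA-F]{8,}")

def violates_no_hardcoded_offsets(exploit_src: str) -> bool:
    """Return True if the candidate appears to embed an absolute address."""
    return bool(ADDRESS_LIKE.search(exploit_src))

def score_candidate(exploit_src: str, scenario_passed: bool) -> bool:
    """A candidate counts as a solve only if it passed the scenario's
    success check *and* respected the no-hardcoded-offsets constraint."""
    return scenario_passed and not violates_no_hardcoded_offsets(exploit_src)
```

The point of such a filter is to force the agent to derive addresses at runtime (e.g., via an information leak) rather than rely on a known heap layout, which is what makes the resulting exploits portable across runs.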
This isn't just a curiosity about a single interpreter. Sean Heelan's broader claim is that we should prepare for the industrialisation of many of the constituent parts of offensive cybersecurity. If a pair of state-of-the-art models can generate dozens of working exploits for a single zero-day, the implication for defenders is concrete: the barrier to entry for offensive capability drops sharply as tooling becomes commodified. The pressure is not merely about writing a new exploit; it is about assembling end-to-end attack chains that can bypass mitigations, adapt to unknown memory layouts, and pivot through a network with minimal human intervention. For defenders, that means raising the bar on secure-by-default systems, not just on per-issue patches. The blog's write-up and the accompanying code mean this trend is already out there for others to study and improve upon.
From a defensive standpoint the stakes are high. Automated exploit generation raises concerns about supply chain security, browser and runtime vulnerabilities, and the stability of applications that embed interpreters like QuickJS. If the same tooling can produce dozens of viable exploit paths in short order, defenders must lean into stronger memory-safety guarantees, more aggressive fuzzing, and faster patching cycles. Still, there's reason for optimism: awareness and tooling can speed up defensive research. Public write-ups and reproducible experiments enable the security community to study failure modes, stress-test mitigations, and push for safer language runtimes. The practical takeaway for developers is simple: expect attackers to use automated, end-to-end workflows and design your systems with that pace in mind. TechCrunch and other industry coverage have been tracking similar shifts in AI-assisted security in broader terms, which aligns with the trend Heelan documents here.
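To illustrate the "more aggressive fuzzing" defenders are urged toward, here is a toy mutation-based fuzzing loop. It is a sketch under stated assumptions: `run_interpreter` is a hypothetical callable that returns True when an input crashes the target; a real campaign against something like QuickJS would use a coverage-guided engine (e.g., libFuzzer or AFL++) rather than blind byte flips.

```python
import random

# Seed corpus of valid-ish inputs to mutate (illustrative JS snippets).
SEEDS = [b'var a = [];', b'JSON.parse("[1,2]")']

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Flip 1-4 random bytes of the input to random values."""
    buf = bytearray(data)
    for _ in range(rng.randint(1, 4)):
        i = rng.randrange(len(buf))
        buf[i] = rng.randrange(256)
    return bytes(buf)

def fuzz(run_interpreter, iterations=1000, seed=0):
    """Feed mutated seeds to the target; collect crashing inputs."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        case = mutate(rng.choice(SEEDS), rng)
        if run_interpreter(case):  # True means the target crashed
            crashes.append(case)
    return crashes
```

The design point is the feedback loop: every crashing input becomes a reproducer for triage, which is exactly the kind of repeatable pipeline defenders need to match the pace of automated attack tooling.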