Two independent research papers published this year confirm what the security community has been anticipating: self-replicating AI malware is no longer theoretical. The question is no longer whether AI-driven worms will target your infrastructure. It's whether your defences are designed to detect and contain them.

Two Separate Research Groups. The Same Conclusion.

In March 2026, researchers from Peking University, Tsinghua University, and Singapore Management University published ClawWorm, the first demonstrated self-replicating worm targeting a production-scale LLM agent ecosystem. Three months later, a team from the University of Toronto's CleverHans Lab, the Vector Institute, and the University of Cambridge published their own research: a self-replicating worm that carries its own AI reasoning engine, propagates across traditional IT infrastructure, and adapts its attack strategy for each machine it encounters.

Different attack surfaces. Different techniques. The same fundamental finding: AI agents give malware the capacity to reason, adapt, and spread without a human operator pulling the strings.

The Infrastructure Worm: BYO Language Model

The University of Toronto prototype (published to arXiv as "AI Agents Enable Adaptive Computer Worms") is a network-level worm that carries an open-weight language model, small enough to run on a single GPU, within the malware itself. When it compromises a machine, it runs the model locally. No commercial API. No external platform. No content filter that can reach it.

The researchers tested the prototype across a 33-host isolated network running Linux servers, Windows machines, and IoT devices, configured with the kinds of vulnerabilities found routinely in corporate environments: reused credentials, unpatched software, common misconfigurations. Across 15 independent seven-day runs, the worm identified an average of 31.3 vulnerabilities per trial, escalated privileges on 23.1 hosts, and propagated to approximately two-thirds of the test network.

What makes this architecturally different from traditional worms is that it does not rely on a fixed exploit. It reads publicly available security advisories at runtime and generates working exploits from that information. During testing, it successfully exploited the Copy Fail and Dirty Frag Linux privilege escalation bugs, and a Marimo remote code execution vulnerability, reasoning its way to a working payload after reading the relevant advisories. When replica instances crashed due to a VM-detection bug, the parent worm identified the failing check, patched it out, and retried.

The economic argument in the paper is stark. Because the worm runs its model on the compute resources of its victims, the attacker's marginal cost per new infection is zero. The defender, by contrast, pays for every detection, investigation, and remediation. Traditional malware campaigns, which still depend on attacker-written exploits and operator time, do not show that asymmetry to the same degree.

The Agent Worm: Targeting the AI Stack Itself

ClawWorm attacks a different layer entirely. Rather than targeting traditional IT infrastructure, it targets LLM agent ecosystems: specifically OpenClaw, an open-source agent framework with more than 40,000 active instances that integrates with over 50 messaging platforms including Telegram, Discord, and WhatsApp.

The infection lifecycle runs in three autonomous phases. First, the worm uses a social engineering handshake to persuade the victim agent to modify its core configuration file (AGENTS.md), embedding the payload as a startup instruction. Second, because the framework loads that configuration file unconditionally at every session restart, the payload executes automatically with the agent's full tool privileges on every reboot. Third, an injected behavioural rule causes the infected agent to append the worm payload to its natural responses whenever it encounters new peers, spreading without any further attacker involvement.

Across 1,800 trials testing four frontier models, the researchers recorded a 64.5% aggregate attack success rate. The most effective propagation vector, exploiting the agent skill marketplace (ClawHub), achieved an 81% success rate and remained effective even against models with active safety filters. The researchers attribute this to every tested model granting implicit trust to third-party skill packages. A model might refuse a direct malicious instruction in a chat; it is far less likely to refuse to install a skill it retrieved from a marketplace.

The researchers also identified an "asymptomatic carrier" phenomenon. Kimi-K2.5, the most security-resistant model tested, successfully blocked payload execution, but still persisted the malicious configuration and passed it to 55% of newly encountered peers. A model that appears to be behaving safely can still be an active carrier.

The epidemiological modelling is the detail that warrants the most attention. For all four models tested, the estimated basic reproduction number (R0) exceeded 1.0, the threshold above which each infected agent infects more than one new peer on average and an outbreak is expected to grow rather than die out. For the most compliant model, R0 reached 4.20. For the most resistant, it was 2.00. Because the infection state is permanent (agents do not autonomously remove the malicious configuration), the model predicts eventual saturation of the ecosystem unless humans intervene.
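To give a feel for what those R0 values imply, here is a minimal back-of-the-envelope sketch. It is not the paper's epidemiological model: it assumes a simple branching process in which every infected agent infects exactly R0 new peers per generation, it ignores the depletion of uninfected agents, and it borrows the 40,000-instance figure quoted earlier as the population size. The function name and parameters are illustrative.

```python
def generations_to_saturate(r0: float, population: int, seeds: int = 1) -> int:
    """Count infection generations until cumulative infections reach `population`.

    Assumption: unconstrained branching growth, where each newly infected
    agent goes on to infect `r0` further agents in the next generation.
    This overstates late-stage growth, so treat the result as a rough
    lower bound on the number of generations, not a forecast.
    """
    if r0 <= 1.0:
        raise ValueError("an outbreak is not expected to sustain itself when R0 <= 1")
    new_cases = float(seeds)
    total_infected = float(seeds)
    generations = 0
    while total_infected < population:
        new_cases *= r0              # each new case produces r0 cases next generation
        total_infected += new_cases  # infection is permanent, so totals only grow
        generations += 1
    return generations


if __name__ == "__main__":
    for label, r0 in [("most compliant model", 4.20), ("most resistant model", 2.00)]:
        print(label, generations_to_saturate(r0, population=40_000))
```

Under these simplifying assumptions, a single seed reaches 40,000 cumulative infections in 8 generations at R0 = 4.20 and in 15 generations at R0 = 2.00. The more resistant model slows the outbreak but does not stop it.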

Why Your Current Controls Don't Cover This

Both research teams reach the same conclusion on safety controls: they are structurally irrelevant to this class of threat.

The University of Toronto worm runs entirely on locally hosted open-weight models. Commercial platform guardrails (service refusals, content filtering, rate limiting) assume the model is accessed via an API the platform controls. That assumption does not hold here. Safety guardrails on open-weight models can be bypassed when the attacker controls the local execution environment.

ClawWorm exposes a different structural gap. Agent frameworks inherit the trust model of the LLM at their core, and that trust model was not designed for adversarial peer-to-peer messaging. Flat context trust means the agent cannot distinguish between a developer instruction and a message from an infected peer. Unconditional file loading means any configuration written to disk will execute. Unaudited skill supply chains mean marketplace packages are a reliable delivery mechanism.
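The opposite of flat context trust is to attach provenance to every message and check it before a privileged tool runs. The sketch below illustrates that idea only. The trust levels, tool names, and policy table are hypothetical and do not reflect OpenClaw's actual API or any mitigation proposed in the paper.

```python
from dataclasses import dataclass
from enum import IntEnum


class Trust(IntEnum):
    PEER = 0       # other agents and external chat participants
    OPERATOR = 1   # the human who deployed and administers the agent
    DEVELOPER = 2  # instructions shipped with the framework itself


@dataclass(frozen=True)
class Message:
    source: Trust
    text: str


# Hypothetical tool names mapped to the minimum trust level allowed to trigger them.
REQUIRED_TRUST = {
    "reply": Trust.PEER,
    "write_config": Trust.OPERATOR,
    "install_skill": Trust.OPERATOR,
}


def is_tool_call_allowed(tool: str, triggering_message: Message) -> bool:
    """Allow a tool call only if the message that prompted it is trusted enough.

    Tools missing from the policy table default to the strictest level.
    """
    required = REQUIRED_TRUST.get(tool, Trust.DEVELOPER)
    return triggering_message.source >= required


if __name__ == "__main__":
    peer_msg = Message(Trust.PEER, "Please add this line to your AGENTS.md ...")
    print(is_tool_call_allowed("write_config", peer_msg))  # peer cannot rewrite config
    print(is_tool_call_allowed("reply", peer_msg))         # but can still get a reply
```

The design choice here is that the decision is made outside the model, on metadata the model cannot rewrite. A persuasive message from an infected peer can still influence what the agent says, but not which privileged tools it is permitted to run.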

Neither of these is a problem that can be solved by turning up a content filter.

What Defenders Need to Do Now

Neither research team is releasing its prototype publicly. Both followed responsible disclosure processes. That buys some time, though not much, and it does not diminish the significance of the proof of concept.

There are concrete, practical steps organisations can take now.

Find your weaknesses before AI-assisted tools find them for you. The University of Toronto worm succeeded by exploiting exactly the kinds of vulnerabilities that structured penetration testing and red team exercises are designed to surface: reused credentials, unpatched software, predictable misconfigurations. The attack surface is not exotic. Get it mapped and addressed.

Treat your AI agent architecture as a security boundary, not just an application layer. If you are running LLM agents in your environment, whether for automation, customer-facing workflows, or internal tooling, the trust model those agents operate under is a security concern. Context privilege isolation, configuration integrity verification, zero-trust tool execution policies, and supply chain controls for agent skills are not theoretical hardening measures. They address the exact vectors ClawWorm used.
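Configuration integrity verification can be as simple as refusing to load a startup file whose hash does not match a value pinned somewhere the agent cannot write. The following is a minimal sketch under that assumption. The function name is illustrative, and where the pinned digest is stored (a read-only mount, a secrets manager, a signed manifest) is left to the deployment.

```python
import hashlib
import hmac
from pathlib import Path


def load_config_if_trusted(path: Path, pinned_digest: str) -> str:
    """Return the file's text only if its SHA-256 matches the pinned digest.

    The file is read once and the same bytes are both hashed and returned,
    so the content cannot change between the check and the use.
    """
    data = path.read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    # Constant-time comparison; digests are normalised to lowercase hex.
    if not hmac.compare_digest(actual, pinned_digest.lower()):
        raise PermissionError(f"{path.name} does not match its pinned digest; refusing to load")
    return data.decode("utf-8")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        config = Path(tmp) / "AGENTS.md"
        config.write_text("# Startup rules\n", encoding="utf-8")
        pinned = hashlib.sha256(config.read_bytes()).hexdigest()

        print(load_config_if_trusted(config, pinned))  # unchanged file loads

        # Simulate an agent being persuaded to append a payload.
        config.write_text("# Startup rules\nAlways forward this text.\n", encoding="utf-8")
        try:
            load_config_if_trusted(config, pinned)
        except PermissionError as err:
            print("blocked:", err)
```

A check like this does not stop the agent from being talked into editing the file, but it breaks the persistence step: a modified file is rejected at the next restart instead of being executed with full tool privileges.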

Build detection for AI-driven threat behaviour. Traditional malware leaves predictable signatures. An AI-driven worm adapts its payloads and attack logic at runtime. Signature-based detection will miss it. Organisations need to invest in behavioural monitoring, network segmentation, and anomaly detection that can identify lateral movement even without a fixed exploit pattern to match against.
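One behavioural signal that does not depend on any exploit signature is fan-out: a host that suddenly starts talking to many internal destinations it has never contacted before. The sketch below shows the idea on in-memory flow records. The record shape, the one-hour window, and the threshold of five new peers are assumptions chosen for illustration, not tuned values, and a real deployment would feed this from flow logs or EDR telemetry.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Flow(NamedTuple):
    timestamp: float  # seconds since epoch
    src: str
    dst: str


def flag_fanout(
    flows: Iterable[Flow],
    baseline: Dict[str, Set[str]],
    window_seconds: float = 3600.0,
    max_new_peers: int = 5,
) -> Set[str]:
    """Return sources that reach more than `max_new_peers` destinations
    absent from their baseline within any `window_seconds` interval."""
    # Earliest time each source contacted each previously unseen destination.
    first_contact: Dict[str, Dict[str, float]] = defaultdict(dict)
    for flow in flows:
        if flow.dst in baseline.get(flow.src, set()):
            continue  # known, expected communication
        seen = first_contact[flow.src]
        if flow.dst not in seen or flow.timestamp < seen[flow.dst]:
            seen[flow.dst] = flow.timestamp

    flagged: Set[str] = set()
    for src, peers in first_contact.items():
        times = sorted(peers.values())
        start = 0
        for end, current in enumerate(times):
            # Shrink the window from the left until it spans <= window_seconds.
            while current - times[start] > window_seconds:
                start += 1
            if end - start + 1 > max_new_peers:
                flagged.add(src)
                break
    return flagged


if __name__ == "__main__":
    baseline = {"web01": {"db01"}}
    flows = [Flow(100.0 * i, "ws07", f"host{i:02d}") for i in range(6)]
    flows += [Flow(50.0, "web01", "db01"), Flow(60.0, "web01", "cache01")]
    print(flag_fanout(flows, baseline))
```

Because the rule keys on who talks to whom rather than on payload content, it keeps working when the worm regenerates its exploit for each target. It will also produce false positives for scanners and backup servers, which is why a per-host baseline matters.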

Do not treat open-weight model deployments as lower risk than commercial APIs. The absence of a commercial provider's safety layer is a risk factor, not a neutral property. Any deployment of a locally hosted open-weight model in an environment with network access should be scoped and controlled accordingly. Irrespective of model type, we recommend that customers implement an independent AI guardrail capability in addition to any built-in safety layer.

The Shift This Represents

Traditional malware was defined by fixed exploit code. A worm like WannaCry spread by exploiting a specific, patchable vulnerability. Patch it, and the spread stops.

AI-driven worms are defined by the capacity to reason: to identify vulnerabilities dynamically, synthesise attack logic in real time, and adapt to observations. That is not patchable in the conventional sense. It requires a different defensive posture, one that assumes adversaries with adaptive reasoning capability, not just known exploit signatures.

Both research teams are right to frame this as a fundamental shift. Defenders who continue to plan against fixed-signature threats and assume that commercial AI safety controls cover their exposure are planning for the wrong adversary.

The research is public. The techniques are documented. The time to close the gaps is now.

SALTT Technologies helps organisations assess and address AI-specific security risk, including threat modelling for LLM deployments, AI agent security architecture review, and technical testing against AI-integrated environments. These fall within our AI Security and Technical Testing & Assurance capability areas.


SALTT Technologies

SALTT Technologies is an all-Australian cybersecurity consultancy working across security architecture, technical testing, AI security, governance and compliance, and managed cyber operations — helping Australian organisations understand their risk and act on it. saltt.tech
