
Personal AI Agents Are the Most Privileged Software You've Ever Run

OpenClaw's 288 security advisories and a 1-click RCE show what happens when personal AI agents get broad tool access without matching security hygiene. The controls exist. Anthropic's own computer-use tool mandates most of them. Deployment is the gap.


Viewpoint

OpenClaw can read your files, send messages as you, run shell commands on your machine, and browse the web on your behalf. It’s always on, it accepts input from WhatsApp, Telegram, Slack, Discord, Signal, and a dozen other channels, and it hands all of that to whichever frontier LLM you configure. That’s not a sales pitch. It’s a description of the attack surface.

The project now has 288 published GitHub Security Advisories. One of them describes a 1-click path from a crafted link in a chat to remote code execution. Another class covers malicious skills delivering macOS infostealer malware. A third covers code execution triggered by running OpenClaw inside a cloned repository. None of these are theoretical. All of them have CVEs or GHSAs with patch references.

The security controls that close these gaps are well documented. OpenClaw publishes a MITRE ATLAS-aligned threat model, a formal TLA+/TLC verification suite, and a 10-point deployment checklist. Anthropic’s own computer-use tool, available on Claude Opus 4.6 and Sonnet 4.6 as computer_20251124, ships with the same requirements: sandboxed execution, prompt-injection detection, and mandatory user confirmation before consequential actions. The controls are not a mystery. Deployment hygiene is the gap.

The design that makes everything else matter

OpenClaw is built on a trusted-operator-only model. The documentation says plainly that “running one Gateway for multiple mutually untrusted/adversarial operators is not a recommended setup.” Session keys are routing selectors, not authorization tokens. There is no per-user host isolation in a shared gateway deployment.

That’s a deliberate design choice, not an oversight. OpenClaw is designed for a single trusted operator running a personal assistant on their own infrastructure. The problem is the gap between that design assumption and how teams actually deploy it: shared Slack bots, public gateway exposure via tunnel services, permissive tool policies that accumulate over time.

The blast radius of a compromise is proportional to what the agent can do. For OpenClaw with default tool groups enabled, that means shell execution, filesystem access, browser control, outbound web requests, and the ability to send messages in any connected channel. That’s a management plane, not a chat interface.

OpenClaw attack surface: from untrusted inputs to tool execution

Three incidents already in the record

CVE-2026-25253 is the clearest demonstration of blast radius in practice. OpenClaw’s Control UI accepted a gatewayUrl parameter from the query string and auto-connected to it, sending the stored gateway token during the WebSocket handshake. An attacker who receives that token can connect to the victim’s local gateway, modify tool policies, and trigger command execution. The loopback bind offered no protection because the browser initiates outbound connections. The attack required one click on a crafted link. Patched in 2026.1.29.
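The class of fix is easy to state even if the patched code is more involved: never auto-send a stored token to a URL taken from the query string unless it actually points at the local gateway. A minimal sketch (function name and exact checks are mine, not OpenClaw's):

```python
from urllib.parse import urlparse
import ipaddress

def is_loopback_gateway_url(raw_url: str) -> bool:
    """Accept only ws:// or wss:// URLs whose host is a loopback address.

    Hypothetical validation illustrating the fix class: a gatewayUrl arriving
    via the query string must point at the local gateway before the UI is
    allowed to auto-connect and present the stored auth token.
    """
    try:
        parsed = urlparse(raw_url)
    except ValueError:
        return False
    if parsed.scheme not in ("ws", "wss"):
        return False
    host = parsed.hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Named hosts other than localhost are untrusted by default.
        return False
```

Anything that fails the check should fall back to an explicit user confirmation rather than a silent connect, since the whole exploit depended on the handshake happening without interaction.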

Trend Micro documented a separate class targeting the skills supply chain. A malicious productivity skill directed users to install “OpenClawCLI” from a lookalike site. The install chain included an obfuscated Base64-encoded shell command that fetched a remote script and dropped a Mach-O binary. The binary harvested credentials, documents, and browser data before exfiltrating to a C2 endpoint. Trend Micro’s framing is exact: attackers shifted from social-engineering humans directly to using the AI agent as a trusted intermediary to trick humans.

The third incident, GHSA-99qw-6mr3-36qr, shows workspace context becoming an attack vector. OpenClaw previously auto-discovered plugins from .openclaw/extensions/ inside the current working directory without a trust step. Running OpenClaw inside a maliciously crafted cloned repository triggered code execution. Fixed in 2026.3.12.
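The fix inserts a trust step between “directory exists” and “code runs.” A hedged sketch of that gate, with a hypothetical trust-list format (the real fix in 2026.3.12 may record trust differently):

```python
import json
from pathlib import Path

def workspace_plugins_allowed(workspace: Path, trust_file: Path) -> bool:
    """Load .openclaw/extensions/ only from workspaces the user explicitly trusted.

    Hypothetical gate: trust_file holds a JSON list of absolute workspace paths
    the user approved. Merely cloning a repository and running the agent inside
    it must not be enough to get the repo's plugins executed.
    """
    if not trust_file.exists():
        return False  # no recorded trust decisions yet: deny by default
    trusted = set(json.loads(trust_file.read_text()))
    return str(workspace.resolve()) in trusted
```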

Key security incidents: OpenClaw Nov 2025 to Mar 2026

The shared-bot failure nobody talks about

OpenClaw’s own documentation flags this clearly: a shared Slack bot where “everyone can message the bot” is a real risk because every allowed sender drives the same permission set. There is no per-user isolation below the agent session layer.

This means indirect prompt injection scales with how many people can reach the bot. Any allowed sender can submit a message containing injected instructions. If the agent has filesystem access and can send outbound messages, a successful injection can exfiltrate data with no further victim interaction; the only barrier is whatever it takes for the attacker to get a message into an allowed channel.

The threat model names this explicitly: “tool argument injection,” “exec approval bypass,” and “unauthorized command execution” are listed as core impact paths. Approval fatigue compounds the problem. Repeated confirmation prompts lead to rubber-stamping. Anthropic’s documentation on computer-use cites the same dynamic as a reason to prefer sandbox boundaries over repeated confirmation loops.
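One way to keep confirmation prompts rare enough to stay meaningful is to make the auto-allow set small and the deny set absolute, so the user is only interrupted for genuinely consequential commands. A sketch with a hypothetical allowlist (not OpenClaw's actual policy engine):

```python
import shlex

SAFE_COMMANDS = {"ls", "cat", "pwd", "git"}   # hypothetical read-only allowlist
DENIED_PATTERNS = ("| bash", "|bash")          # never allowed, even with approval

def approval_decision(command: str) -> str:
    """Return 'allow', 'confirm', or 'deny' for a shell command the agent requests.

    A small auto-allow set plus an absolute deny set means prompts fire only
    for the middle tier, which is how you fight approval fatigue without
    training the user to rubber-stamp everything.
    """
    normalized = " ".join(command.split())
    if any(bad in normalized for bad in DENIED_PATTERNS):
        return "deny"
    try:
        first = shlex.split(command)[0]
    except (ValueError, IndexError):
        return "deny"  # unparseable or empty commands are refused outright
    return "allow" if first in SAFE_COMMANDS else "confirm"
```

The substantive design point is the three-tier split, not the particular patterns: a two-tier allow/confirm policy degenerates into rubber-stamping as soon as prompt volume rises.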

Anthropic’s computer-use tool as a reference design

Anthropic ships computer-use as a beta feature on Claude Opus 4.6, Sonnet 4.6, and Opus 4.5 under the computer_20251124 tool version. It provides screenshot capture, mouse and keyboard control, and a new zoom action for inspecting specific screen regions. The capability set is narrower than OpenClaw’s but the threat class is identical: an AI with the ability to interact with a desktop environment can do significant damage when pointed at injected instructions.

Anthropic’s published security requirements for computer-use list four items: use a dedicated VM or container with minimal privileges, avoid giving the model access to sensitive accounts or data, limit internet access to an explicit domain allowlist, and require human confirmation for any consequential action. Automatic prompt-injection classifiers run on screenshots, and when they detect potential injections they steer the model to ask for user confirmation before proceeding.
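The domain-allowlist requirement is the easiest of the four to get subtly wrong: naive substring matching lets `api.example.com.attacker.net` through. A minimal deny-by-default check (allowlist contents are placeholders):

```python
from urllib.parse import urlparse

ALLOWED_DOMAINS = {"docs.example.com", "api.example.com"}  # hypothetical allowlist

def egress_allowed(url: str) -> bool:
    """Deny-by-default outbound check.

    A host passes only if it exactly matches an allowlisted domain or is a
    proper subdomain of one (dot-anchored suffix match, so
    'api.example.com.attacker.net' does not sneak through).
    """
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```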

These are the same controls OpenClaw documents in its deployment checklist. The difference is that Anthropic enforces some of them at the API layer. OpenClaw leaves them to the operator.

The vendor spectrum is about isolation, not intelligence

NemoClaw, Nvidia’s alpha preview announced March 16, 2026, applies Landlock, seccomp, and network namespaces at the OS layer and enforces deny-by-default network and filesystem policies for OpenClaw workloads. OpenAI’s Agents SDK routes tool execution through hosted container runtimes. Amazon Bedrock Agents integrates with VPC endpoints and IAM for network-private enterprise deployments.

The security difference across these offerings is not model capability. It’s where the execution boundary sits and who controls it. Vendor-managed isolation moves that boundary up the stack, away from the operator’s deployment choices. The cost is cloud dependence and proprietary licensing. The benefit is that the sandbox doesn’t depend on an operator applying a checklist correctly on every machine where the agent runs.

Vendor isolation spectrum: from self-managed to cloud-managed agent runtimes

Ten controls. All of them are in OpenClaw’s own docs.

OpenClaw publishes a 10-point deployment checklist and a security audit command family. Running openclaw security audit --deep flags permission misconfigurations, exposed control planes, and tool policy drift. The checklist covers:

  • a dedicated environment (VM or separate OS user per trust boundary)
  • loopback gateway bind with a strong auth token
  • a strict DM policy with pairing or explicit allowlists
  • a deny-by-default tool policy
  • egress allowlisting
  • skill hygiene (no unreviewed or unpinned skills, no curl|bash installers)
  • locked-down filesystem permissions on ~/.openclaw
  • regular audit runs
  • a credential rotation playbook for incident response
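The deny-by-default tool policy is the checklist item that most directly bounds blast radius, and its logic is one line: absence of a rule means no. A sketch with a hypothetical policy shape (OpenClaw's actual config format will differ):

```python
# Hypothetical policy: every capability is off unless explicitly set to True.
POLICY = {
    "shell": False,            # no shell execution
    "filesystem.read": True,   # read-only file access
    "filesystem.write": False,
    "browser": False,
    "message.send": True,      # may reply in connected channels
}

def tool_allowed(tool: str, policy: dict = POLICY) -> bool:
    """Deny-by-default: unknown tools and anything not explicitly True are refused."""
    return policy.get(tool, False)
```

The `.get(tool, False)` default is the whole control: a new tool group shipped in an upgrade stays disabled until the operator opts in, which is the opposite of the permissive drift the audit command is meant to catch.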

The project also integrates VirusTotal scanning for skills published to ClawHub, with daily rescans and automatic blocking on detection. Snyk’s ToxicSkills analysis found a large fraction of marketplace skills with serious issues as of early 2026. VirusTotal scanning is one layer, not a substitute for reviewing and pinning what you install.
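Pinning is what makes skill hygiene mechanical rather than vigilance-based: record the hash at review time and refuse anything that no longer matches, regardless of what a marketplace scanner says later. A sketch (helper name is mine):

```python
import hashlib

def skill_matches_pin(archive_bytes: bytes, pinned_sha256: str) -> bool:
    """Compare a downloaded skill archive against the SHA-256 recorded at review time.

    Hypothetical helper: a tampered re-upload of the 'same' skill version fails
    this check even before any scanner has flagged it.
    """
    return hashlib.sha256(archive_bytes).hexdigest() == pinned_sha256
```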

None of this is novel advice. All of it is in the project’s documentation. The gap is between documentation and deployment.

Defense in depth: 10 controls for secure OpenClaw deployment

The case against abstention

The obvious response to 288 advisories is to not run personal AI agents at all. That position misses the structure of the problem.

OpenAI’s Responses API routes tool calls through hosted containers with prompt-injection mitigations. Anthropic’s computer-use tool runs injection classifiers on every screenshot. Google’s Gemini computer-use guidance requires sandboxed VMs. These platforms carry the same threat class, managed by the same categories of control, at a different layer of the stack.

The choice is not between a safe platform and an unsafe one. It’s between owning the security configuration yourself and delegating it to a vendor. OpenClaw is unusually transparent about both the risks and the mitigations. The question is whether teams are applying the controls they already have documented.


Sources

  • OpenClaw security documentation and threat model (Trust, Gateway security, Security audit pages)
  • CVE-2026-25253 / GHSA: token exfiltration via gatewayUrl leading to gateway takeover and code execution; patched in OpenClaw 2026.1.29
  • GHSA-rqpp-rjj8-7wv8: scope binding flaw on shared-auth WebSocket connections; fixed in 2026.3.12
  • GHSA-99qw-6mr3-36qr: workspace plugin auto-discovery code execution; fixed in 2026.3.12
  • Trend Micro research on malicious OpenClaw skills distributing AMOS / Atomic macOS Stealer
  • Snyk ToxicSkills marketplace analysis, February 2026
  • Anthropic computer-use tool documentation, platform.claude.com; computer_20251124 tool version for Opus 4.6 and Sonnet 4.6
  • Nvidia NemoClaw early alpha preview, announced March 16, 2026
