Personal AI Agents Are the Most Privileged Software You've Ever Run

OpenClaw can read your files, send messages as you, run shell commands on your machine, and browse the web on your behalf. It’s always on, it accepts input from WhatsApp, Telegram, Slack, Discord, Signal, and a dozen other channels, and it hands all of that to whichever frontier LLM you configure. That’s not a sales pitch. It’s a description of the attack surface.

The project now has 288 published GitHub Security Advisories. One of them describes a 1-click path from a crafted link in a chat to remote code execution. Another class covers malicious skills delivering macOS infostealer malware. A third covers code execution triggered by running OpenClaw inside a cloned repository. None of these are theoretical. All of them have CVEs or GHSAs with patch references.

The security controls that close these gaps are well documented. OpenClaw publishes a MITRE ATLAS-aligned threat model, a formal TLA+/TLC verification suite, and a 10-point deployment checklist. Anthropic’s own computer-use tool, available on Claude Opus 4.6 and Sonnet 4.6 as computer_20251124, ships with the same requirements: sandboxed execution, prompt-injection detection, and mandatory user confirmation before consequential actions. The controls are not a mystery. Deployment hygiene is the gap.

The design that makes everything else matter

OpenClaw is built on a trusted-operator-only model. The documentation says plainly that “running one Gateway for multiple mutually untrusted/adversarial operators is not a recommended setup.” Session keys are routing selectors, not authorization tokens. There is no per-user host isolation in a shared gateway deployment.

That’s a deliberate design choice, not an oversight. OpenClaw is designed for a single trusted operator running a personal assistant on their own infrastructure. The problem is the gap between that design assumption and how teams actually deploy it: shared Slack bots, public gateway exposure via tunnel services, permissive tool policies that accumulate over time.

The blast radius of a compromise is proportional to what the agent can do. For OpenClaw with default tool groups enabled, that means shell execution, filesystem access, browser control, outbound web requests, and the ability to send messages in any connected channel. That’s a management plane, not a chat interface.

OpenClaw attack surface: from untrusted inputs to tool execution

Three incidents already in the record

CVE-2026-25253 is the clearest demonstration of blast radius in practice. OpenClaw’s Control UI accepted a gatewayUrl parameter from the query string and auto-connected to it, sending the stored gateway token during the WebSocket handshake. An attacker who receives that token can connect to the victim’s local gateway, modify tool policies, and trigger command execution. The loopback bind offered no protection because the browser initiates outbound connections. The attack required one click on a crafted link. Patched in 2026.1.29.

Trend Micro documented a separate class targeting the skills supply chain. A malicious productivity skill directed users to install “OpenClawCLI” from a lookalike site. The install chain included obfuscated Base64 shell that fetched a remote script and dropped a Mach-O binary. The binary harvested credentials, documents, and browser data before exfiltrating to a C2 endpoint. Trend Micro’s framing is exact: attackers shifted from social-engineering humans directly to using the AI agent as a trusted intermediary to trick humans.

The third incident, GHSA-99qw-6mr3-36qr, shows workspace context becoming an attack vector. OpenClaw previously auto-discovered plugins from .openclaw/extensions/ inside the current working directory without a trust step. Running OpenClaw inside a maliciously crafted cloned repository triggered code execution. Fixed in 2026.3.12.

Key security incidents: OpenClaw Nov 2025 to Mar 2026

The shared-bot failure nobody talks about

OpenClaw’s own documentation flags this clearly: a shared Slack bot where “everyone can message the bot” is a real risk because every allowed sender drives the same permission set. There is no per-user isolation below the agent session layer.

This means indirect prompt injection scales with how many people can reach the bot. Any allowed sender can submit a message containing injected instructions. If the agent has filesystem access and can send outbound messages, a successful injection exfiltrates data without further user interaction beyond whatever trust the attacker needs to send a message to the channel.

The threat model names this explicitly: “tool argument injection,” “exec approval bypass,” and “unauthorized command execution” are listed as core impact paths. Approval fatigue compounds the problem. Repeated confirmation prompts lead to rubber-stamping. Anthropic’s documentation on computer-use cites the same dynamic as a reason to prefer sandbox boundaries over repeated confirmation loops.

Anthropic’s computer-use tool as a reference design

Anthropic ships computer-use as a beta feature on Claude Opus 4.6, Sonnet 4.6, and Opus 4.5 under the computer_20251124 tool version. It provides screenshot capture, mouse and keyboard control, and a new zoom action for inspecting specific screen regions. The capability set is narrower than OpenClaw’s but the threat class is identical: an AI with the ability to interact with a desktop environment can do significant damage when pointed at injected instructions.

Anthropic’s published security requirements for computer-use list four items: use a dedicated VM or container with minimal privileges, avoid giving the model access to sensitive accounts or data, limit internet access to an explicit domain allowlist, and require human confirmation for any consequential action. Automatic prompt-injection classifiers run on screenshots, and when they detect potential injections they steer the model to ask for user confirmation before proceeding.

These are the same controls OpenClaw documents in its deployment checklist. The difference is that Anthropic enforces some of them at the API layer. OpenClaw leaves them to the operator.

The vendor spectrum is about isolation, not intelligence

NemoClaw, Nvidia’s alpha preview announced March 16, 2026, applies Landlock, seccomp, and network namespaces at the OS layer and enforces deny-by-default network and filesystem policies for OpenClaw workloads. OpenAI’s Agents SDK routes tool execution through hosted container runtimes. Amazon Bedrock Agents integrates with VPC endpoints and IAM for network-private enterprise deployments.

The security difference across these offerings is not model capability. It’s where the execution boundary sits and who controls it. Vendor-managed isolation moves that boundary up the stack, away from the operator’s deployment choices. The cost is cloud dependence and proprietary licensing. The benefit is that the sandbox doesn’t depend on an operator applying a checklist correctly on every machine where the agent runs.

Vendor isolation spectrum: from self-managed to cloud-managed agent runtimes

Ten controls. All of them are in OpenClaw’s own docs.

OpenClaw publishes a 10-point deployment checklist and a security audit command family. Running openclaw security audit --deep flags permission misconfigurations, exposed control planes, and tool policy drift. The checklist covers: dedicated environment (VM or separate OS user per trust boundary), loopback gateway bind with a strong auth token, strict DM policy with pairing or explicit allowlists, deny-by-default tool policy, egress allowlisting, skill hygiene (no unreviewed or unpinned skills, no curl|bash installers), locked-down filesystem permissions on ~/.openclaw, regular audit runs, and a credential rotation playbook for incident response.

The project also integrates VirusTotal scanning for skills published to ClawHub, with daily rescans and automatic blocking on detection. Snyk’s ToxicSkills analysis found a large fraction of marketplace skills with serious issues as of early 2026. VirusTotal scanning is one layer.

None of this is novel advice. All of it is in the project’s documentation. The gap is between documentation and deployment.

Defense in depth: 10 controls for secure OpenClaw deployment

The case against abstention

The obvious response to 288 advisories is to not run personal AI agents at all. That position misses the structure of the problem.

OpenAI’s Responses API routes tool calls through hosted containers with prompt-injection mitigations. Anthropic’s computer-use tool runs injection classifiers on every screenshot. Google’s Gemini computer-use guidance requires sandboxed VMs. These platforms carry the same threat class, managed by the same categories of control, at a different layer of the stack.

The choice is not between a safe platform and an unsafe one. It’s between owning the security configuration yourself and delegating it to a vendor. OpenClaw is unusually transparent about both the risks and the mitigations. The question is whether teams are applying the controls they already have documented.

Sources

OpenClaw security documentation and threat model (Trust, Gateway security, Security audit pages)
CVE-2026-25253 / GHSA: token exfiltration via gatewayUrl leading to gateway takeover and code execution; patched in OpenClaw 2026.1.29
GHSA-rqpp-rjj8-7wv8: scope binding flaw on shared-auth WebSocket connections; fixed in 2026.3.12
GHSA-99qw-6mr3-36qr: workspace plugin auto-discovery code execution; fixed in 2026.3.12
Trend Micro research on malicious OpenClaw skills distributing AMOS / Atomic macOS Stealer
Snyk ToxicSkills marketplace analysis, February 2026
Anthropic computer-use tool documentation, platform.claude.com; computer_20251124 tool version for Opus 4.6 and Sonnet 4.6
Nvidia NemoClaw early alpha preview, announced March 16, 2026

OpenClaw has 288 published GitHub Security Advisories. That number looks alarming. It’s also evidence of something most software projects lack: a security programme that publishes, tracks, and patches vulnerabilities transparently. The number of advisories is not the same as the number of exploitable vulnerabilities in a correctly deployed instance. The two are related, but not identical.

The framing of “OpenClaw is uniquely dangerous” misses the structure of the problem. Every computer-use tool, including Anthropic’s, carries the same threat class. Anthropic’s computer_20251124 tool for Claude Opus 4.6 and Sonnet 4.6 ships with explicit requirements for dedicated VMs, domain allowlists, and human confirmation for consequential actions, because any AI with desktop control is dangerous in the wrong environment. OpenClaw is not a special case. It’s the most transparent example of a problem that affects the whole category.

What 288 advisories actually means

The OpenClaw security advisory count reflects a fast-moving OSS project with active security triage, a formal threat model, TLA+/TLC verification of key properties, and a team that publishes findings rather than quietly patching them. Compare that to closed-source agent platforms where the security programme is invisible to external observers.

CVE-2026-25253 (1-click token exfiltration to RCE) was real, serious, and patched in 2026.1.29. GHSA-rqpp-rjj8-7wv8 (scope binding flaw on shared-auth WebSocket) was fixed in 2026.3.12. GHSA-99qw-6mr3-36qr (workspace plugin auto-discovery code execution) was fixed the same day. The pattern is: issue discovered, advisory published, patch shipped. That’s how responsible disclosure is supposed to work.

Vendor-managed platforms have these bugs too. They don’t always tell you about them.

The threat class is industry-wide, not OpenClaw-specific

Anthropic’s documentation for computer-use is explicit on this. The tool provides screenshot capture, mouse and keyboard control, and automated prompt-injection classifiers on every screenshot. The classifiers exist because prompt injection is a real and active threat for any AI that processes screen content from untrusted sources. Anthropic requires sandboxed execution, domain allowlisting, and human confirmation gates for the same reason OpenClaw’s own checklist does.

The delta between OpenClaw and vendor-managed compute-use platforms is operational, not categorical. NemoClaw applies OS-level sandboxing (Landlock, seccomp, network namespaces) around OpenClaw itself. OpenAI routes tool execution through hosted containers. Amazon Bedrock Agents integrates with VPC endpoints and IAM. These are all responses to the same problem: an AI with broad tool access in an insufficiently isolated environment is dangerous.

Picking a vendor-managed platform doesn’t exit the threat class. It moves the isolation responsibility from the operator to the vendor.

OpenClaw attack surface: from untrusted inputs to tool execution

The deployment gap is real, but it’s not unique to OpenClaw

The strongest version of the “OpenClaw is risky” argument is this: the deployment checklist exists, the audit tooling exists, and most teams don’t use either. That’s true. It’s also true of every piece of software with a security checklist.

OpenClaw’s own documentation is unusually honest about the mismatch between the design (single trusted operator, explicit permissions) and how teams tend to deploy it (shared Slack bot, broad tool groups, no egress controls). The documentation explicitly warns against shared bots in shared workspaces. The security audit command flags exposed control planes and permissive tool policies. The tools are there.

The gap between documentation and deployment is a cultural and organisational problem, not a product problem. Switching to a vendor-managed platform doesn’t solve the culture. It replaces one set of configuration responsibilities with another, at higher cost and with less transparency.

Key security incidents: OpenClaw Nov 2025 to Mar 2026

The supply chain risk is real, but scoped

Trend Micro’s documentation of malicious OpenClaw skills delivering AMOS malware is legitimate evidence of supply chain risk. Snyk’s ToxicSkills analysis from February 2026 found serious issues in a notable fraction of ClawHub marketplace skills. These findings are real.

OpenClaw’s response was also real: VirusTotal integration for ClawHub skill scanning, daily rescans, automatic blocking on detection, and stricter marketplace moderation. No scanning layer is a guarantee. A skill allowlist with version pinning, combined with review before installation, closes most of the gap.

The counterfactual matters here. Chrome extensions, VS Code plugins, npm packages, and Homebrew taps all have supply chain risk. The response to “malicious packages exist in package registries” is not to stop using package managers. It’s to audit what you install.

The honest case for vendor isolation

There is a real argument for preferring NemoClaw, Anthropic computer-use, or OpenAI’s hosted execution for certain deployment contexts. If the team responsible for the deployment doesn’t have the operational capacity to apply and maintain the 10-point checklist, OS-level sandboxing moves that responsibility to a party that does. If the use case involves sensitive credentials or access to financial systems, the blast radius calculation favours a more constrained execution environment regardless of configuration skill.

These are legitimate tradeoffs. They apply to a subset of deployments. For a single developer running OpenClaw on a dedicated machine with loopback-only gateway access, deny-by-default tool policy, and a reviewed skill list, the vendor isolation argument is weaker. The checklist is not that hard to apply.

Vendor isolation spectrum: from self-managed to cloud-managed agent runtimes

What actually needs to change

The security problem with personal AI agents is not the existence of vulnerabilities. Fast-moving OSS projects have vulnerabilities. The problem is the gap between the documented deployment posture and the default behaviour of most teams, who treat an agent with shell access like a chatbot and don’t read the security docs.

OpenClaw ships a formal threat model, an audit command, and a deployment checklist. Anthropic ships classifiers, confirmation gates, and explicit sandbox requirements. The controls exist across the ecosystem. The question is whether teams treat “agent with broad permissions” as a new category of infrastructure that requires the same security attention as a server with SSH access, or whether they treat it as a productivity app.

The advisories, the CVEs, and the malware delivery chains are evidence of what happens when it’s the latter.

Defense in depth: 10 controls for secure OpenClaw deployment

Sources

CVE-2026-25253 / GHSA: token exfiltration via gatewayUrl to gateway takeover and code execution; patched in OpenClaw 2026.1.29
GHSA-rqpp-rjj8-7wv8: scope binding flaw on shared-auth WebSocket connections; fixed in 2026.3.12
GHSA-99qw-6mr3-36qr: workspace plugin auto-discovery code execution; fixed in 2026.3.12
Trend Micro research on malicious OpenClaw skills distributing AMOS / Atomic macOS Stealer
Snyk ToxicSkills marketplace analysis, February 2026
Anthropic computer-use tool documentation, platform.claude.com; computer_20251124 for Opus 4.6 and Sonnet 4.6
Nvidia NemoClaw alpha preview, March 16, 2026
OpenClaw security documentation: Trust, Gateway security, Security audit pages

Teams running OpenClaw without a dedicated VM are not “at risk.” They are already behind. CVE-2026-25253, patched in January 2026, was a 1-click path from a link in a chat message to remote code execution. That’s not a future threat. It’s a patched vulnerability that should have changed every team’s deployment posture the day it was disclosed. Most haven’t changed anything.

OpenClaw now has 288 published GitHub Security Advisories. The patch cadence is fast. The blast radius per incident is high. Combining high-privilege tool access (shell, filesystem, browser, all messaging channels) with a trusted-operator-only security model in a shared-workspace deployment is not a configuration gap. It’s a design mismatch.

One click to root

CVE-2026-25253 is worth stating precisely because it’s easy to dismiss as “already patched.” OpenClaw’s Control UI accepted a gatewayUrl parameter from the query string and automatically connected to it, sending the stored gateway token during the WebSocket handshake. An attacker who receives the token can connect to the victim’s gateway, change tool policies, enable exec permissions, and run arbitrary commands under the victim’s account. The loopback bind is irrelevant because the browser initiates outbound connections. The attack chain: craft a link, send it to someone with an open Control UI tab, receive their token, own their machine.

Patched in 2026.1.29. Not the first high-severity advisory. Not the last.

Skills are an execution surface, not a feature list

Trend Micro documented malicious OpenClaw skills delivering AMOS (Atomic macOS Stealer) malware. The attack chain starts with a normal-looking productivity skill that directs the user to install a prerequisite from an external site. The install instructions use obfuscated Base64 shell to fetch a remote script and drop a Mach-O binary that exfiltrates credentials, documents, and browser data.

The reason this works is that skills run in the agent’s context. SKILL.md instructions are executable attack surface. A skill that says “install this dependency” is no different from a phishing email that says “download this update,” except the AI agent is the trusted intermediary lending credibility to the instruction.

Snyk’s ToxicSkills analysis from February 2026 found a significant fraction of ClawHub marketplace skills with serious security issues. VirusTotal scanning helps. It doesn’t substitute for a skill allowlist and version pinning.

OpenClaw attack surface: from untrusted inputs to tool execution

The shared-bot failure is not a misconfiguration

OpenClaw’s documentation explicitly warns that a shared workspace bot gives every allowed sender access to the same tool authority. That’s not a bug. It’s a documented limitation of the single-trusted-operator model applied to a multi-user environment.

Any allowed sender can trigger indirect prompt injection by submitting a message containing adversarial instructions. If the agent can read files and send messages, a successful injection exfiltrates data without any interaction from the legitimate operator. The attack doesn’t require credentials. It requires a message.

Teams running a company-wide Slack bot with broad tool access have accepted this risk, whether they know it or not. The fix is separate gateways per trust boundary, not a policy tweak on the shared instance.

Anthropic sets the floor on what “minimum security” means

Anthropic’s computer-use tool for Claude Opus 4.6 and Sonnet 4.6 (computer_20251124) provides screenshot capture, mouse and keyboard control, and automated prompt-injection detection on every screenshot. Anthropic’s deployment requirements are explicit: dedicated VM or container, no access to sensitive accounts, domain allowlist for internet access, human confirmation for consequential actions.

These requirements are not optional recommendations for enterprise customers. They’re the minimum viable security posture for running an AI with computer control. Anthropic enforces some of these at the API layer. The rest are deployment requirements.

If those controls are the floor for a narrowly scoped computer-use feature, they’re certainly the floor for an always-on personal agent with shell access and persistent messaging integrations. OpenClaw’s own documentation says the same thing. The controls are identical. The difference is enforcement.

Key security incidents: OpenClaw Nov 2025 to Mar 2026

Isolation is the control that actually works

NemoClaw, Nvidia’s alpha released March 16, 2026, wraps OpenClaw in OS-level sandboxing using Landlock, seccomp, and network namespaces. The policy model is deny-by-default with explicit allowlisting for new egress endpoints. This is the same approach Anthropic uses for hosted computer-use and OpenAI uses for hosted container execution.

The pattern across every serious implementation of agent security is the same: move the execution boundary to a layer the operator controls absolutely, then shrink it to the minimum needed for the task. Checklist items and approval prompts are not the same as sandbox boundaries.

OpenClaw’s 10-point deployment checklist describes this correctly. Dedicated VM per trust boundary, loopback-only gateway bind, deny-by-default tool policy, no unreviewed skills, locked-down filesystem permissions. The checklist is not ambiguous. Most deployments skip most of it.

Vendor isolation spectrum: from self-managed to cloud-managed agent runtimes

What “already behind” means in practice

Three scenarios that are actively happening:

A developer has the Control UI open in a browser tab. They click a link in Slack. Their gateway token is sent to an attacker server. The attacker connects to their local gateway, enables exec permissions, and runs a command that dumps SSH keys and API credentials to an external endpoint. This is CVE-2026-25253.

A team installs a productivity skill from ClawHub without reviewing the instructions. The skill directs the agent to install a helper tool. The install chain downloads and executes a binary. The binary exfiltrates documents and browser session tokens. This is the AMOS delivery chain documented by Trend Micro.

A developer clones a repository from an untrusted source and runs OpenClaw in that directory. The repository contains .openclaw/extensions/ with a malicious plugin. OpenClaw executes it. This is GHSA-99qw-6mr3-36qr, fixed in 2026.3.12.

All three required no exploit sophistication. All three had published advisories with patch references. All three would have been blocked by the deployment checklist OpenClaw ships with the product.

Defense in depth: 10 controls for secure OpenClaw deployment

The controls exist. Apply them.

The project’s audit tooling, openclaw security audit --deep, flags most common misconfigurations: exposed control planes, permissive tool policies, world-readable config directories, unreviewed plugins. Running it takes minutes. Fixing the findings takes an afternoon.

The alternative, switching to a vendor-managed platform, trades configuration responsibility for cloud dependence and pricing risk. Both are reasonable choices. Running OpenClaw with default settings in a shared workspace is not a reasonable choice. It’s a choice to accept all the risk and apply none of the controls.

Sources

CVE-2026-25253 / GHSA: token exfiltration via gatewayUrl leading to gateway takeover and code execution; patched in OpenClaw 2026.1.29
GHSA-rqpp-rjj8-7wv8: scope binding flaw on shared-auth WebSocket connections; fixed in 2026.3.12
GHSA-99qw-6mr3-36qr: workspace plugin auto-discovery code execution; fixed in 2026.3.12
Trend Micro research on malicious OpenClaw skills distributing AMOS / Atomic macOS Stealer
Snyk ToxicSkills marketplace analysis, February 2026
Anthropic computer-use tool documentation, platform.claude.com; computer_20251124 for Opus 4.6 and Sonnet 4.6
Nvidia NemoClaw alpha preview, March 16, 2026
OpenClaw security documentation: Trust, Gateway security, Security audit pages

Personal AI Agents Are the Most Privileged Software You've Ever Run

The design that makes everything else matter

Three incidents already in the record

The shared-bot failure nobody talks about

Anthropic’s computer-use tool as a reference design

The vendor spectrum is about isolation, not intelligence

Ten controls. All of them are in OpenClaw’s own docs.

The case against abstention

Sources

What 288 advisories actually means

The threat class is industry-wide, not OpenClaw-specific

The deployment gap is real, but it’s not unique to OpenClaw

The supply chain risk is real, but scoped

The honest case for vendor isolation

What actually needs to change

Sources

One click to root

Skills are an execution surface, not a feature list

The shared-bot failure is not a misconfiguration

Anthropic sets the floor on what “minimum security” means

Isolation is the control that actually works

What “already behind” means in practice

The controls exist. Apply them.

Sources

Comments

Leave a comment