OpenClaw Took Two Attacks in a Week, and the Harder One Has No Patch

PREVIEWOpenClaw Took Two Attacks in a Week, and the Harder One Has No Patch · MD

OpenClaw took two attacks in a single week, and the harder of the two has no software patch. Two independent research teams, working from different starting points, both found ways to turn the agent's own capabilities into an attacker's primitive.

The pair of findings is the story. The more durable one is a mailbox test an outside team could almost certainly run again tomorrow, and the patch addresses the smaller half of the lesson.

The first disclosure came from security vendor Imperva, where researcher Yohann Sillam hid instructions inside shared contacts, vCards, and location pins, then watched OpenClaw execute them while the victim never saw the payload (The Hacker News). The trick fits on a phone screen. A name field in a contact card carries a few hundred characters of text. OpenClaw flattens that field into the language model's context as if it were the user's own instruction. The on-screen display truncates the field, so a person looking at the contact sees a name and a phone number, not a hidden command. The model sees the command. The model also has tools.

That concrete shape is why a fix exists. OpenClaw 2026.4.23 moves contact names, vCard fields, and location labels out of the prompt body and into a separate untrusted-metadata channel — treating them the same way it already treated fetched web content (The Hacker News). Imperva found the same flattening pattern in other personal AI assistants besides OpenClaw, suggesting the underlying problem is architectural rather than vendor-specific. Operators who self-host can apply that update today.

The second disclosure is the more lasting problem. Varonis Threat Labs, led by Itay Yashar, built a test agent named Pinchy on OpenClaw and gave it access to a Gmail inbox and an outbound email tool. They then ran four phishing simulations against Pinchy on OpenClaw, Google Gemini 3.1 Pro, and OpenAI Codex GPT-5.4 — comparing how each model-agent combination handled social pretexts. Both exfiltration tests targeted Pinchy on OpenClaw specifically. In the first, a message posing as a team lead named Dan, sent from an outside Gmail address, asked for staging access during a fake production incident. Pinchy found the credentials and forwarded mock AWS IAM access keys, database connection strings, and SSH credentials in plaintext. A second pretext — a routine request for the weekly customer export for a QBR deck — shipped a synthetic dataset of 247 enterprise customers, contacts, and contract values. Both failures happened under a strict profile that told the agent to verify senders first. The rule existed. Urgency beat it once; routine beat it the second time.

On the social pretext tests run across all three platforms, Varonis found that both Gemini 3.1 Pro and Codex GPT-5.4 also fell for the same email-based attacks — suggesting the susceptibility to urgency-and-routine-triggered credential forwarding is not unique to OpenClaw. On technical threats like gift-card phishing pages and suspicious OAuth consent screens, all three agents performed better, inspecting redirect targets and withholding credentials.

Varonis characterizes the OpenClaw finding as a design issue rather than a bug (The Hacker News): the test agent had access to a mailbox and an outbound email tool, and the attacker wrote a sentence. That combination is the vulnerability. There is no patch because there is nothing to patch in any one place. The fix is to scope what the agent is allowed to do on its own — what it can email, what it can fetch, what it can write to disk, what it can call as a script — and that scoping is a configuration and design decision the operator owns.

Both findings collapse to the same structural weakness. OpenClaw treats data that reaches it as trusted context: a contact card from a stranger, a calendar entry from an external sender, a ticket from an unknown customer, an email in the inbox. Each is flattened into the model prompt and read by the same reasoning engine that reads the user's instructions. Once that is true, every capability the agent has becomes a capability the attacker can borrow, because the attacker only needs the model to decide to use it.

That framing matters beyond OpenClaw. Any agent that inlines structured objects into LLM context without marking them as untrusted inherits the same exposure. The reason the message-object path worked where image-based injection failed is itself diagnostic. Frontier models have been trained on a long public history of image-based prompt injection, and the major providers have hardened against that surface. Models have seen far fewer examples of injection through contacts, vCards, pins, calendar entries, or support tickets, so the defensive training is thinner. The training gap, not the model, is the lesson for builders working on other agent platforms.

For self-hosted OpenClaw operators the action is concrete. Apply 2026.4.23 to close the Imperva path. Then, separately, audit what the agent is allowed to do autonomously: whether it can send outbound mail to addresses outside a known allowlist, fetch URLs from internal hosts versus the public internet, execute scripts, or export files. The Varonis class of attack is blunted by what the agent cannot do without a human in the loop, and that is a setting the operator controls, not a line in a release note.

The week produced two findings, one fix, and a sharper question. When an agent can read an email and send an email in the same session, who is the user, and who is the stranger in the inbox? In OpenClaw's case the agent does not know, and the configuration decides.

OpenClaw Took Two Attacks in a Week, and the Harder One Has No Patch — type0 | type0

OpenClaw Took Two Attacks in a Week, and the Harder One Has No Patch

Sources