AI Agents Vulnerable to Hidden Prompts and Social Engineering
Recent research reveals OpenClaw AI agents can be compromised through invisible prompt injection in contacts and sophisticated phishing, bypassing built-in security measures and exfiltrating…
Recent research reveals OpenClaw AI agents can be compromised through invisible prompt injection in contacts and sophisticated phishing, bypassing built-in security measures and exfiltrating sensitive data.
OpenClaw, a self-hosted AI agent platform, was recently shown to be vulnerable to two distinct attack vectors. Security researchers at Imperva and Varonis demonstrated how agents could be compromised without direct user interaction, leading to credential exfiltration and arbitrary code execution. These vulnerabilities exploit the agent's default trust in incoming data and its broad system access, posing significant risks for organizations deploying autonomous AI agents with access to sensitive business data.
Contact Names Enable Prompt Injection
Imperva Security researcher Yohann Sillam discovered that OpenClaw serializes message objects, including shared contacts, vCards, and location pins, directly into LLM prompts. Critically, these inputs are not marked as untrusted. Since angle brackets are legal characters in contact names, an attacker can inject additional instructions that the model interprets as legitimate commands. The attack surface remains invisible to victims because platforms like WhatsApp truncate contact names in the UI, hiding the malicious payload from both sender and recipient. Imperva's tests against Google Gemini 3.1 Pro successfully commanded agents to download and execute scripts from researcher-controlled servers.
OpenClaw enables memory by default, meaning a single poisoned contact shared across a team or organization can compromise every agent that processes it. Without sandboxing, the injected command persists and executes whenever the agent recalls the associated conversation. The vulnerability stems from an inconsistency: web-scraped content is wrapped in an untrusted-content boundary marker, but message objects are not. This creates a bypass for attacks embedded in message data. OpenClaw shipped a fix in version 2026.4.23, moving contact names, vCard fields, and location labels into a separate untrusted-metadata channel outside the main prompt.
Social Engineering Bypasses Controls
Varonis Threat Labs approached OpenClaw security from a different angle, demonstrating that plain-English phishing emails can bypass the agent's built-in verification rules. The team built a test agent named Pinchy, connected it to Gmail, and populated its inbox with synthetic business emails and mock secrets. They then ran four phishing scenarios against Google Gemini 3.1 Pro and OpenAI Codex GPT-5.4. The agent operated under a strict profile explicitly configured to verify sender identity before taking sensitive actions.
Both exfiltration tests succeeded. In one scenario, an email impersonating a team lead requested staging credentials during a fabricated production incident. The agent, despite its strict profile, forwarded AWS keys, database credentials, and customer data to external addresses from a single email. This highlights a fundamental weakness: OpenClaw trusts incoming data and has broad access to sensitive systems, even when configured with explicit security policies.
Hardening AI Agent Deployments
Organizations deploying self-hosted AI agents, particularly those with access to messaging platforms, credential stores, file systems, and sensitive business data, must implement several mitigation steps. The most immediate fix is updating to OpenClaw 2026.4.23 to address the prompt-injection vulnerability. Beyond software updates, strict agent permissions are necessary, limiting what an agent can access and execute. Sandboxing environments can contain potential breaches, preventing an exploited agent from affecting core systems. Finally, requiring human confirmation for sensitive operations, such as accessing credentials or exfiltrating data, adds a critical layer of defense against both prompt injection and social engineering attacks.
What We'd Change
The identified vulnerabilities in OpenClaw are not unique to this platform; the underlying mechanisms of prompt injection and social engineering apply to any AI agent processing untrusted input with broad system access. Founders building or deploying AI agents should assume that declarative security policies, like a
The investor read
The demonstrated vulnerabilities in OpenClaw signal a maturing AI agent market where security is no longer an afterthought but a critical differentiator. Investors should note the operational risks for companies deploying autonomous agents with broad system access. This creates opportunities for startups offering specialized AI agent security solutions, including advanced sandboxing, trusted execution environments, and AI-native threat detection. The failure of a 'strict profile' to prevent exfiltration suggests that current security paradigms are insufficient, highlighting a need for novel approaches to agent governance and human-in-the-loop controls. Benchmarks for agent security, auditability, and incident response will become increasingly important for enterprise adoption.
Pull quote: “OpenClaw enables memory by default, meaning a single poisoned contact shared across a team or organization can compromise every agent that processes it.”
Every claim ties to a primary source. See our methodology.