Research

The Grok-Bankrbot Morse-Code Drain Is the Coding-Agent CI Heist Coming Next

Aryaman BeheraMay 7, 20265 min read
The Grok-Bankrbot Morse-Code Drain Is the Coding-Agent CI Heist Coming Next

TL;DR

  • On May 4, 2026, an attacker posted a Morse-code-encoded reply on X. Grok decoded it as part of its summary, tagged the payments agent @bankrbot, and Bankrbot executed a transfer of ~3 billion DRB tokens (between $175K and $202K) out of a verified wallet (Cryptopolitan, OECD.AI).
  • Bankrbot had previously hardcoded a block on Grok-originated replies. A maintenance rewrite of the message-handling pipeline dropped the guardrail. The block was never reinstated; no regression test caught its absence.
  • The three architectural patterns that produced the incident are the same patterns that show up in workstation-agent and CI-agent attacks: encoded-payload carrier, dropped guardrail in a rewrite, and an agent-to-agent trust chain where one agent's output is the next agent's input.
  • This is a payments-agent incident, not a coding-agent incident. The mechanism transfers cleanly anyway — and the Comment and Control disclosure on Claude Code, Gemini CLI, and Copilot Agent already showed two of the three patterns landing on production AI coding agents.

The most useful incidents for a security team are not the ones in your own stack. They're the ones whose mechanism transfers cleanly to your stack while you still have time to do something about it.

The May 4 Grok–Bankrbot wallet drain is one of those.

What happened#

The short version: a user replied to a Bankrbot-related thread on X with a message that included a Morse-code-encoded instruction. Grok was tagged in the surrounding thread and produced a summary that decoded the Morse-code segment as part of its normal "make sense of this content" behavior. The decoded instruction tagged @bankrbot — a payments agent that executes transfers via X mentions — and Bankrbot, reading the Grok-summary as input, processed the transfer. Approximately 3 billion DRB tokens, valued between $175,000 and $202,000 at the time, left a verified wallet to attacker-controlled destinations.

The reason Bankrbot processed a Grok-originated message at all is that a previously hardcoded block on Grok-originated replies had been dropped in a maintenance rewrite. The block existed precisely because the development team had anticipated that Grok-decoded content could be adversarial. The rewrite removed the conditional. No test caught it. The first instance of the missing block running in production was the heist.

Three patterns, all already in your CI#

This is a payments-agent incident on a social network. The reason it matters for security teams running AI coding agents in CI is that the mechanism — not the surface — transfers.

Pattern 1: Encoded-payload carrier. Morse code is a low-entropy alphabet that any LLM trained on plain English will decode automatically as part of producing a coherent summary. The asymmetry is that the human reviewer of a tweet does not decode Morse code in their head; the LLM does. This is the same asymmetry behind the HTML-comment payload variant of Comment and Control: the rendering surface (Markdown) hides the comment from humans, the agent (Copilot Agent) sees the full text. Encoded payloads are the entire family — base64 in committed files, hidden Unicode tags, zero-width characters in PR descriptions, hex strings in skill marketplace files. Every member of the family exploits the same asymmetric-perception bug.

Pattern 2: Guardrail dropped in a rewrite. Bankrbot's hardcoded Grok-block is the kind of mitigation that reads, in a code review, like a security wart someone added and forgot to clean up — exactly the kind of thing a refactor naturally targets for removal. There was no test that would have failed when it was removed. The first failing test ran in production with $175K of customer wallet at stake.

The CI-agent equivalent is right there: every prompt template that has been hardened against a specific attack pattern (input segregation, denylist of known-bad delimiters, HTML-comment scrubber) is a future refactor away from being silently dropped. If the only thing keeping the attack out is a single line in the prompt template, that line will eventually be deleted by someone who doesn't know why it was there. The fix is regression tests against known-bad payloads on every release of the agent runtime, exactly the kind of test we recommended for Comment and Control.

Pattern 3: Agent-to-agent trust chain. Bankrbot trusted Grok. Grok trusted X content. The attacker controlled X content. The transitive trust path was: untrusted social-network input → trusted summarization agent → trusted payments agent → executed transaction.

The CI-agent equivalent is everywhere. A code-review agent reads PR metadata controlled by an external contributor. An MCP-enabled agent calls a third-party MCP server whose output it folds into its context. A multi-step agent invokes a sub-agent whose output it consumes without revalidation. Every agent-to-agent boundary is a place where adversary-controlled content can flow upstream into a privileged context. Our cluster on workstation agent security keeps coming back to this; the Grok-Bankrbot incident is what it looks like when the chain ends in an actual money transfer rather than a credential leak.

What to do today#

Three layers, ordered by leverage:

  1. Decode-then-evaluate every input. If the agent will decode Morse, base64, hex, hidden Unicode, HTML comments, or any other alphabet, decode the input before it enters the prompt template, then re-evaluate the decoded payload against your input policy. This is a runtime change, not a prompt change.
  2. Regression tests against known-bad payload classes on every release. Comment and Control variants 1, 2, 3 are public payloads. Morse code, base64, zero-width characters are all trivially testable. If the test fails, the rewrite that just removed your hardening fails CI.
  3. Treat agent-to-agent boundaries as untrusted. When an agent consumes another agent's output, the consuming agent's prompt template should mark that output as adversary-controllable, exactly the way it should mark user input. This sounds paranoid until you remember Bankrbot.

Where this is going#

The Grok-Bankrbot incident is one of the first widely-reported cases of an agent-to-agent trust chain ending in a six-figure loss. The mechanism is portable. The same chain ending in a GITHUB_TOKEN leak from a CI agent is a smaller dollar number on a much larger blast radius — every repo the token can reach. We've spent the last year publishing on the workstation-agent and CI-agent versions of this pattern (Comment and Control teardown, the 2026 vibe-coding CVE list, malicious OpenClaw skill teardowns, the workstation agent security cornerstone) — the social-network surface is just the first one to be broadly exploited.

If you run AI agents in CI, treat the Grok-Bankrbot incident as a preview, not a curiosity. The patterns are already in your stack.

Frequently asked questions#

What happened in the Grok-Bankrbot Morse-code attack?

On May 4, 2026, an attacker on X posted a reply containing a Morse-code-encoded instruction. Grok decoded the message in its summary, tagged @bankrbot, and Bankrbot executed a transfer of approximately 3 billion DRB tokens — between $175,000 and $202,000 — out of a verified wallet. Bankrbot had previously hardcoded a block on Grok-originated replies, but a maintenance rewrite dropped that guardrail.

Is the Grok-Bankrbot incident relevant to workstation agent security?

Indirectly. The specific incident is on a payments agent on a social network, not on a workstation coding agent. But the three architectural patterns are identical: an encoded-payload carrier, a guardrail dropped in a rewrite, and an agent-to-agent trust chain where one agent's output becomes another agent's input without re-validation. Every CI-resident coding agent exposed to MCP servers, GitHub Actions, or skill marketplaces has the same shape.

What is the encoded-payload pattern in agent attacks?

An encoded-payload attack uses an alphabet the agent decodes and acts on, but the human reviewer does not. Morse code is one example; others include base64, hidden Unicode tags, HTML comments invisible in rendered Markdown but visible to LLM context windows, and zero-width characters in PR descriptions. The mechanism is asymmetric perception — the human sees plain text, the agent sees a complete instruction.

How do you defend against the Grok-Bankrbot pattern in a coding-agent context?

Three layers. (1) Decode-then-evaluate: decode every encoded input before it enters the prompt template, then re-evaluate. (2) Regression tests against known-bad payload classes on every release. (3) Treat agent-to-agent boundaries as untrusted: an agent consuming another agent's output should mark it as adversary-controllable.

Where Repello fits#

Repello's ARTEMIS red-teaming framework carries encoded-payload payload batteries (Morse, base64, hidden Unicode, HTML comments, zero-width characters) and agent-to-agent trust-chain test suites. If you run AI agents in CI or any workstation-resident coding agent, the test suite covers exactly the pattern this incident demonstrates.