I just finished reading through Anthropic's explanation of what they are calling the "first reported AI-orchestrated cyber espionage campaign". These words all sound big and scary, so I'm hoping to cut through some of the clickbait headlines for anyone who may be interested. I've written a two-part blog on what happened and why it matters. This is Part I, explaining what Anthropic tells us actually happened and what to do about it. Click here for Part II, on why we should care about how we communicate these risks.
Given the titles of the blog post and extended report ("Disrupting the first reported AI-orchestrated cyber espionage campaign"), I imagine many people might read headlines and think "China is using AI to create and deploy totally autonomous zero-day attacks". Fortunately for us, this is not the case.
Broadly, what happened here is that a Chinese threat group, GTG-1002, used Claude to orchestrate Model Context Protocol (MCP) servers to target "roughly 30 entities", with only "a handful of successful intrusions." There are a few important distinctions and definitions in this sentence that I'll highlight here:
The actual meat of this report should be extremely interesting for governance actors and cyber defense teams, but it probably lacks enough specific information to turn into meaningful technical adjustments on the ground. For instance, were there any specific markers that cyber defenders could use to detect this kind of behavior? How long did it take defense teams to detect this behavior once the attackers had a foothold? Details like these would go a long way toward making a report like this more useful for the community.

What concerns me more, though, is how this report (and its accompanying blog post) frame the event, and how this framing will be consumed downstream in AI safety reporting.