The AI ​​security breach that no one wants to admit is here



On March 31, 2026, Anthropic accidentally pushed all of Claude Code’s source code to the public npm registry. About 512,000 lines of TypeScript in 1,906 fileswhich included 44 hidden feature flags and references to an unreleased model codenamed Mythos, was openly accessible in a Cloudflare storage repository until a security researcher found it and posted the link to X. Within hours, the code base had been reflected on GitHubracking up thousands of stars before Anthropic could issue DMCA takedowns. Anthropic called it a packaging error caused by human error. That explanation is accurate and also somewhat off base.

By exposing Claude Code’s blueprints, Anthropic provided a roadmap to anyone who wanted to design malicious repositories specifically designed to trick Claude Code into running commands in the background or extracting data before a user sees a trusted message. The permissions enforcement logic, the sandbox architecture, the exact orchestration mechanisms that govern how the agent validates what it’s allowed to do: all of this now sits permanently in the wild in tens of thousands of forked repositories that no DMCA notice will ever reach in its entirety. What the leak exposed about the state of AI security is more uncomfortable than the leak itself.

One side moves faster

The conventional framework around AI in cybersecurity treats it as a rough balance, an arms race where attack and defense accelerate together. That framework doesn’t hold up well with the details of what really happened in March, or with what security teams describe working on a day-to-day basis.

The exposed hook and permissions logic from the Claude Code leak make silent device takeover more reliable for attackers who know where to look. Meanwhile, defenders are integrating AI into existing security stacks and validating that it will not generate false positives before it is operationally useful. Those two schedules are not comparable.

Tim Burke, who has led managed security operations for more than 30 years at Search technology managementmakes the asymmetry clear. “The attackers obtained the complete blueprint of how an AI agent validates permissions and handles credentials without having to reverse engineer any of it.“, says.”That means attackers are operating with AI that moves faster than most detection systems were designed to handle, while security teams are still figuring out how to deploy AI tools without creating more work for themselves. SOC already overwhelmed.

Google threat intelligence group identified the first confirmed zero-day exploit developed entirely with AI assistance earlier this month and stopped a planned mass exploitation event before it could be executed, representing the optimistic version of this story. Most organizations defending against those same capabilities are not Google and their detection infrastructure was not designed for what is now possible.

Most organizations still run a detection infrastructure designed to detect human attackers methodically moving through networks over days or weeks.“says Burke.”AI has compressed those timelines to hours and, in some cases, minutes, meaning the window between intrusion and damage is now shorter than the time it takes most SOCs to investigate a single alert.

The alert that does not exist

Behind the speed problem there is something more structural. Security platforms are designed to detect behavioral anomalies, things that look like malicious activity based on what is happening and not what is driving it. What they cannot say is whether an attack was initiated by a human or by an AI agent operating autonomously. Currently, no platform highlights that distinction.

The vulnerability discovered in Claude Code after the leak illustrated this directly: a malicious file can instruct the AI ​​to generate a command pipeline that looks exactly like a legitimate build process, triggering behavior that completely bypasses the permissions system without raising a flag that would appear in a conventional SIEM.

AI agents can be manipulated through tool descriptions and prompts in ways that bypass traditional access controls without ever causing an authentication failure or raising an alert in your SIEM.“says Burke.”That means detection needs to start tracking what the agent understood they were doing and why they made that decision, rather than flagging policy violations after the fact.

The references to Claude Mythos in the leaked files add a layer to this that hasn’t received much attention. What was exposed was not only the current tool, but also the architectural direction of where agent AI is headed, including Improved reasoning capabilities and deeper integration using native tools. Security teams are building defenses against what these systems can do today. The leaked roadmap describes something considerably more capable.

At this moment the vast majority of platforms cannot make that distinction between AI and human origin,“Burke says:”and security teams are essentially defending blindly against an entire category of threats that they have no visibility into.

The Anthropic leak was a misconfigured debug file. Organizations now trying to determine whether their security infrastructure can detect what an AI agent believed it was authorized to do are working on an issue that existed before March 31 and will exist long after DMCA notices are processed.

There is still no clean end to that problem.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *