
Across the frontier labs, the highest prompt injection numbers published this spring come from Anthropic. Point a red teamer at its newest model in a browser and the attacker hijacked it 31.5% of the time before safeguards were applied. OpenAI, Google, and Meta never gave security leaders a comparable number to set beside it. That figure looks like a liability. In this comparison, it is the opposite. It is the only solid ground.
Four frontier labs each shipped prompt injection disclosures, and no two of them match. Anthropic put 244 pages and four agent surfaces on the table on May 28. OpenAI reported one surface, connectors. Google moved the issue out of the model card and into a separate safety framework. Meta shipped without a closed-model card at all. The cross-vendor prompt injection disclosure grid below shows what each lab tested, what each measured, and the four places where a side-by-side comparison falls apart.
A prompt injection hides a malicious instruction in something an agent reads: a web page, a document, or the output of a tool. One planted line can leak records or trigger actions that no one approved, and these cards are the buyer’s only first-hand evidence.
There is no industry standard for measuring any of this, and that is the root of the problem. Carter Rees, vice president of AI at Reputation, told VentureBeat that prompt injection breaks the assumption behind every legacy tool. "A phrase as innocuous as “ignore previous instructions” can carry a payload as devastating as a buffer overflow, but it has nothing in common with known malware signatures." Without a shared signature to search for, each lab built its own yardstick, and the results don’t match.
Adam Meyers, senior vice president of adversary operations at CrowdStrike, said the exposure is now the buyer’s responsibility. "As you deploy AI, you increase your attack surface, so now you need to be able to protect those AI models from adversary misuse or data poisoning or prompt injection." CrowdStrike’s own data shows that the threat side is not standing still. In its 2026 Financial Services Threat Landscape Report, published in May, the company reported that adversaries use AI to compress the time from initial access to impact faster than legacy defenses can respond.
Anthropic measured four surfaces. The numbers vary by an order of magnitude depending on which one you read.
The Opus 4.8 card does what the others don’t: it breaks prompt injection out by surface, and the spread is the story.
Put the model in a coding environment and an adaptive attacker built on Gray Swan’s Shade tool succeeded on 7.03% of single attempts with thinking on. Safeguards brought that down to 2.09%.
Move the same attack class to a browser, the surface behind Claude in Chrome and Claude Cowork, and the ground gives way. Anthropic set professional red teamers against 129 web environments held out from training and printed each result in Table 5.2.2.4.A on page 81 of the system card. Per attempt is the proportion of all injection attempts that succeeded across the 129 environments, with 10 attempts each. Per scenario is the harsher cut: the proportion of environments in which at least one attempt landed.
Read the per-attempt column without safeguards, with extended thinking, and the raw rate drops with each generation, from Sonnet 4.6 at 50.7% to Opus 4.8 at 31.5%. The lowest on the table, 5.9%, belongs to Mythos Preview, which no one can buy yet. With safeguards on, Opus 4.8 drops to 0.5%. With thinking disabled, it drops to zero across all 129 environments.
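The two cuts described above, per attempt and per scenario, can diverge sharply on the same raw results. The sketch below is a minimal illustration on made-up toy data, not Anthropic's evaluation code; the function names and the four toy environments are assumptions introduced only for this example.

```python
from typing import Dict, List


def per_attempt_rate(results: Dict[str, List[bool]]) -> float:
    """Share of all individual injection attempts that succeeded."""
    attempts = [hit for env in results.values() for hit in env]
    return sum(attempts) / len(attempts)


def per_scenario_rate(results: Dict[str, List[bool]]) -> float:
    """Share of environments in which at least one attempt succeeded."""
    return sum(any(env) for env in results.values()) / len(results)


# Made-up toy data: 4 environments, 10 attempts each (True = injection landed).
toy = {
    "env_a": [True] + [False] * 9,
    "env_b": [False] * 10,
    "env_c": [True, True] + [False] * 8,
    "env_d": [False] * 10,
}

print(f"per attempt:  {per_attempt_rate(toy):.1%}")   # 3 of 40 attempts
print(f"per scenario: {per_scenario_rate(toy):.1%}")  # 2 of 4 environments
```

On this toy data, three hits out of 40 attempts give 7.5% per attempt, while two compromised environments out of four give 50% per scenario, which is why the per-scenario column reads as the harsher one.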
OpenAI measured one surface, against attacks it already knew
The GPT-5.5 card, published on April 23 and updated on April 24, handles prompt injection in one place, a single section on robustness to known attacks against connectors. OpenAI reports this as a robustness score where higher is better, the inverse of an attack success rate. GPT-5.5 scored 0.963, down from 0.998 for GPT-5.4 Thinking. That figure is the whole disclosure.
Anthropic tested four surfaces against an adaptive attacker that rewrites its approach based on what the model does, then ran a week-long bug bounty in which red teamers tried to break the model live. When the coding results came in worse than Opus 4.7, the card said so.
Put the 0.963 next to the 31.5% and they look like they belong on one scoreboard. They don’t. One is a robustness score against known attacks on one surface. The other is a per-attempt attack success rate across 129 browser environments against an attacker who adapted in real time.
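One way to keep mismatched figures off the same scoreboard is to record what each number measures before anyone compares values. The sketch below is an assumption-laden illustration: the `Disclosure` record, its field names, and the `comparable` rule are hypothetical, and only the two values come from the cards discussed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disclosure:
    """Hypothetical record of one published injection figure."""
    vendor: str
    surface: str   # e.g. "browser", "connectors"
    metric: str    # "attack_success_rate" or "robustness_score"
    attacker: str  # "adaptive" or "known_attacks"
    value: float


def comparable(a: Disclosure, b: Disclosure) -> bool:
    """Two figures share a scale only if surface, metric, and attacker match."""
    return (a.surface, a.metric, a.attacker) == (b.surface, b.metric, b.attacker)


anthropic = Disclosure("Anthropic", "browser", "attack_success_rate", "adaptive", 0.315)
openai = Disclosure("OpenAI", "connectors", "robustness_score", "known_attacks", 0.963)

print(comparable(anthropic, openai))  # False: different surface, metric, and attacker
```

The check is deliberately strict. A looser rule might accept matching surfaces alone, but that would still mix a higher-is-better score with a lower-is-better rate.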
Google and Meta never put the number on the card
Google’s Gemini 3 documentation lists prompt injection under mitigations, and the release materials describe stronger resistance without any numbers attached. The Frontier Safety Framework report does run red teams, but across its own capability domains, and prompt injection is not one of them. There is no model card entry, no framework page, and no per-surface number that a buyer can include in a risk review.
Meta ships open weights without a closed-model card. Its prompt injection defense lives in a separate stack, Purple Llama’s LlamaFirewall. A PromptGuard 2 classifier and an AlignmentCheck auditor, run against the public AgentDojo benchmark and its 97 tasks, cut attack success from 17.6% with no defense to 1.75% combined. Real numbers. But they rate guardrails against a public benchmark, not the model on a deployment surface that a security team would recognize.
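The raw-versus-protected pairs quoted so far can be turned into relative reductions with simple arithmetic. The sketch below uses only figures cited in this article; the helper name and labels are assumptions. The three pairs come from different benchmarks and attackers, so the percentages describe how much each safeguard stack moved its own number, not how the vendors rank against each other.

```python
def relative_reduction(raw: float, protected: float) -> float:
    """Fraction of the raw attack success rate removed by safeguards."""
    return (raw - protected) / raw


# (raw %, protected %) pairs as quoted in the article.
pairs = {
    "Opus 4.8, coding (Shade)": (7.03, 2.09),
    "Opus 4.8, browser": (31.5, 0.5),
    "Meta guardrail stack, AgentDojo": (17.6, 1.75),
}

for label, (raw, protected) in pairs.items():
    print(f"{label}: {relative_reduction(raw, protected):.1%} reduction")
```

This works out to roughly 70.3% for the coding surface, 98.4% for the browser, and 90.1% for the AgentDojo result.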
Cross-vendor prompt injection disclosure grid
The grid below works for whichever frontier model security teams are weighing. Each row marks a place where the four labs diverge. Each divergence is where a quick comparison breaks down. The Anthropic figures come from the Opus 4.8 system card. Everything on the other three comes from each vendor’s published safety documentation.
| Dimension | Anthropic, Opus 4.8 | OpenAI, GPT-5.5 | Google, Gemini 3.x | Meta, Llama stack |
| --- | --- | --- | --- | --- |
| Security document | System card, May 28, 2026, 244 pages | System card, April 23, 2026, updated April 24 | Model card plus a separate Frontier Safety Framework report | No closed-model card. Open weights plus Purple Llama stack |
| Injection benchmark or dataset | ART from Gray Swan and UK AISI, Shade tool, plus internal browser evaluation, 129 environments | Internal connectors evaluation, known attacks | None for injection | AgentDojo, 97 tasks |
| Surfaces with injection evaluation | Four. Tool use, coding, computer use, browser | One. Connectors | None published for injection | One. AgentDojo agent tasks |
| Multi-attempt scaling shown | Yes. ART benchmark at 1, 10, 100. Coding and computer use at 1 and 200 | No. A single score | No | No |
| Primary metric and unit | Attack success rate. Browser, with thinking, 31.5% raw, 0.5% protected | Robustness score, higher is better. 0.963, versus 0.998 for GPT-5.4 Thinking | None published. Stronger resistance claimed qualitatively | Attack success rate on AgentDojo. 17.6% baseline to 1.75% combined |
| Live external bounty | Yes. One-week live injection bounty with external red teamers | No injection bounty. Bio bounty only | None found | None found |
| Regression disclosed | Yes, explicit, with numbers | Score fell from 0.998 to 0.963, not framed as a regression | Stronger resistance claimed, without figures | Not applicable |

Five steps security teams should take now
Anthropic tested four surfaces and printed every number. OpenAI tested one. Google printed no surface-level rates. Meta rated its guardrails, not the model. The four disclosures do not add up to a comparison. These five steps build one.
Pull every agent you have deployed or scoped and label each by the surface it touches: browser, code, connectors, or desktop. Anthropic’s protected rate for Opus 4.8 runs 2.09% in coding and 0.5% in the browser. A blended number covers neither. Get the vendor’s published rate for your specific surface. If the vendor never published one, treat it as untested.
Send the cross-vendor grid to each vendor under evaluation. A connector score of 0.963 and a browser rate of 31.5% were never on the same scale. Require a per-surface attack success rate, raw and protected, with the attacker methodology named. Blank cells are surfaces with no first-hand evidence.
Confirm in writing which number your integration gets. Anthropic’s 0.5% comes from Claude in Chrome and Cowork with the full safeguard stack. On the API, the model ships without those safeguards. Do not accept a product number for an API deployment.
Add two clauses to the RFP: that the vendor tested with an adaptive attacker that rewrites payloads against the model, and that someone outside the company tried to break it. Anthropic ran Gray Swan’s adaptive Shade tool and a one-week paid bounty. OpenAI tested known attacks on one surface. Adversaries do not send known payloads.
Run your own injection test before shipping any agent. Vendor numbers come from vendor environments with vendor system prompts. Your stack has its own prompts, permissions, and data access. Set an approval threshold. Anything above it does not ship.
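The first and last of these steps, labeling agents by surface and gating on your own measured rate, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the agent names, the 200-attempt run, and the 0.5% threshold are hypothetical, and the lookup table holds only the protected Opus 4.8 rates quoted in this article.

```python
from typing import Dict, List, Optional, Tuple

# Protected rates quoted in this article for Opus 4.8 (product surfaces with
# safeguards on). Any (model, surface) pair missing here counts as untested.
PUBLISHED_RATES: Dict[Tuple[str, str], float] = {
    ("Opus 4.8", "coding"): 0.0209,
    ("Opus 4.8", "browser"): 0.005,
}


def published_rate(model: str, surface: str) -> Optional[float]:
    """Return the vendor's published rate, or None when nothing was published."""
    return PUBLISHED_RATES.get((model, surface))


def measured_rate(outcomes: List[bool]) -> float:
    """Per-attempt success rate from your own injection test (True = hijacked)."""
    if not outcomes:
        raise ValueError("no attempts recorded")
    return sum(outcomes) / len(outcomes)


def approved(outcomes: List[bool], threshold: float) -> bool:
    """An agent ships only if its measured rate does not exceed the threshold."""
    return measured_rate(outcomes) <= threshold


# Hypothetical inventory: agent names are invented for illustration.
inventory = [
    {"agent": "research-browser", "model": "Opus 4.8", "surface": "browser"},
    {"agent": "crm-sync", "model": "Opus 4.8", "surface": "connectors"},
]
for item in inventory:
    rate = published_rate(item["model"], item["surface"])
    print(item["agent"], "untested" if rate is None else f"{rate:.2%}")

# Hypothetical test run: 2 successful injections in 200 attempts (1.0%).
own_run = [True] * 2 + [False] * 198
print(approved(own_run, threshold=0.005))  # False: 1.0% is above the 0.5% threshold
```

The connectors agent comes back as untested because no rate for that surface appears in the lookup, which mirrors the rule of treating an unpublished surface as having no evidence.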
The bottom line: there is no standard for this yet. A vendor’s number tells you what it chose to measure. Your own red team tells you what you are exposed to.





