The Anthropic browser agent was hijacked 31.5% of the time before security measures were activated.

Across the border labs, the highest rapid injection numbers released this spring are from Anthropic. Target a member of the red team at its newest model in a browser and the attacker hijacked it 31.5% of the time before safeguards are applied. OpenAI, Google, and Meta never gave security leaders a comparable number to put on their side. That figure seems like a liability. In this comparison, it is quite the opposite. It is the only solid piece of land.

Four border laboratories each sent information about the immediate injection and there were no two that coincided. anthropic outpost 244 pages and four agent surfaces on the table on May 28. OpenAI reported a surface, connectors. Google took out the model card issue and put it in a separate security framework. Meta sent without card closed model at all. The table below for immediate injection disclosure between providers shows what each lab tested, what each measured, and the four places where a side-by-side comparison falls apart.

A fast injection hides a malicious instruction in something an agent reads, a web page, a document, or the output of a tool. A planted line can leak records or trigger actions that no one approved, and these cards are the buyer’s only first-hand evidence.

There is no industry standard to measure any of this, and that is the root of the problem. Carter Rees, vice president of AI at Reputationtold VentureBeat that rapid injection breaks the assumption behind each legacy tool. "A phrase as innocuous as “ignore previous instructions” can carry a payload as devastating as a buffer overflow, but it has nothing in common with known malware signatures." Without a shared signature to search for, each lab created its own criteria and the results don’t match.

Adam Meyers, senior vice president of adversary operations at Strike crowdHe said the exposure is now the buyer’s responsibility. "As you deploy AI, you increase your attack surface, so now you need to be able to protect those AI models from adversary misuse or data poisoning or rapid injection." CrowdStrike’s own top-line data shows that the threat side is not standing still. in your 2026 Financial Services Threat Landscape Reportpublished in May, the company reported that adversaries use AI to compress the time from initial access to impact faster than legacy defenses can respond.

Anthropo measured four surfaces. The numbers vary by an order of magnitude depending on which one you read.

The Opus 4.8 card does what others don’t: it stops fast surface injection and extension is the story.

Put the model into a coding environment and an adaptive attacker from Gray Swan’s Shade tool succeeded. 7.03% unique attempts thinking about. The safeguards led that to 2.09%.

Moving the same attack class to a browser, the surface behind claude in chrome and Claude Coworkand the ground gives way. Anthropic put the professional red teams 129 web environments suspended from training and printed each result in Table 5.2.2.4.A on page 81 of the system card. Per attempt is the proportion of all injection attempts that were performed in 129 settings with 10 attempts each. Per scenario is the hardest cut, the proportion of environments in which at least one attempt landed.

Read the column per attempt without safeguards, think further and the raw rate drops with each generation, from Sonnet 4.6 at 50.7% to Opus 4.8 at 31.5%. The lowest on the table, 5.9%, belongs to Mythos Preview, which no one can buy yet. If safeguards are activated, Opus 4.8 drops to 0.5%. If you disable thought, it will drop to zero in all 129 environments.

OpenAI measured a surface, with attacks it already knew.

He GPT-5.5 cardpublished on April 23 and updated on April 24, it handles fast injection in one place, a single section on robustness to known attacks against connectors. OpenAI reports this as a robustness score where the higher the better, the opposite of the attack success rate. GPT-5.5 reached 0.963down 0.998 for GPT-5.4 thought. That figure is the whole revelation.

Anthropic tested four surfaces against an adaptive attacker that rewrites its approach based on what the model does, then ran a week-long bug bounty in which red team members attempted to break the model live. When the encoding results were worse than Opus 4.7, the card indicated so.

Put the 0.963 next to 31.5% and they look like they belong on a marker. It’s not like that. One is a robustness score against known attacks on a surface. The other is an attack success rate per attempt across 129 browser environments against an attacker who adapted in real time.

Google and Meta never put the number on the card

from google Gemini 3 The files call for injection under mitigations, and the release materials describe a stronger resistance without any numbers attached. He Frontier Security Framework Report it runs red teams, but in all its capability domains, and rapid injection is not one of them. There is no model card, no frame page, and no number per surface that a buyer can include in a risk review.

Meta sends open weights without a closed model card. The rapid injection defense is in a separate stack, that of Purple Llama. firestop. TO NoticeGuard 2 classifier and an AlignmentCheck auditor, against the public AgentDojo benchmark and its 97 tasks, reduce the success of the attack 17.6% no defense for 1.75% set. Real numbers. They rate security barriers based on a public reference point, not on the model of a deployment surface that a security team would recognize.

Immediate Injection Disclosure Chart Between Providers

The grid below works on whatever border model security teams are weighing. Each row marks a place where the four laboratories are divided. Each division is where a quick comparison breaks down. The Anthropic figures come from the Opus 4.8 system card. Everything related to the other three comes from each vendor’s published security documentation.

Dimension	Anthropo, Opus 4.8	OpenAI, GPT-5.5	Google, Gemini 3.x	Goal, pile of flames
Security document	System card, May 28, 2026, 244 pages.	System card, April 23, 2026, updated April 24	Model card plus a separate Frontier Security Framework report	Without closed model card. Open Dumbbells Plus Purple Llama Stack
Injection benchmark or data set	ART from Gray Swan and UK AISI, Shade tool, plus internal browser evaluation, 129 environments	Evaluation of internal connectors, known attacks.	None for injection	AgentDojo, 97 tasks
Surfaces with injection evaluation.	Four. Use of tools, coding, use of computers, browser.	One. Connectors	None published for injection.	One. AgentDojo agent tasks
Multiple attempt escalation shown	Yes. ART benchmark at 1, 10, 100. Coding and computer use at 1 and 200	No. A single score	No	No
Main Metric and Unit	Attack success rate. Browser, with thought, 31.5% raw, 0.5% protected	Robustness score, the higher the better. 0.963, compared to 0.998 for GPT-5.4 thinking	None published. Higher resistance claimed qualitatively	Attack success rate on AgentDojo. 17.6% reference to 1.75% combined
Live External Reward	Yes. One week live injection reward with external red teams	No injection reward. Only biological reward	None found	None found
Regression revealed	Yes, explicit, with numbers.	The number fell from 0.998 to 0.963, not framed as a regression	Greater resistance is claimed, without figures	Not applicable

Five factors security teams should consider now

Anthropic tested four surfaces and printed each number. OpenAI tested one. Google did not print any surface rates. Meta rated their railings, not the model. The four disclosures do not constitute a comparison. These five steps build one.

Extract all the agents you have deployed or defined and label them based on the surface they touch, browser, code, connectors, or desktop. The Anthropic rate for Work 4.8 runs 2.09% in encoding and 0.5% in the browser. A combined number covers neither. Obtain the supplier’s published rate for your specific acreage. If the vendor never published one, treat it as untested.

Send the cross-vendor grid to each vendor under evaluation. TO Connector score of 0.963 and a navigation rate of 31.5% were never on a scale. Require an attack success rate per surface, raw and protected, with the named attacker methodology. Blank cells are surfaces without first-hand evidence.

Confirm in writing what number your integration gets. Anthropic’s 0.5% comes from Claude in Chrome and Cowork with the full protection stack. In the API, the model is shipped without them. Do not accept a product number for an API implementation.

Add two clauses to the RFP. The vendor tested an adaptive attacker that rewrites payloads into the model, and someone outside the company attempted to break it. Anthropic ran Gray Swan’s adaptive Shade tool and a one-week paid bounty. OpenAI tested known attacks on a surface. Adversaries do not send known payloads.

Run your own injection test before submitting any agent. Provider numbers come from provider environments with indications from the provider system. Your stack has its own prompts, permissions, and data access. Set an approval threshold. Anything above it is not activated.

The final result. There is no standard for this yet. A supplier’s number tells you what they chose to measure. Your own red team tells you what you are exposed to.

Source link

The Anthropic browser agent was hijacked 31.5% of the time before security measures were activated.

Anthropo measured four surfaces. The numbers vary by an order of magnitude depending on which one you read.

OpenAI measured a surface, with attacks it already knew.

Google and Meta never put the number on the card

Immediate Injection Disclosure Chart Between Providers

Five factors security teams should consider now

Leave a ReplyCancel Reply

More airlines expected to follow in Spirit’s footsteps as fuel crisis cuts airline profits in half

Is this the dawn of the Tokenpocalypse?

The UK plans to buy AI chips from British companies to prevent them from leaving for the US.

Anthropo measured four surfaces. The numbers vary by an order of magnitude depending on which one you read.

OpenAI measured a surface, with attacks it already knew.

Google and Meta never put the number on the card

Immediate Injection Disclosure Chart Between Providers

Five factors security teams should consider now

Leave a ReplyCancel Reply

Trending now

More airlines expected to follow in Spirit’s footsteps as fuel crisis cuts airline profits in half

Is this the dawn of the Tokenpocalypse?

The UK plans to buy AI chips from British companies to prevent them from leaving for the US.