Anthropic’s Claude Opus 4.8 is four times more honest, Mythos follows



TL;DR

Anthropic has released Claude Opus 4.8, an update to its flagship AI model that is four times less likely to let code flaws go unnoticed. The company also teased Mythos-class models, which have already found more than 10,000 critical software vulnerabilities through Project Glasswing, and announced a $65 billion Series H round at a $965 billion post-money valuation.

Anthropic has launched Close Job 4.8an update to its flagship AI model that the company says is more honest, more reliable in agency tasks and better at detecting its own mistakes. The model is available immediately at the same price as its predecessor, $5 per million input tokens and $25 per million output tokens, and is available Implementation in all Anthropic products. including claude.ai, Claude Code and the API.

The main improvement is honesty. Anthropic says that Opus 4.8 is about four times less likely than Opus 4.7 to let flaws in the code you’ve written go unnoticed. Early testers report that the model is more willing to point out uncertainties about its work and less likely to make unsubstantiated claims, a persistent problem in all AI models that tend to project confidence regardless of whether it is justified or not.

Benchmark gains across the board

Opus 4.8 improves on its predecessor in all benchmarks published by Anthropic. In agent coding (Terminal-Bench 2.1), the score increases from 64.3% to 69.2%. Multidisciplinary reasoning with tools improves from 54.7% to 57.9%. Agent use of the computer increases from 82.8% to 83.4%, and knowledge work scores increase from 1,753 to 1,890.

Anthropic’s alignment assessment found that Opus 4.8 reaches new highs on measures of prosocial traits, including supporting user autonomy and acting in the user’s best interest. Rates of misaligned behavior, such as deception or misuse cooperation, are substantially lower than in Opus 4.7 and comparable to Claude Mythos Preview, Anthropic’s best-aligned model.

Early testers see practical benefits

The launch is accompanied by the support of companies that already use the model. Cognition, the company behind AI coding agent Devinsaid that Opus 4.8 uses tools cleanly and fixes issues with comment verbosity and tool calls that appeared in Opus 4.7. Cursor, the AI-powered code editor, reported improvements across all levels of effort in its CursorBench evaluation.

Harvey, which builds artificial intelligence for legal work, said Opus 4.8 delivers the highest score on record in its Legal Agent Benchmark and is the first model to break 10% overall on the overall pass standard. Databricks reported that Opus 4.8 handles deeper, multi-step questions faster in its Genie AI agent, costing a nominal 61% cheaper than Opus 4.7.

Thomson Reuters said CoCounsel Legal saw significant improvements in consistency and quality of reasoning. Hebbia, which creates artificial intelligence for financial document analysis, noticed better citation accuracy and greater token efficiency in retrieval tasks.

New features next to the model.

Anthropic is releasing several features alongside Opus 4.8. A new effort control in claude.ai and Cowork allows users to choose how much calculation Claude applies to a response, trading speed with quality. Claude Code gains a dynamic workflows feature that allows you to schedule work and run hundreds of parallel subagents in a single session, enabling codebase-scale migrations across hundreds of thousands of lines of code.

For developers, the Messages API now accepts system inputs within the messages array, allowing instructions to be updated mid-task without breaking the message cache. Fast mode for Opus 4.8, which runs at 2.5 times the speed, is now three times cheaper than previous models.

Myths is the greatest story

The most significant announcement may be the one that follows. Anthropic said it plans to launch a new class of model with higher intelligence than Opus, based on the Claude Mythos architecture. A small number of organizations are already using Claude Mythos Preview through Glass Wing Projectan initiative focused on using the model for cybersecurity work. Anthropic and approximately 50 partners, including Apple, Google, Microsoft, and Amazon Web Services, have used Mythos Preview to find more than 10,000 high or critical severity vulnerabilities in critical software infrastructure.

The Mythos-class models require stricter cyber safeguards before general launch, Anthropic said, but the company hopes to bring them to all customers in the coming weeks. The model is at a full capability level above Opus 4.7 and can autonomously find zero-day vulnerabilities and create exploits for them, which explains both the enthusiasm and caution around its implementation.

A company that is approaching a trillion dollars

The release of Opus 4.8 comes as Anthropic’s valuation continues to rise. The company announced a $65 billion Series H round at a $965 billion post-money valuation on the same day, up from the $380 billion valuation at which closed its $30 billion Series G in February. Revenue has grown from approximately $1 billion at the end of 2024 to an estimated annualized run rate of $30 billion in 2026, driven by enterprise adoption of Claude.

anthropic too opened a new office in Milan on May 28, its sixth in Europe, and appointed KiYoung Choi as Korea Representative Director ahead of the opening of an office in Seoul. The expansion reflects growing demand for Claude in business markets outside the United States.

The competitive context

Opus 4.8 enters a market where the pace of model launches has accelerated dramatically. OpenAI released GPT-5.5 as its first fully retrained base model since GPT-4.5, and GPT-5.4 set new records in professional benchmarks earlier this year. Google has invested up to $40 billion in Anthropic but continues to develop its own Gemini models. The frontier AI market has consolidated into a three-way race between Anthropic, OpenAI and Google, with each company releasing incremental model updates at an increasing pace.

For Anthropic, the distinction it is trying to make with the Opus 4.8 is not raw capacity but reliability. A model that detects its own errors, flags its uncertainties, and consistently follows instructions is most useful in agent workflows where AI systems operate with limited human supervision. Whether that positioning holds when the Mythos-class models arrive, which promise greater intelligence with new security limitations, will determine whether Anthropic can maintain its leadership in the enterprise market it has worked to dominate.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *