Mystery Solved: Anthropic Reveals Changes to Claude’s Harnesses and Operating Instructions Likely Caused Degradation



For several weeks, a growing chorus of AI developers and power users claimed that Anthropic’s flagship models were losing their edge. GitHub, X and Reddit users reported a phenomenon they described as "AI twitch"—a perceived degradation in which Claude seemed less capable of sustained reasoning, more prone to hallucinations, and increasingly wasteful of tokens.

Critics pointed to a measurable change in behavior, claiming that the model had gone from a "research first" approach to a lazier, "edit first" style that could no longer be trusted for complex engineering.

Although the company initially rejected claims that it had "nerfed" the model to manage demand, growing evidence from high-profile users and third-party benchmarks created a significant trust gap.

Today, Anthropic addressed these concerns directly, publishing a technical post-mortem that identified three separate product-layer changes responsible for the reported quality issues.

"We take reports of degradation very seriously," reads Anthropic's blog post on the topic. "We never intentionally downgraded our models and were able to immediately confirm that our API and inference layer were not affected."

Anthropic says it has resolved the issues by reverting the reasoning-effort default and the verbosity instructions in the system message, and by fixing the caching bug in version v2.1.116.

Growing evidence of degradation

The controversy gained momentum in early April 2026, fueled by detailed technical analysis from the developer community. Stella Laurenzo, a senior director in AMD's AI group, published a comprehensive audit of 6,852 Claude Code session files and over 234,000 tool calls on GitHub, showing that performance had degraded relative to earlier sessions.

Her findings suggested that Claude's depth of reasoning had decreased dramatically, leading to reasoning loops and a tendency to choose the "simplest solution" instead of the correct one.

This anecdotal frustration was apparently validated by third-party benchmarks. BridgeMind reported that Claude Opus 4.6's accuracy had fallen from 83.3% to 68.3% in its testing, dropping its ranking from 2nd to 10th.

Although some researchers argued that these specific benchmark comparisons were flawed due to inconsistent testing scopes, the narrative that Claude had become "dumber" went viral. Users also reported that usage limits were running out faster than expected, raising suspicions that Anthropic was intentionally limiting performance to manage growing demand.

The causes

In its post-mortem blog post, Anthropic clarified that while the model's underlying weights had not regressed, three specific changes to the "harness" surrounding the models had inadvertently hindered their performance:

  • Default reasoning effort: On March 4, Anthropic changed the default reasoning effort from high to medium for Claude Code to fix UI latency issues. The change was intended to prevent the interface from appearing "frozen" while the model was thinking, but it resulted in a noticeable drop in intelligence on complex tasks.

  • A caching logic error: Released on March 26, a caching optimization meant to remove old "thinking" blocks from inactive sessions contained a critical bug. Instead of clearing the thinking history once after an hour of inactivity, it cleared it on every subsequent turn, causing the model to lose its "short-term memory" and become repetitive or forgetful.

  • Verbosity limits in the system prompt: On April 16, Anthropic added instructions to the system message telling the model to keep text between tool calls under 25 words and final responses under 100 words. This attempt to reduce verbosity in Opus 4.7 backfired, causing a 3% drop in coding-quality evaluations.
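Anthropic has not published the faulty code, but the caching bug described above has a recognizable shape: a cleanup condition that latches on and re-fires on every turn instead of firing once after the idle window. A minimal Python sketch of that failure mode, with all names (`Session`, `IDLE_TTL_S`, the turn functions) hypothetical rather than taken from Anthropic's codebase:

```python
IDLE_TTL_S = 3600  # hypothetical one-hour inactivity window


class Session:
    """Hypothetical chat session that caches prior 'thinking' blocks."""

    def __init__(self, last_active: float):
        self.thinking: list[str] = []   # accumulated thinking blocks
        self.last_active = last_active
        self.stale = False


def take_turn_buggy(session: Session, now: float, thought: str) -> None:
    # Bug: the stale flag latches on after an hour of inactivity and is
    # never reset, so every later turn wipes the thinking history.
    if now - session.last_active > IDLE_TTL_S:
        session.stale = True
    if session.stale:
        session.thinking.clear()
    session.thinking.append(thought)
    session.last_active = now


def take_turn_fixed(session: Session, now: float, thought: str) -> None:
    # Fix: clear at most once when the idle window is exceeded; because
    # last_active is refreshed on every turn, the condition stops firing.
    if now - session.last_active > IDLE_TTL_S:
        session.thinking.clear()
    session.thinking.append(thought)
    session.last_active = now


# Resume two sessions after a two-hour gap, then take three quick turns.
buggy = Session(last_active=0.0)
fixed = Session(last_active=0.0)
for i, t in enumerate([7200.0, 7201.0, 7202.0]):
    take_turn_buggy(buggy, t, f"step {i}")
    take_turn_fixed(fixed, t, f"step {i}")
```

In the buggy variant only the most recent thought survives each turn, matching the "repetitive or forgetful" behavior users reported; the fixed variant clears once on resume and then accumulates context normally.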

Impact and future safeguards

The quality issues extended beyond the Claude Code CLI and affected the Claude Agent SDK and Claude Cowork, although the Claude API was not hit.

Anthropic admitted that these changes made the model appear to have "less intelligence," and acknowledged that this was not the experience users should expect.

To regain user trust and prevent future regressions, Anthropic is implementing several operational changes:

  • Internal testing: More internal staff will be required to use the exact public versions of Claude Code to ensure they experience the product as users do.

  • Enhanced evaluation suites: The company will now run a broader set of per-model evaluations and "ablations" for each system change to isolate the impact of specific instructions.

  • Stricter controls: New tools have been created to make auditing prompt changes easier, and model-specific changes will be strictly aligned with their intended objectives.

  • Subscriber compensation: To account for the token waste and performance friction caused by these errors, Anthropic reset usage limits for all subscribers effective April 23.

The company intends to use its new @ClaudeDevs account on X and GitHub threads to share deeper reasoning behind future product decisions and maintain a more transparent dialogue with its developer base.


