Cursor, the San Francisco AI coding platform from startup Anysphere, valued at $29.3 billion, has launched Composer 2, a new internal coding model now available within its AI agent coding environment. The model is a refined variant of the Chinese open source model Kimi K2.5 and posts dramatically improved benchmarks over the company’s previous internal model.
The company is also launching Composer 2 Fast, a more expensive but faster variant that serves as the default experience for users.
Here is the cost breakdown:
Composer 2: $0.50 / $2.50 per 1 million input/output tokens
Composer 2 Fast: $1.50 / $7.50 per 1 million input/output tokens
That’s a big drop from Cursor’s predecessor internal model, Composer 1.5, launched in February, which costs $3.50 per million input tokens and $17.50 per million output tokens; Composer 2 is about 86% cheaper on both counts.
Composer 2 Fast is also approximately 57% cheaper than Composer 1.5.
There are also discounted "cache read" prices, which apply when some of the same tokens in a message are sent to the model again: $0.20 per million tokens for Composer 2 and $0.35 per million for Composer 2 Fast, versus $0.35 per million for Composer 1.5.
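Those percentages are easy to verify from the published rates. A quick sanity check in Python, using only the per-million-token prices listed above:

    # Sanity-check the published savings figures from the listed per-million-token rates.
    PRICES = {  # model: (input $/M tokens, output $/M tokens)
        "Composer 1.5":    (3.50, 17.50),
        "Composer 2":      (0.50,  2.50),
        "Composer 2 Fast": (1.50,  7.50),
    }

    def savings_vs(base: str, other: str) -> tuple[float, float]:
        """Percent savings of `other` relative to `base` on input and output tokens."""
        (bi, bo), (oi, oo) = PRICES[base], PRICES[other]
        return (1 - oi / bi) * 100, (1 - oo / bo) * 100

    for model in ("Composer 2", "Composer 2 Fast"):
        inp, out = savings_vs("Composer 1.5", model)
        print(f"{model}: {inp:.0f}% cheaper on input, {out:.0f}% cheaper on output")
    # Composer 2: 86% cheaper on input, 86% cheaper on output
    # Composer 2 Fast: 57% cheaper on input, 57% cheaper on output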
It also matters that this appears to be a Cursor-native model, not a widely distributed standalone one. In the company’s announcement and model documentation, Composer 2 is described as available in Cursor, tuned for Cursor’s agent workflow, and integrated with the product’s tool stack.
The materials provided do not indicate separate availability through external model platforms or as a general-purpose API outside of the Cursor environment.
The most profound technical claim of this release is not simply that Composer 2 scores higher than Composer 1.5. It’s that Cursor says the model is better suited to long-horizon agent coding.
On its blog, Cursor says the quality improvements come from its first continuous pre-training run, which gave it a stronger foundation for scaled reinforcement learning. From there, the company says it trained Composer 2 on long-term coding tasks and that the model can solve problems that require hundreds of actions.
That framing is important because it addresses one of the biggest unsolved problems in AI coding. Many models are good at isolated code generation. Far fewer remain reliable in a longer workflow that includes reading a repository, deciding what to change, editing multiple files, executing commands, interpreting failures, and continuing toward a goal.
The Cursor documentation reinforces that this is the use case the company is targeting. It describes Composer 2 as an agent model with a 200,000-token context window, optimized for tool usage, file edits, and terminal operations within Cursor.
The documentation also notes training techniques such as self-summarization for long-duration tasks. For developers already using Cursor as their primary environment, that tighter fit may matter more than a generic leaderboard claim.
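Cursor does not publish implementation details for that technique, but the general pattern in long-horizon agents is to compress older transcript entries into a running summary once the context window fills. The sketch below is a hypothetical illustration of that idea, not Cursor’s code; llm, count_tokens, and tools.run are assumed stand-ins:

    # Hypothetical sketch of self-summarization in a long-horizon agent loop.
    # llm(), count_tokens(), and tools.run() are assumed stand-ins, not Cursor APIs.
    CONTEXT_BUDGET = 200_000  # tokens; matches the 200K window described for Composer 2
    COMPACT_AT = 0.8          # compress history once 80% of the budget is used

    def run_agent(task, llm, count_tokens, tools):
        history = [f"Task: {task}"]
        while True:
            transcript = "\n".join(history)
            if count_tokens(transcript) > COMPACT_AT * CONTEXT_BUDGET:
                # Fold everything but the latest steps into a short summary so the
                # agent can keep taking actions for hundreds of steps.
                summary = llm("Summarize the work so far:\n" + "\n".join(history[:-5]))
                history = [f"Summary of earlier work: {summary}"] + history[-5:]
            action = llm(transcript)  # model picks the next tool call
            if action.startswith("DONE"):
                return action
            history.append(f"Action: {action}")
            history.append(f"Result: {tools.run(action)}")  # edit, shell, search, ...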
Cursor’s published results show a clear improvement over previous Composer models. The company lists Composer 2 at 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-bench Multilingual.
That compares to Composer 1.5 at 44.2, 47.9, and 65.9, and Composer 1 at 38.0, 40.0, and 56.9.
The launch is more measured than some model launches because Cursor does not claim universal leadership.
In Terminal-Bench 2.0, which measures how well an AI agent performs tasks in command-line terminal-style interfaces, GPT-5.4 still leads with 75.1, while Composer 2 scores 61.7, ahead of Opus 4.6 with 58.0, Opus 4.5 with 52.1, and Composer 1.5 with 47.9.
That makes Cursor’s pitch more pragmatic and possibly more useful to buyers. The company is not saying that Composer 2 is the best model at everything. It is saying that the model has moved to a more competitive level of quality while offering more attractive economics and tighter integration with the product developers are already using.
Cursor also included a performance-versus-cost chart built on its CursorBench benchmarking suite that seems designed to make a Pareto-style argument for Composer 2.
On that chart, Composer 2 is at a stronger cost-performance point than Composer 1.5 and compares favorably to the higher-cost GPT-5.4 and Opus 4.6 configurations shown by Cursor. The company’s message is not simply that Composer 2 scores higher than its predecessor, but that it can offer a more efficient trade-off between cost and intelligence for the daily coding work within Cursor.
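As a rough illustration of what such a chart encodes, one can divide a benchmark score by a blended per-token price. The 3:1 input-to-output token mix below is an assumed workload, not a figure from Cursor; the scores and prices are the ones reported above:

    # Rough cost-effectiveness comparison from the article's CursorBench scores and
    # per-million-token prices. The 3:1 input:output token mix is an assumption.
    MODELS = {  # name: (CursorBench score, input $/M, output $/M)
        "Composer 1.5": (44.2, 3.50, 17.50),
        "Composer 2":   (61.3, 0.50,  2.50),
    }
    INPUT_SHARE = 0.75  # assumed workload mix

    for name, (score, p_in, p_out) in MODELS.items():
        blended = INPUT_SHARE * p_in + (1 - INPUT_SHARE) * p_out
        print(f"{name}: {score} points at ${blended:.2f}/M blended, "
              f"{score / blended:.1f} points per blended dollar")
    # Composer 1.5: 44.2 points at $7.00/M blended, 6.3 points per blended dollar
    # Composer 2: 61.3 points at $1.00/M blended, 61.3 points per blended dollar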
For readers deciding whether to use Composer 2, the most important question may not be comparative performance alone. It may be whether they want a model optimized for the Cursor product experience itself.
That can be a strength. According to the documentation, Composer 2 can access the Cursor agent tool stack, including semantic code search, file and folder search, file reads, file edits, shell commands, browser control, and web access.
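To make that concrete, agent tool stacks of this kind are typically exposed to the model as named, dispatchable functions. The registry below is an illustrative sketch of the pattern; the names and signatures are hypothetical, not Cursor’s actual API:

    # Illustrative tool registry mirroring the capabilities the documentation lists.
    # The tool names and signatures are hypothetical, not Cursor's actual API.
    import subprocess
    from pathlib import Path

    TOOLS = {}

    def tool(name):
        """Register a function so the agent can invoke it by name."""
        def wrap(fn):
            TOOLS[name] = fn
            return fn
        return wrap

    @tool("read_file")
    def read_file(path: str) -> str:
        return Path(path).read_text()

    @tool("edit_file")
    def edit_file(path: str, old: str, new: str) -> str:
        p = Path(path)
        p.write_text(p.read_text().replace(old, new, 1))
        return f"edited {path}"

    @tool("shell")
    def shell(cmd: str) -> str:
        done = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=120)
        return done.stdout + done.stderr

    # The agent loop dispatches model-chosen calls, e.g. TOOLS["shell"]("pytest -q")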
That kind of integration can be more valuable than raw model quality if the goal is to complete real software tasks rather than produce impressive answers in one go.
But it also reduces the target audience. Teams looking for a model they can deploy widely across multiple external tools and platforms should recognize that Cursor presents Composer 2 as a model for Cursor users, not as a standalone, generally available base model.
The importance of Composer 2 is not that Cursor has suddenly taken first place in all coding benchmarks; it hasn’t. The most important point is that Cursor is making an operational argument: its model is improving, its price is low enough to encourage broader use, and its fastest tier is responsive enough that the company feels comfortable making it the default despite the higher cost.
That combination could resonate with engineering teams that increasingly care less about abstract model prestige and more about whether an agent can remain useful during long coding sessions without becoming prohibitively expensive.
Cursor’s broader pricing structure helps frame the competitive pressure around this launch. On its current pricing page, Cursor offers a free Hobby tier, a Pro plan at $20 per month, Pro+ at $60 per month, and Ultra at $200 per month for individual users, with higher tiers offering more usage across OpenAI, Anthropic, and Google models.
On the business side, Teams costs $40 per user per month, while Enterprise is custom priced and adds pooled usage, centralized billing, usage analytics, privacy controls, SSO, audit logs, and granular administrative controls. In other words, Cursor doesn’t just charge for access to a coding model. It charges for a managed application layer that sits on top of multiple model providers while adding team features, governance, and workflow tools.
That model is increasingly under pressure as first-party AI companies move deeper into coding. OpenAI and Anthropic are no longer limited to selling models through third-party products; they are also shipping their own coding interfaces, agents, and evaluation frameworks, such as Codex and Claude Code, which raises the question of how much room is left for an intermediary platform.
Commenters on social media have weighed in as well. Some of those posts describe frustration with Cursor’s pricing, loss of context, or editor-centric experience, while praising Claude Code as a more direct and fully agentic way of working. Even treated with caution, that kind of social chatter points to the strategic problem Cursor faces: it has to demonstrate that its integrated platform, team controls, and now its own internal models add enough value to justify sitting between developers and model makers’ increasingly capable coding products.
That makes Composer 2 strategically important for Cursor.
By offering an internal model that is much cheaper than Composer 1.5, fitting it closely to Cursor’s own tool stack, and making a faster version the default, the company is trying to show that it provides more than a wrapper for external systems.
The challenge is that as first-party coding products improve, developers and enterprise buyers may increasingly wonder whether they want a separate AI coding platform or whether model makers’ own tools are becoming sufficient on their own.