Alibaba-owned Qwen3.7-Max can run for 35 hours autonomously and supports external harnesses such as Anthropic’s Claude Code.



The AI ​​industry has fully entered the "It was from the agent," a paradigm in which AI models do much more than generate text: they now actively plan, execute and correct complex tasks in days instead of seconds.

So it’s perhaps not surprising to see Chinese e-commerce giant Alibaba’s famed Qwen team of AI researchers release a model capable of performing autonomous AI agent work for several days – that model has arrived in the form of Qwen3.7-Max, which company reports in a blog post accomplished "~35 hours of continuous autonomous execution" – albeit in a proprietary format, not open source, as previous Qwen Team releases were.

This is also to be expected: it is what many analysts and industry experts feared in recent years. following the departure of several key Team Qwen leaders earlier this year. But it makes sense financially for Alibaba, at least in the short term: training AI models, especially those as powerful as Qwen3.7-Max, is expensive, and essentially giving them away for free, as open source models are, doesn’t immediately help recoup any costs.

In that sense, Alibaba is simply aligning its efforts with American AI giants like OpenAI and Google by offering the latest and greatest models only through paid APIs and subscription packages or paid web plans, and slightly less efficient models through open source.

Still, the arrival of Qwen3.7-Max offers more options for businesses and individual users, and more competition for American AI labs, which is rarely a bad thing for consumers of all budget levels. However, the fact that the model can only be accessed from endpoints based in China means that its appeal may be limited for US and European companies looking to maximize compliance and security posture when fulfilling government contracts, or even simply trying to comply with all relevant state, local and national regulations on data sovereignty.

The marathon era of AI

To understand why Qwen3.7-Max deviates from previous models, you have to look at how it was trained and how it works in practice.

Language models often degrade when they are forced to maintain a single line of thought for thousands of conversational turns; They forget instructions, hallucinate variables, or simply get stuck in logic loops. Qwen3.7-Max was specifically designed as a "versatile agent base" capable of "long term reasoning" to overcome exactly this bottleneck.

The clearest demonstration of this capability is an autonomous engineering task detailed by the Qwen team. The model was given access to an isolated server equipped with a T-Head ZW-M890 PPU, a hardware architecture the model had never encountered during its training. Their task was to optimize an attention core.

Over the course of 35 hours straight, Qwen3.7-Max operated completely autonomously. It executed 1,158 separate tool calls, performed 432 kernel evaluations, diagnosed compilation failures, and iteratively improved the code to achieve a 10.0x geometric mean speedup.

In comparison, Chinese competitor models such as GLM-5.1 by z.ai and Kimi K2.6 by Moonshot with a speedup limit of 7.3x and 5.0x respectively, and often voluntarily ended their sessions when they failed to make progress. However, both are available open source.

This resistance is achieved through what Alibaba calls "environment scaling". Just as early LLMs became smarter by ingesting more diverse texts, Qwen3.7-Max was trained on a wide scaled range of dynamic agent environments.

It is capable of simulating a one-year life cycle of a startup in the "YC Bank" assessment, navigating hundreds of decision-making rounds spanning personnel management and contract selection. In this simulation, the model managed to generate $2.08 million in virtual revenue, almost doubling the performance of the previous generation, Qwen3.6-Plus.

Additionally, the model has built-in reward hacking self-monitoring, which autonomously detects when it attempts to cheat a training environment and adds heuristic rules to correct its own behavior.

A brain for any scaffold

From a product perspective, Qwen3.7-Max is designed to be the cognitive engine for modern software development and business automation.

The model offers a huge context window of 1 million tokens and a maximum output limit of 64K, providing immense overhead for processing large code bases or long technical documents.

One of its most compelling features is "cross generalization". Rather than being coded to work best within a specific proprietary interface, Qwen3.7-Max is designed to act as a direct intelligence layer for various agent frameworks. He supports the Anthropic API protocol natively, allowing developers Plug it directly into existing tools like Claude Code or OpenClaw.

Benchmark data provided by Alibaba indicates that this widespread approach has paid huge dividends.

In the Apex Mathematical Reasoning BenchmarkQwen3.7-Max earned a score of 44.5, eclipsing Claude Opus-4.6 Max’s score of 34.5 and DeepSeek V4-Pro Max 38.3. He also published dominant scores on the latest Humanity Exam (41.4) and the MCP-Atlas Realistic Coding Agent (76.4).

This translates into tangible utility for end users. Through open source Model Context Protocol (MCP) integrations, the model can operate as a stand-alone office assistant, capable of reading the university’s formatting specifications and automatically reformatting a messy Word document via command-line tools without human intervention.

Managing this level of intelligence has a different cost. Developers accessing the API through Alibaba Cloud Model Studio will pay $2.50 per million input tokens and $7.50 per million output tokens. The platform also includes explicit cache creation and read pricing, as well as a $10 per 1,000 call fee for built-in web searches, although the code interpretation tools remain free for a limited time.

Qwen3.7-Max occupies a strategic middle ground in today’s API economy. While it commands a notable premium over its aggressively priced domestic rivals (which cost nearly double the DeepSeek V4 Pro ($5.22) and Z.ai’s GLM-5.1 ($5.80), it dramatically undercuts the western frontier giants it routinely matches in benchmarks.

For context, running heavy agent workflows through OpenAI’s GPT-5.4 or Anthropic’s Claude Opus 4.7 will cost developers $17.50 and $30.00 per million tokens, respectively. Check out VentureBeat’s price chart below:

VentureBeat Frontier AI Model API Pricing Overview

Model

Input

Production

Total cost

Fountain

Flash MiMo-V2.5

$0.10

$0.30

$0.40

Xiaomi MiMo

Minimax M2.7

$0.30

$1.20

$1.50

minimax

Gemini 3.1 Flash-Lite

$0.25

$1.50

$1.75

Google

MiMo V2.5

$0.40

$2.00

$2.40

Xiaomi MiMo

Kimi-K2.6

$0.95

$4.00

$4.95

Moonshot/Kimi

GLM-5

$1.00

$3.20

$4.20

Z.ai

Grok 4.3 (under context)

$1.25

$2.50

$3.75

xAI

DeepSeek V4 Pro

$1.74

$3.48

$5.22

deep search

GLM-5.1

$1.40

$4.40

$5.80

Z.ai

Claude Haiku 4.5

$1.00

$5.00

$6.00

anthropic

Grok 4.3 (high context)

$2.50

$5.00

$7.50

xAI

Qwen3.7-Max

$2.50

$7.50

$10.00

Alibaba Cloud

Gemini 3.5 Flash

$1.50

$9.00

$10.50

Google

Gemini 3.1 Pro Preview (≤200K)

$2.00

$12.00

$14.00

Google

GPT-5.4

$2.50

$15.00

$17.50

Open AI

Gemini 3.1 Pro Preview (>200K)

$4.00

$18.00

$22.00

Google

Close Job 4.7

$5.00

$25.00

$30.00

anthropic

GPT-5.5

$5.00

$30.00

$35.00

Open AI

By positioning the Qwen3.7-Max just below Google’s Gemini 3.5 Flash ($10.50) but well above the budget-level models, Alibaba is signaling that this is not a commercial launch; is a flagship reasoning engine priced to move enterprise workloads away from Silicon Valley’s more expensive offerings.

The license remains proprietary for now

For all its technical brilliance, the most controversial aspect of Qwen3.7-Max is how it is distributed. Qwen is billing the launch as a "proprietary model". It’s strictly API only.

Historically, Alibaba’s Qwen has been a hero for open source and local LLM communities. Previous iterations, such as Qwen 2.5 and Qwen 3.6, released their weights. Open weights allow developers, researchers, and companies to download the model, run it on their own hardware, and tune it for very specific or data-sensitive use cases without sending proprietary information to a third-party server.

By locking Qwen3.7-Max behind an API, Alibaba is falling back on the standard business playbook used by OpenAI (with GPT-4) and Anthropic (with Claude). For enterprise users, this means that using Qwen3.7-Max requires trusting Alibaba Cloud with your data flows and completely relying on Internet connectivity to run your agent workflows. For the open source community, it means losing access to what is currently one of the most capable models on the planet.

The community’s reactions are divided between astonishment and disappointment.

Reaction from the developer community has been swift, characterized by a mix of deep respect for the engineering achievements and frustration with the licensing model.

Prominent TO the Sudo commenter on (@sudoingX) captured the prevailing sentiment on X (formerly Twitter). "qwen is unreal," they wrote. "They just dropped 3.7 max and it’s outperforming the opus 4.6 max on most of the benchmarks they ran.".

The technical metrics, particularly model endurance, have stunned many in the field. "the maximum mathematical number, 44.5 versus opus 34.5, that is not a small gap," Sudo su pointed out. "The 35 straight hours on a kernel optimization task with 1000+ tool calls is the part I keep rereading. That’s what really happens in the age of agents, not a slide.".

The speed of Alibaba’s iteration is also attracting attention. With Qwen 3.6 released last month, the jump to 3.7-Max highlights a relentless development cadence. As Sudo su observed, "no one else moves like that".

However, the praise is heavily undermined by the shift to a closed ecosystem. The deweighting of the models is seen as a blow to the localized AI movement, which relies on next-generation open models to push the boundaries of what can be done on consumer hardware or in private enterprise clusters.

"However, it’s one thing to open the source code for this as well." Sudo su pleaded in his post. "3.6 density improved the entire local film ecosystem. the maximum level moving to API would only close a door that we have been keeping open. give us the pesos eventually".

Qwen3.7-Max demonstrates that the era of the autonomous agent is no longer a theoretical projection; It is a present reality capable of executing complex engineering feats while humans sleep. The only question now is whether this new frontier of AI will be a democratized resource that you can download to your laptop or an intelligence utility rented strictly from the cloud. For now, with Qwen3.7-Max, it is undeniably the latter.



Source link

Leave a Reply

Your email address will not be published. Required fields are marked *