Today, Chinese AI startup Z.ai (formerly Zhipu AI) announced the immediate launch of GLM-5.2a 753 billion parameter open weights large language model (LLM) designed specifically to master "long horizon" Autonomous coding and engineering tasks.

Available immediately at hugging facehe Z.ai APIand 20+ third-party coding environments, the model features a highly stable context window of 1 million tokens along with enterprise subscription tiers starting at just $12.60 per month.

In excellent news for companies concerned about cost and safety, z.ai has released the GLM-5.2 centerweights under an unrestricted license. MIT Open Source Licenseallowing businesses to download the model freely from Hugging Face, customize or tweak it to their liking, and potentially run it locally or through virtual machines just for the cost of their computing and electricity.

This is an increasingly attractive option for companies as next-generation US proprietary models face an uncertain and potentially disrupted regulatory future, following the The Trump administration’s export control directive from last week prohibits foreign nationals from using Anthropic’s new Claude Fable 5 model. (to which that company responded by completely disconnecting the models in question for all users).

For enterprise technical decision makers, z.ai’s GLM-5.2 provides a highly capable path to hosting frontier-grade AI locally, completely bypassing geographic barriers and business limitations.

IndexShare reuses one indexer for every four dispersed attention layers, reducing computing needs

Under the hood, GLM-5.2 operates with 753 billion parameters and introduces a major architectural optimization called "IndexShare".

In standard bulk language models, recomputing attention mechanisms in long documents is computationally exorbitant. IndexShare solves this by reusing the identical indexer in every four sparse attention layers.

With the maximum context length of 1 million tokens, this single innovation reduces compute FLOPs per token by 2.9x.

The model also features an improved Multi-Token Prediction (MTP) layer for speculative decoding, which increases the length of accepted tokens by up to 20% during inference.

Additionally, Z.ai has implemented flexible and selectable options. "modes of thought". Users can toggle the model’s reasoning effort between "maximum," designed to push the limits of logical problem solving, or "High," which strikes a careful balance between high-end performance and latency-sensitive token efficiency.

State-of-the-art benchmarks for an open model, matching and even surpassing proprietary leaders in some categories.

In industry-standard third-party benchmark tests, GLM-5.2 performs above most open source flagships, including Deep Search v4 and scores close to or better than its closed-weight rivals, OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.8.

The model particularly shines in the use of agent tools and in long-term software engineering tasks:

SWE Pro Bench: The GLM-5.2 obtained a score of 62.1, decisively surpassing the GPT-5.5 (58.6) and its own predecessor, the GLM-5.1 (58.4).
SWE Border (Dominance): Designed to test long-term task completion, GLM-5.2 achieved 74.4%, beating GPT-5.5 (72.6%) and finishing almost tied with Claude Opus 4.8 (75.1%).

MCP Atlas: In this tool usage evaluation, GLM-5.2 achieved a 77.0, beating GPT-5.5 (75.3) and performing just shy of Claude Opus 4.8 (77.8).
The last examination of humanity (with tools): When equipped with external tools, GLM-5.2 achieved a score of 54.7, surpassing GPT-5.5 (52.2) and closely following Claude Opus 4.8 (57.9).
PostTrainBench and SWE-Marathon: In extended multi-hour engineering workloads, GLM-5.2 consistently outperformed GPT-5.5, scoring 34.3% vs. 25.0% for GPT-5.5 on PostTrainBench and 13.0% vs. 12.0% for GPT-5.5 on SWE-Marathon.

While the GLM-5.2 trails slightly behind the Claude Opus 4.8 and GPT-5.5 in Terminal-Bench 2.1 raw scores (81.0 versus 85.0 and 84.0, respectively), it significantly outperforms Google’s Gemini 3.1 Pro (74.0).

Beyond traditional coding metrics, GLM-5.2 earned an impressive first place in the collaborative design task benchmark test Design fieldsurpassing even the aforementioned last-gen Claude Fable 5 with a ELO score from 1360.

Additionally, the impact of Z.ai’s new selectable "ways of thinking" is clearly visible in the data: under the "max." At the effort level, GLM-5.2 achieves maximum intelligence, but uses almost 85,000 output tokens per task. changing to the "High" The effort setting sacrifices only a few points in performance while halving the required token output, providing a crucial optimization lever for latency-sensitive applications.

Available through coding plans and API

To put the model into practice, Z.ai launched the GLM coding plantargeting directly developer workflows rather than simple chat interfaces.

The plan offers out-of-the-box support for third-party US and global agent coding tools and harnesses, including Claude Code, OpenClaw, Cline, Kilo Code, Crush, and Factory, among others. Encoding Plan pricing tiers (when billed annually) are highly competitive:

Lite: $12.60 per month ($151.20 per year starting in year 2), geared toward light iterations on small repositories.
Pro: $50.40 per month for daily development on medium-sized repositories, offering 5x the usage allowance of the Lite plan.
Maximum: $112.00 per month for heavy workloads, offering 20x the usage of Lite and dedicated resources during peak hours.

For enterprise developers integrating the raw model into their own applications, Z.ai’s API pricing significantly undercuts its Western rivals while matching the exact rates of the previous generation GLM-5.1.

Access to the GLM-5.2 API is priced at $1.40 per million input tokens and $4.40 per million output tokensmaking it a mid-priced model globally, but approximately

VentureBeat Frontier AI Model API Pricing Overview

Sorted by total cost (input + production) from least to most expensive. The price shown is the standard pay-as-you-go price for 1 million tokens.

Model	Input	Production	Total Cost	Fountain
Flash MiMo-V2.5	$0.10	$0.30	$0.40	Xiaomi MiMo
deepseek-v4-flash	$0.14	$0.28	$0.42	deep search
deepseek-v4-pro	$0.435	$0.87	$1,305	deep search
MiniMax-M3	$0.30	$1.20	$1.50	minimax
Gemini 3.1 Flash-Lite	$0.25	$1.50	$1.75	Google
Qwen3.7-Plus	$0.40	$1.60	$2.00	Alibaba Cloud
MiMo V2.5	$0.40	$2.00	$2.40	Xiaomi MiMo
Grok 4.3 (under context)	$1.25	$2.50	$3.75	xAI
MiMo-V2.5 Pro (≤256K)	$1.00	$3.00	$4.00	Xiaomi MiMo
Kimi-K2.6	$0.95	$4.00	$4.95	Moonshot/Kimi
GLM-5.2	$1.40	$4.40	$5.80	Z.ai
Grok 4.3 (high context)	$2.50	$5.00	$7.50	xAI
MiMo-V2.5 Pro (>256K)	$2.00	$6.00	$8.00	Xiaomi MiMo
Qwen3.7-Max	$2.50	$7.50	$10.00	Alibaba Cloud
Gemini 3.5 Flash	$1.50	$9.00	$10.50	Google
Gemini 3.1 Pro Preview (≤200K)	$2.00	$12.00	$14.00	Google
GPT-5.4	$2.50	$15.00	$17.50	Open AI
Gemini 3.1 Pro Preview (>200K)	$4.00	$18.00	$22.00	Google
Close Job 4.8	$5.00	$25.00	$30.00	anthropic
GPT-5.5	$5.00	$30.00	$35.00	Open AI
Claude Fable 5 / Claude Myths 5	$10.00	$50.00	$60.00	anthropic

To further optimize costs for long-context workloads, Z.ai offers an entry caching rate of just $0.26 per million tokens, along with a limited-time offer of free entry caching.

The stark contrast between open-weight innovators and proprietary Western labs has not gone unnoticed by the developer community.

On X, prolific AI observer Lisan al Gaib (@scaling01) argued that "Frontier Labs is absolutely ripping you off with API pricing".

The publication noted that while massive open models like the 744 billion-parameter GLM-5.2 charge $4.40 per million output tokens and DeepSeek-V4-Pro (1.6 billion parameters) charge just $0.87, proprietary models command high premiums: Anthropic’s Sonnet 4.6 and Opus 4.8 charge $15.00 and $25.00 respectively, while GPT-5.5 from OpenAI costs $30.00 per output.

Highlighting that open model developers are operating profitably without depending on the latest "fancy Blackwell chips," The commenter suggested that the major proprietary laboratories are "probably with margins of over 90% at this time".

The beauty of the unmodified MIT license for enterprise use

The most disruptive aspect of the GLM-5.2 version is its license. Z.ai released the model weights under an open source license from MIT, establishing it as a "pure open" system.

The company’s technical documentation explicitly states that this license guarantees "no regional limits" and allows "technical access without borders".

For enterprise technology leaders, an MIT license means that the software can be used, modified, and commercialized without paying royalties or complying with restrictions. "acceptable use" governance policies common to dual-use licenses.

It allows engineering teams to host frontier-grade AI on their own sovereign infrastructure, completely eliminating vendor lock-in.

Warm reception among AI tool developers and manufacturers

Developer reaction to the release has been immediate and overwhelmingly positive.

The team behind Kilocode Integration confirmed from day one, published in X: "GLM-5.2 has been running on Kilo Code since day one. The 1M context window and maximum effort mode are active. Point your settings at it and you’re done!".

Open source coding environment Cline IDE echoed this sentiment at Xhighlighting the economic advantage: "GLM-5.2 is the first open weight model to exceed 80% in Terminal-Bench and outperforms all other open models available. It also outperforms the Gemini, making it a cutting-edge model for a fraction of the cost. Open weights are back. This model changes the rules of the game. Available at Cline now!".

Similarly, the open source coding rival desktop agent AI owner He also tested the model’s new capabilities in complex agent workflows, observing in X: "launched a real long-term task: research 30 companies in 6 sectors of AI infrastructure, structure it in JSON and then create an interactive HTML report… where 5.2 advances: -> plans…".

Source link

Z.ai’s open weight GLM-5.2 beats GPT-5.5 in multiple long-horizon encoding benchmarks by 1/6 the cost

IndexShare reuses one indexer for every four dispersed attention layers, reducing computing needs

State-of-the-art benchmarks for an open model, matching and even surpassing proprietary leaders in some categories.

Available through coding plans and API

VentureBeat Frontier AI Model API Pricing Overview

The beauty of the unmodified MIT license for enterprise use

Warm reception among AI tool developers and manufacturers

Leave a ReplyCancel Reply

Beats Studio Buds firmware update includes important microphone security fix

Google’s Pixel June update includes dozens of fixes

Viture, Nvidia XR AI partner for safety glasses that bring true intelligence to the workforce

IndexShare reuses one indexer for every four dispersed attention layers, reducing computing needs

State-of-the-art benchmarks for an open model, matching and even surpassing proprietary leaders in some categories.

Available through coding plans and API

VentureBeat Frontier AI Model API Pricing Overview

The beauty of the unmodified MIT license for enterprise use

Warm reception among AI tool developers and manufacturers

Leave a ReplyCancel Reply

Trending now

Beats Studio Buds firmware update includes important microphone security fix

Google’s Pixel June update includes dozens of fixes

Viture, Nvidia XR AI partner for safety glasses that bring true intelligence to the workforce