Cohere achieves lossless quantization and native citations with Command A+, its first open model fully licensed under Apache 2.0



Canadian AI lab Cohere made waves recently by announcing a merger with German AI startup Aleph Alpha. But now it has even more in store for enterprise builders around the world: today, the company co-founded by former Googler and "Attention Is All You Need" co-author Aidan Gómez unveiled Command A+, a highly optimized 218 billion parameter language model designed specifically for complex reasoning, multimodal document processing, and agentic workflows.

The most significant aspect of the launch is not just the model’s capabilities; it is its accessibility.

By releasing the model weights freely on the popular AI code-sharing repository Hugging Face under a highly permissive Apache 2.0 open source license (a first for the company, according to a post on X by Gómez, now CEO of Cohere), Cohere is making a calculated bet on "Sovereign AI": the thesis that companies, governments and developers should have the ability to run, control and adapt cutting-edge AI entirely within their own secure environments, without sacrificing performance.

Sparse architecture with extreme quantization

Architecturally, Command A+ represents a significant evolution over Cohere’s previous dense models. It is a decoder-only sparse mixture-of-experts (MoE) transformer.

While the model holds a relatively modest total of 218 billion parameters, far fewer (only 25 billion) are active during any given generation step. That gives it a much lighter footprint and lets it run inference (serving the model in production environments to end users or through agents) on far fewer compute resources than U.S. proprietary giants like OpenAI’s GPT-5.5 and Anthropic’s Claude Opus 4.7, which outside observers estimate to have trillions of parameters.

This sparse architecture is the key to the model’s efficiency. In simple terms, an MoE model routes incoming queries only to the specific "expert" neural networks best suited to handle them, leaving the rest of the model inactive.
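To make the routing idea concrete, here is a minimal sketch of top-k expert selection in plain Python. It is a generic illustration of MoE routing, not Cohere’s implementation: the number of experts, the value of k, the router scores, and the toy "experts" are all made-up assumptions.

```python
import math

def softmax(scores):
    """Turn raw router scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen experts
    return [(i, probs[i] / norm) for i in top]

def moe_layer(x, experts, router_logits, k=2):
    """Combine outputs of only the selected experts; the others never run."""
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](x)
    return out

# Eight toy "experts" (hypothetical): each just scales its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.3, 1.5, -0.5, 0.0, 0.2]  # made-up scores

selected = route_top_k(router_logits, k=2)
print("experts used:", [i for i, _ in selected])
print("layer output:", moe_layer(1.0, experts, router_logits, k=2))
```

Only two of the eight toy experts execute for this input, which is the same reason a model with 218 billion total parameters can run a generation step using roughly 25 billion of them.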

This is a familiar recipe followed by most leading LLMs these days. It lets models retain the broad knowledge base and nuanced reasoning capabilities of a giant while running at the faster speeds and reduced compute and power requirements of a much smaller model, since only a fraction of the parameters are activated at any time.

Where Cohere has gone a step further than most with Command A+ is its heavy focus on hardware efficiency through quantization, a process that compresses the model’s memory footprint by reducing the precision of its parameters.

Command A+ is available in 16-bit (BF16), 8-bit (FP8), and highly compressed 4-bit (W4A4) formats.

W4A4 quantization is the technical centerpiece of this release. Normally, reasoning models suffer an enormous "quantization tax," where compressing the model leads to visible regressions in solving complex problems.

Cohere mitigated this by quantizing only the MoE experts to 4 bits while keeping the critical attention pathways at full precision, supplemented with a technique called Quantization-Aware Distillation.
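For readers unfamiliar with what "quantizing to 4 bits" means mechanically, the sketch below shows plain round-to-nearest symmetric 4-bit quantization of one small group of weights. This is a generic textbook scheme swapped in for illustration: the article does not describe Cohere’s actual 4-bit number format, its grouping, or how its distillation step works, and the weight values are invented.

```python
def quantize_int4(weights):
    """Symmetric 4-bit quantization of one group of weights.

    Each weight is mapped to an integer in [-8, 7] plus one shared scale.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate weights from the 4-bit integers and the scale."""
    return [v * scale for v in q]

# Made-up weights standing in for a slice of one expert's weight matrix.
group = [0.12, -0.53, 0.98, -0.07, 0.31, -0.88, 0.44, 0.02]

q, scale = quantize_int4(group)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(group, restored))

print("4-bit codes:", q)
print("scale:", scale)
print("worst-case rounding error:", max_err)
```

The rounding error per weight is bounded by half the scale, which is why naive schemes like this one degrade reasoning quality on large models and why vendors layer techniques such as selective quantization and distillation on top.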

The result is an almost lossless compression that allows this huge model to run on a single NVIDIA Blackwell B200 GPU or just two NVIDIA H100 GPUs.
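A back-of-the-envelope calculation shows why the 4-bit format is what makes these hardware targets plausible. The sketch below counts weight storage only and treats every parameter as stored at the named precision, so the 4-bit figure is a lower bound (the article says some pathways stay at higher precision, and the KV cache and activations also need memory). The GPU capacities are assumptions, not figures from the article; check the hardware datasheets.

```python
TOTAL_PARAMS = 218e9  # total parameter count stated in the article

BITS_PER_PARAM = {"BF16": 16, "FP8": 8, "W4A4": 4}

# Assumed memory capacities in GB (not from the article).
ASSUMED_GPU_MEMORY_GB = {"1x B200": 180, "2x H100": 2 * 80}

def weight_footprint_gb(params, bits):
    """Storage needed for the weights alone, in decimal gigabytes."""
    return params * bits / 8 / 1e9

footprints = {
    name: weight_footprint_gb(TOTAL_PARAMS, bits)
    for name, bits in BITS_PER_PARAM.items()
}

for name, gb in footprints.items():
    fits = [gpu for gpu, cap in ASSUMED_GPU_MEMORY_GB.items() if gb < cap]
    print(f"{name}: ~{gb:.0f} GB of weights; fits on: {fits or 'neither setup'}")
```

Under these assumptions, only the 4-bit weights come in below the capacity of either setup, leaving some headroom for the parts of the model kept at higher precision.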

The speed gains are equally notable. According to performance data released by the company, the W4A4 quantization at low concurrency achieves 375 tokens per second (TPS) with a time to first token (TTFT) latency of just 113 milliseconds, representing up to a 63% increase in output speed and a 17% reduction in latency compared to the previous Command A Reasoning model.
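To translate those figures into something tangible, the sketch below estimates end-to-end response time and back-solves the baseline the percentages imply. It assumes a constant decoding rate after the first token and treats the "up to" percentages as exact, so the implied baseline numbers are rough derivations, not values reported by Cohere.

```python
TPS = 375.0      # tokens per second, from the article
TTFT_S = 0.113   # time to first token in seconds, from the article

def response_time_s(output_tokens, tps=TPS, ttft_s=TTFT_S):
    """Simple latency model: wait for the first token, then stream at a fixed rate."""
    return ttft_s + output_tokens / tps

# Baselines implied by "63% faster" and "17% lower latency" (derived, not reported).
implied_prev_tps = TPS / 1.63
implied_prev_ttft_ms = 113 / (1 - 0.17)

for n in (100, 500, 2000):
    print(f"{n} output tokens: ~{response_time_s(n):.2f} s")
print(f"implied previous speed: ~{implied_prev_tps:.0f} tokens/s")
print(f"implied previous TTFT: ~{implied_prev_ttft_ms:.0f} ms")
```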

Additionally, Cohere has revised the model’s tokenizer. Tokenizers break down text into fragments that are processed by AI models. The new tokenizer is highly optimized for global enterprise use and offers native support for 48 languages.

More importantly, it dramatically improves the efficiency of tokenization for non-European languages, reducing the number of tokens needed to generate responses in Arabic by 20%, in Japanese by 18%, and in Korean by 16%. Because inference costs are calculated per token, this directly translates to lower operational costs for global, multilingual, or non-English deployments.
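The link between token counts and cost is simple proportionality, as the short calculation below illustrates. The reduction percentages come from the article; the price per million tokens and the monthly token volume are hypothetical placeholders chosen only to make the arithmetic visible.

```python
# Token reductions reported in the article.
TOKEN_REDUCTION = {"Arabic": 0.20, "Japanese": 0.18, "Korean": 0.16}

# Hypothetical inputs (not from the article).
PRICE_PER_MILLION_USD = 10.0
MONTHLY_TOKENS_OLD = 500_000_000  # volume under the previous tokenizer

def monthly_cost(tokens, price_per_million):
    """Cost when billing is strictly proportional to token count."""
    return tokens * price_per_million / 1e6

baseline = monthly_cost(MONTHLY_TOKENS_OLD, PRICE_PER_MILLION_USD)
new_costs = {
    lang: monthly_cost(MONTHLY_TOKENS_OLD * (1 - cut), PRICE_PER_MILLION_USD)
    for lang, cut in TOKEN_REDUCTION.items()
}

print(f"baseline: ${baseline:,.0f}")
for lang, cost in new_costs.items():
    print(f"{lang}: ${cost:,.0f} (saves ${baseline - cost:,.0f})")
```

For self-hosted deployments there is no per-token bill, but the same proportionality applies to GPU time, since fewer tokens means fewer decoding steps for the same text.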

Agentic workflows and strong benchmarks in mathematics and specialized fields

While raw speed and size dictate deployment, a model’s usefulness is defined by the capabilities of its output. Command A+ was built specifically for "agentic" tasks: workflows in which the AI operates autonomously or semi-autonomously, uses external tools, queries databases, and synthesizes information across multiple steps.

The benchmark jumps over the previous generation are stark.

On 𝜏²-Bench Telecom, which tests complex reasoning, the model jumped from a score of 37% to 85%. On Terminal-Bench Hard, which measures agentic coding performance, it rose from 3% to 25%. In complex mathematics, it scored 90% on AIME 25, up from 57%.

Command A+ punches above its weight class (25B active parameters) in pure reasoning and mathematics, competing directly with much larger models like DeepSeek V4 Pro on math benchmarks. However, when it comes to deep agentic coding and broad general intelligence indices, it currently lags behind the latest generations of Chinese open source rivals such as DeepSeek, Z.ai (GLM), and MiniMax.

That said, comparing them directly ignores Cohere’s core value proposition: hardware efficiency.

Beyond benchmarks, Command A+ introduces deep integrations for enterprise trust and verification. The model supports conversational tool use through standard chat templates, allowing developers to seamlessly connect it to internal APIs, search engines or SQL databases.
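As an illustration of what connecting a model to a SQL database through tool use involves on the application side, here is a minimal sketch of a tool definition and dispatcher. Everything in it is hypothetical: the `total_sales` tool, the in-memory sample data, and the hand-written tool call that stands in for model output. The JSON-schema style of tool description follows a common convention, and the article does not specify the exact format Command A+ expects.

```python
import json
import sqlite3

# In-memory database standing in for an internal SQL system (made-up data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2025-01-01", 120.0), ("2025-01-01", 80.5), ("2025-01-02", 42.0)],
)

def total_sales(day):
    """Hypothetical tool: total sales for one day."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE day = ?", (day,)
    ).fetchone()
    return {"day": day, "total": row[0]}

# Tool description that would be passed to the model alongside the chat.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "total_sales",
        "description": "Return total sales for a given day (YYYY-MM-DD).",
        "parameters": {
            "type": "object",
            "properties": {"day": {"type": "string"}},
            "required": ["day"],
        },
    },
}]

REGISTRY = {"total_sales": total_sales}

def dispatch(tool_call_json):
    """Run the tool the model asked for and wrap the result as a chat message."""
    call = json.loads(tool_call_json)
    result = REGISTRY[call["name"]](**call["arguments"])
    return {"role": "tool", "name": call["name"], "content": json.dumps(result)}

# A tool call as a model might emit it (hand-written here, not real model output).
message = dispatch('{"name": "total_sales", "arguments": {"day": "2025-01-01"}}')
print(message)
```

In a real deployment, the returned tool message would be appended to the conversation and sent back to the model so it can write the final answer.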

Crucially, Command A+ features native citation generation. When Command A+ retrieves information from an external tool, it doesn’t just synthesize the response; it generates explicit "grounding sections." Using special tags embedded in the output, the model directly links each factual claim it makes to the specific source document or database row it got the information from.
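Downstream applications typically need to separate the readable answer from the citation metadata. The sketch below shows one way that could look. The article does not give the actual tag syntax Command A+ emits, so the `<cite src="...">` format here is purely a hypothetical stand-in, as are the sample sentence and source identifiers.

```python
import re

# Hypothetical tag format; the model's real grounding tags may differ.
CITE_PATTERN = re.compile(r'<cite src="([^"]+)">(.*?)</cite>', re.DOTALL)

def extract_citations(output):
    """Return (plain_text, citations), each citation linking a claim to its source."""
    citations = [
        {"source": m.group(1), "claim": m.group(2)}
        for m in CITE_PATTERN.finditer(output)
    ]
    plain_text = CITE_PATTERN.sub(lambda m: m.group(2), output)
    return plain_text, citations

# Made-up model output for illustration.
sample = (
    'Total sales yesterday were <cite src="sql_result_0">$200.50</cite>, '
    'up from <cite src="sql_result_1">$180.00</cite>.'
)

text, cites = extract_citations(sample)
print(text)
for c in cites:
    print(f'  "{c["claim"]}" <- {c["source"]}')
```

An application could then render the plain text to the user and attach each claim’s source as a footnote or hover link for auditing.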

For companies in highly regulated sectors such as finance, healthcare or legal, this traceability makes the difference between an interesting prototype and a production-ready application. If a user requests a daily sales report, the model will generate the total sales amount and explicitly cite the result of the database query that provided that number, minimizing the risk of undetected hallucinations.

Additionally, Command A+ is fully multimodal, capable of processing text and images natively within its large 128K-token input context window, making it highly effective for complex document processing, such as analyzing invoices, charts, or scanned technical manuals.

The first Cohere AI model to be fully licensed under Apache 2.0

In the current AI landscape, "open source" has become a fraught term. Many leading AI companies publish their model weights under restrictive commercial licenses or acceptable use policies that explicitly prohibit large companies from using the models for commercial purposes, or prohibit the models from being used to train competing AI systems.

In fact, previous Cohere models, including Command R and Command R+, were published under a CC-BY-NC 4.0 (Creative Commons NonCommercial) license. While their model weights were open for researchers and developers to download, modify, and evaluate, their use for commercial purposes was strictly prohibited without purchasing a separate enterprise license from Cohere or going through its application programming interface (API), similar to the arrangement many companies use to access AI models from OpenAI, Anthropic, Google, and other leading labs.

Cohere has changed its approach by releasing Command A+ under the Apache 2.0 license. This is a critical distinction for the developer community. Apache 2.0 is a true OSI-approved open source license. It allows anyone (from independent developers to Fortune 500 corporations) to use, modify, distribute, and commercialize the model without paying licensing fees or adhering to restrictive non-compete clauses.

Gómez announced the decision on X, and it was championed by Cohere co-founder Nick Frosst, who posted a two-minute overview calling it "the best model we have ever released."

For enterprises, this license means complete vendor independence. A company can download the Command A+ weights, fine-tune them on highly sensitive internal data, and deploy them on its own private servers or air-gapped networks. It is not tied to Cohere’s infrastructure, pricing changes, or API uptime. It is the ultimate realization of sovereign AI.

The release saw immediate traction across the AI developer ecosystem, driven largely by its day-one integration with leading open source inference frameworks like Hugging Face and vLLM.

What’s next?

The release of Command A+ marks a maturation of the open source AI ecosystem. By combining cutting-edge reasoning, robust agentic tool use, and multimodal capabilities with an architecture designed specifically for hardware efficiency, Cohere is changing the calculus for enterprise AI adoption.

The requirement for massive, centralized computing clusters has long been a bottleneck for companies that prioritize data privacy and cost control. By democratizing access to a model of this caliber under a true open source license, Cohere has given the enterprise market exactly what it was asking for: the power of the cloud, able to run securely in the server room down the hall.


