Multi-agent systems, designed to handle long-running tasks like software engineering or cybersecurity triage, can generate up to 15 times the token volume of standard chats, threatening the economics of deploying them for business tasks.
But today, Nvidia sought to help solve this problem with the launch of Nemotron 3 Super, a hybrid model with 120 billion parameters, with weights published on Hugging Face.
By fusing disparate architectural philosophies (state-space models, Transformers, and a novel latent mixture-of-experts design), Nvidia is attempting to provide the specialized depth agentic workflows need without the overhead typical of dense reasoning models, all available for commercial use under mostly open weights.
At the core of Nemotron 3 Super is a sophisticated architectural triad that balances memory efficiency with precise reasoning. The model uses a Mamba-Transformer hybrid backbone that interleaves Mamba-2 layers with strategically placed Transformer attention layers.
To understand the implications for production deployments, consider the "needle in a haystack" problem. The Mamba-2 layers act as an express highway, handling the vast majority of stream processing in linear time. This allows the model to maintain a massive 1-million-token context window without blowing up the memory footprint of the KV cache. Pure state-space models, however, often struggle with associative recall.
To address this, Nvidia strategically inserts Transformer attention layers as "global anchors," ensuring the model can accurately retrieve specific facts buried deep in a codebase or a stack of financial reports.
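To see why the hybrid backbone matters at a 1M-token context, a back-of-the-envelope comparison is useful. The layer counts and dimensions below are hypothetical (Nvidia has not published them in this article); only the scaling behavior is the point:

```python
# Illustrative KV-cache arithmetic for a hybrid Mamba/attention stack.
# All dimensions are hypothetical, chosen only to show the scaling behavior.

def kv_cache_bytes(attn_layers, context_len, kv_heads=8, head_dim=128, bytes_per_val=2):
    """KV cache grows linearly with context length, but only attention
    layers pay for it; Mamba-2 layers keep a fixed-size state instead."""
    return attn_layers * context_len * kv_heads * head_dim * 2 * bytes_per_val  # 2 = K and V

CONTEXT = 1_000_000  # 1M-token context window

pure_transformer = kv_cache_bytes(attn_layers=60, context_len=CONTEXT)
hybrid = kv_cache_bytes(attn_layers=6, context_len=CONTEXT)  # ~10% attention layers

print(f"pure transformer: {pure_transformer / 1e9:.0f} GB")  # 246 GB
print(f"hybrid backbone:  {hybrid / 1e9:.1f} GB")            # 24.6 GB
```

With only a small fraction of layers paying the attention toll, cache memory drops an order of magnitude while the anchor layers preserve recall.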
Beyond the backbone, the model introduces a Latent Mixture of Experts (LatentMoE). Traditional mixture-of-experts (MoE) designs route tokens to experts at their full hidden dimension, creating a computational bottleneck as models scale. LatentMoE avoids this by projecting tokens into a compressed latent space before routing them to specialists.
This "expert compression" lets the model consult four times as many specialists for exactly the same computational cost. That granularity is vital for agents that must switch between Python syntax, SQL logic, and conversational reasoning within a single turn.
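The arithmetic behind the "four times as many experts" claim can be sketched as follows. The projection and expert widths here are hypothetical, not Nvidia's published sizes; the sketch only shows how a narrower routing dimension trades width for expert count:

```python
# Illustrative FLOPs comparison: routing tokens to experts at the full
# hidden dimension vs. in a compressed latent space. Numbers are hypothetical.

def expert_flops(num_experts_active, in_dim, ffn_dim):
    """Approximate FLOPs for one token through its active experts
    (two matmuls per expert: in_dim -> ffn_dim -> in_dim)."""
    return num_experts_active * 2 * (in_dim * ffn_dim) * 2

HIDDEN, LATENT = 4096, 1024  # assumed 4x compression before routing
FFN = 2048

dense_moe = expert_flops(num_experts_active=2, in_dim=HIDDEN, ffn_dim=FFN)
latent_moe = expert_flops(num_experts_active=8, in_dim=LATENT, ffn_dim=FFN)

# Projecting to 1/4 the width lets the router consult 4x as many experts
# for the same per-token cost (ignoring the small projection overhead).
print(dense_moe == latent_moe)  # True
```

The compression projection itself adds a small fixed cost, but it amortizes quickly once the expert matmuls dominate.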
Further accelerating the model is multi-token prediction (MTP). While standard models predict a single next token, MTP predicts several future tokens simultaneously. This acts as a built-in draft model, enabling native speculative decoding that can deliver up to 3x wall-clock speedups on structured generation tasks such as code or tool calls.
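The speculative-decoding loop can be sketched in miniature. This is a toy simulation, not Nvidia's implementation: the drafter and verifier are stand-ins, and the per-token acceptance rate is an assumption:

```python
import random

def draft_tokens(prefix, k=4):
    """Stand-in for the MTP head: cheaply propose k future tokens at once."""
    return [f"t{len(prefix) + i}" for i in range(k)]

def verify(prefix, proposed):
    """Stand-in for the full model: accept the longest prefix of the
    proposal it agrees with (here, simulated with a fixed accept rate)."""
    accepted = []
    for tok in proposed:
        if random.random() < 0.8:  # assumed 80% per-token acceptance
            accepted.append(tok)
        else:
            break
    return accepted

random.seed(0)
sequence, steps = [], 0
while len(sequence) < 20:
    proposal = draft_tokens(sequence)
    accepted = verify(sequence, proposal)
    # On full rejection, a real verifier emits its own corrected token;
    # the toy version just advances one token so progress is guaranteed.
    sequence.extend(accepted or proposal[:1])
    steps += 1

print(f"20 tokens in {steps} full-model passes instead of 20")
```

Because each expensive verification pass can commit several tokens at once, the number of full-model passes falls well below the token count, which is where the wall-clock speedup comes from.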
For enterprises, the most significant technical leap in Nemotron 3 Super is its optimization for Nvidia's Blackwell GPU platform. By pre-training natively in NVFP4 (4-bit floating point), Nvidia has made a huge leap forward in production efficiency.
On Blackwell, the model delivers 4x faster inference than 8-bit models running on the previous Hopper architecture, without loss of accuracy.
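The memory side of that gain is easy to estimate from the parameter count alone. This rough sketch ignores real NVFP4 details such as per-block scale factors:

```python
# Rough weight-memory comparison for a 120B-parameter model at different
# precisions. Real NVFP4 adds small per-block scale factors, ignored here.

PARAMS = 120e9

def weight_gb(bits_per_param):
    return PARAMS * bits_per_param / 8 / 1e9

for name, bits in [("FP16", 16), ("FP8", 8), ("NVFP4", 4)]:
    print(f"{name:>6}: {weight_gb(bits):.0f} GB")  # 240, 120, 60 GB
```

Halving the weight footprint relative to FP8 is what lets a 120B model fit comfortably on fewer Blackwell GPUs and feeds the inference speedup.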
In practice, Nemotron 3 Super is a specialized tool for agentic reasoning.
It currently holds the number one position on the DeepResearch Bench, a benchmark that measures an AI’s ability to perform comprehensive, multi-step research on large sets of documents.
| Benchmark | Nemotron 3 Super | Qwen3.5-122B-A10B | GPT-OSS-120B |
|---|---|---|---|
| **General knowledge** | | | |
| MMLU-Pro | 83.73 | 86.70 | 81.00 |
| **Reasoning** | | | |
| AIME25 (without tools) | 90.21 | 90.36 | 92.50 |
| HMMT Feb 2025 (without tools) | 93.67 | 91.40 | 90.00 |
| HMMT Feb 2025 (with tools) | 94.73 | 89.55 | — |
| GPQA (without tools) | 79.23 | 86.60 | 80.10 |
| GPQA (with tools) | 82.70 | — | 80.09 |
| LiveCodeBench (v5, 2024-07 to 2024-12) | 81.19 | 78.93 | 88.00 |
| SciCode (subtask) | 42.05 | 42.00 | 39.00 |
| HLE (without tools) | 18.26 | 25.30 | 14.90 |
| HLE (with tools) | 22.82 | — | 19.00 |
| **Agentic** | | | |
| Terminal-Bench (hard subset) | 25.78 | 26.80 | 24.00 |
| Terminal-Bench 2.0 (core) | 31.00 | 37.50 | 18.70 |
| SWE-Bench (OpenHands) | 60.47 | 66.40 | 41.90 |
| SWE-Bench (open source) | 59.20 | 67.40 | — |
| SWE-Bench (Codex) | 53.73 | 61.20 | — |
| SWE-Bench Multilingual (OpenHands) | 45.78 | — | 30.80 |
| **Tau2-Bench** | | | |
| Airline | 56.25 | 66.00 | 49.20 |
| Retail | 62.83 | 62.60 | 67.80 |
| Telecom | 64.36 | 95.00 | 66.00 |
| Average | 61.15 | 74.53 | 61.00 |
| BrowseComp (with search) | 31.28 | — | 33.89 |
| BIRD-Bench | 41.80 | — | 38.25 |
| **Chat & instruction following** | | | |
| IFBench (prompt) | 72.56 | 73.77 | 68.32 |
| Scale AI MultiChallenge | 55.23 | 61.50 | 58.29 |
| Arena-Hard-V2 | 73.88 | 75.15 | 90.26 |
| **Long context** | | | |
| AA-LCR | 58.31 | 66.90 | 51.00 |
| RULER @ 256k | 96.30 | 96.74 | 52.30 |
| RULER @ 512k | 95.67 | 95.95 | 46.70 |
| RULER @ 1M | 91.75 | 91.33 | 22.30 |
| **Multilingual** | | | |
| MMLU-ProX (language average) | 79.36 | 85.06 | 76.59 |
| WMT24++ (en→xx) | 86.67 | 87.84 | 88.89 |
It also demonstrates significant throughput advantages, delivering up to 2.2x the throughput of gpt-oss-120B and 7.5x that of Qwen3.5-122B in high-volume serving environments.
The launch of Nemotron 3 Super under the Nvidia Open Model License Agreement (updated October 2025) provides a permissive framework for enterprise adoption, although it comes with several "guardrail" clauses that differentiate it from pure open-source licenses such as MIT or Apache 2.0.
Key provisions for business users:
Commercial usability: The license explicitly states that the models are "commercially usable" and grants a perpetual, worldwide, royalty-free license to sell and distribute products based on the model.
Output ownership: Nvidia claims no ownership of the outputs the model generates; responsibility for those outputs (and ownership of them) lies entirely with the user.
Derivative works: Companies are free to create and own "Derived models" (enhanced versions), provided they include the required attribution notice: "Licensed by Nvidia Corporation under the Nvidia Open Model License."
The "red lines":
The license includes two critical termination triggers that production teams must monitor:
Guardrails: The license terminates automatically if a user bypasses or removes the model's built-in guardrails (technical limitations or safety hyperparameters) without implementing a "substantially similar" replacement appropriate for the use case.
Litigation trigger: If a user initiates copyright or patent litigation against Nvidia alleging that the model infringes their intellectual property, their license to use the model terminates immediately.
This structure allows Nvidia to foster a commercial ecosystem while protecting itself from "IP trolling" and ensuring the model is not stripped of its safety features for malicious use.
The release has generated considerable buzz in the developer community. Chris Alexiuk, a senior product research engineer at Nvidia, announced the launch on X under his handle @llm_wizard as a "SUPER DAY," emphasizing the model's speed and transparency. "The model is: FAST. The model is: SMART. The model is: THE MOST OPEN MODEL WE HAVE MADE YET," he posted, highlighting the release of not only the weights but also training recipes and 10 billion tokens of training data.
Industry adoption reflects this enthusiasm:
Cloud and hardware: The model is being deployed as an Nvidia NIM microservice, allowing it to run on-premises via Dell AI Factory or HPE, as well as on Google Cloud and Oracle, with AWS and Azure coming soon.
Production agents: Companies like CodeRabbit (software development) and Greptile are integrating the model to handle large-scale codebase analysis, while industry leaders like Siemens and Palantir are deploying it to automate complex workflows in manufacturing and cybersecurity.
As noted by Kari Briski, vice president of AI software at Nvidia: "As companies move beyond chatbots and adopt multi-agent applications, they are encountering… an explosion of context."
Nemotron 3 Super is Nvidia’s response to that explosion: a model that delivers the "intellectual capacity" of a 120B-parameter system with the operational efficiency of a much smaller specialist. For the company, the message is clear: the "thinking tax" is finally coming down.