Nvidia’s new open-weight Nemotron 3 Super combines three different architectures to outperform gpt-oss and Qwen



Multi-agent systems designed to handle long-horizon tasks like software engineering or cybersecurity triage can generate up to 15 times the token volume of a standard chat, threatening the profitability of deploying them on business tasks.

But today, Nvidia sought to help solve this problem with the launch of Nemotron 3 Super, a hybrid model with 120 billion parameters whose weights are published on Hugging Face.

By fusing disparate architectural philosophies (state-space models, transformers, and a novel latent mixture-of-experts design), Nvidia is attempting to provide the specialized depth needed for agentic workflows without the overhead typical of dense reasoning models, all available for commercial use under mostly open weights.

Hybrid triple architecture

At the core of Nemotron 3 Super is a sophisticated architectural triad that balances memory efficiency with precision reasoning. The model uses a Mamba-Transformer hybrid backbone that interweaves Mamba-2 layers with strategically placed Transformer attention layers.

To understand the implications for production deployments, consider the "needle in a haystack" problem. The Mamba-2 layers act as an express highway system, handling the vast majority of stream processing with linear time complexity. This allows the model to maintain a massive 1-million-token context window without blowing up the memory footprint of the KV cache. However, pure state-space models often struggle with associative recall.

To solve this, Nvidia strategically inserts Transformer attention layers as "global anchors," ensuring that the model can accurately retrieve specific facts buried deep in a codebase or a stack of financial reports.
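The memory argument can be made concrete with back-of-the-envelope arithmetic: a KV cache grows linearly with context length for every attention layer, so replacing most attention layers with fixed-state Mamba-2 layers shrinks it proportionally. The layer counts and head dimensions below are illustrative assumptions, not published Nemotron figures:

```python
def kv_cache_bytes(context_len, n_attn_layers, n_kv_heads, head_dim, bytes_per_val=2):
    """Approximate KV-cache size: one key and one value vector (2x) per
    token, per attention layer, per KV head, at bytes_per_val precision."""
    return 2 * context_len * n_attn_layers * n_kv_heads * head_dim * bytes_per_val

# Hypothetical configs: a pure-transformer stack vs. a hybrid that keeps
# only a handful of attention layers, letting Mamba-2 layers carry a
# fixed-size recurrent state instead of a growing cache.
full_transformer = kv_cache_bytes(1_000_000, n_attn_layers=64, n_kv_heads=8, head_dim=128)
hybrid = kv_cache_bytes(1_000_000, n_attn_layers=6, n_kv_heads=8, head_dim=128)

print(f"pure transformer KV cache @ 1M tokens: {full_transformer / 1e9:.1f} GB")
print(f"hybrid (6 attention layers):           {hybrid / 1e9:.1f} GB")
```

Under these toy numbers a dense transformer would need roughly 262 GB of cache at 1M tokens, while the hybrid needs about a tenth of that; the exact savings depend on the real layer mix.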

Beyond the backbone, the model introduces a latent mixture of experts (LatentMoE). Traditional mixture-of-experts (MoE) designs send tokens to experts at their full hidden dimension, creating a computational bottleneck as models scale. LatentMoE solves this by projecting tokens into a compressed latent space before routing them to specialists.

This "expert compression" allows the model to consult four times as many specialists for exactly the same computational cost. That granularity is vital for agents that must switch between Python syntax, SQL logic, and conversational reasoning within a single turn.
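A minimal sketch of the idea: compress the token, route and run experts in the cheaper latent space, then expand back. All dimensions, the routing scheme, and the cost ratio here are illustrative assumptions, not Nemotron's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, n_experts, top_k = 512, 128, 16, 2  # illustrative sizes

# Down-projection into a compressed latent space, experts operate there,
# then an up-projection restores the model dimension.
W_down = rng.standard_normal((d_model, d_latent)) * 0.02
W_up = rng.standard_normal((d_latent, d_model)) * 0.02
experts = [rng.standard_normal((d_latent, d_latent)) * 0.02 for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02

def latent_moe(x):
    """x: (d_model,) token vector -> (d_model,) output."""
    z = x @ W_down                           # compress before routing
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]     # top-k expert indices
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                 # softmax over chosen experts
    z_out = sum(w * (z @ experts[i]) for w, i in zip(weights, chosen))
    return z_out @ W_up                      # expand back to model dimension

y = latent_moe(rng.standard_normal(d_model))
print(y.shape)

# Each expert matmul now scales with d_latent**2 instead of d_model**2,
# so the same compute budget affords many more (smaller) experts.
print((d_model // d_latent) ** 2)
```

The design trade-off is that the down- and up-projections are shared overhead, while every expert added afterwards is cheap; that is what lets the budget stretch to more specialists.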

Further accelerating the model is multi-token prediction (MTP). While standard models predict a single next token, MTP predicts multiple future tokens simultaneously. This serves as a "built-in draft model," enabling native speculative decoding that can deliver up to 3x wall-clock speedups on structured generation tasks such as code or tool calls.
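The speculative-decoding loop itself is simple: a cheap draft proposes several tokens ahead, the full model verifies them in one pass, and every accepted token is a forward pass saved. The toy below uses trivial deterministic stand-ins for both models (not Nemotron's MTP heads) just to show the accept/reject mechanics:

```python
def target_next(seq):
    # Deterministic toy "target model": next token is sum of last two, mod 10.
    return (seq[-1] + seq[-2]) % 10

def draft_next(seq):
    # A cheap "draft model" that agrees with the target except when the
    # target would emit a 7 -- a stand-in for occasional draft mistakes.
    t = target_next(seq)
    return 0 if t == 7 else t

def speculative_step(seq, k=4):
    """Draft k tokens ahead, keep the longest prefix the target agrees
    with, then append one token from the target itself (the standard
    speculative-decoding guarantee that every step makes progress)."""
    proposal = list(seq)
    for _ in range(k):
        proposal.append(draft_next(proposal))
    accepted = list(seq)
    for tok in proposal[len(seq):]:
        if tok == target_next(accepted):
            accepted.append(tok)
        else:
            break
    accepted.append(target_next(accepted))  # target's own extension/correction
    return accepted

seq = [1, 1]
for _ in range(5):
    seq = speculative_step(seq)
print(seq)
```

Because the output always matches what the target model alone would produce, speculation changes latency but never the generated text; the speedup depends on how often the draft is right, which is why highly structured output like code benefits most.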

The Blackwell Advantage

For enterprises, the most significant technical leap in Nemotron 3 Super is its optimization for the Nvidia Blackwell GPU platform. By pre-training natively in NVFP4 (4-bit floating point), Nvidia has made a substantial leap in production efficiency.

On Blackwell, the model delivers 4x faster inference than 8-bit models running on the previous Hopper architecture, without loss of accuracy.
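Precision alone explains much of the footprint win. A weights-only estimate for a 120B-parameter model (ignoring activations, the KV cache, and the small per-block scale factors that NVFP4 formats also store, so real numbers run somewhat higher):

```python
# Back-of-the-envelope weight memory for a 120B-parameter model at
# different precisions (weights only).
params = 120e9

def weight_gib(bits_per_param):
    # bits -> bytes -> GiB
    return params * bits_per_param / 8 / 2**30

for name, bits in [("FP16", 16), ("FP8", 8), ("NVFP4", 4)]:
    print(f"{name:>6}: {weight_gib(bits):7.1f} GiB")
```

At 4 bits the weights fit in roughly 56 GiB, half the FP8 footprint, which is what allows the same hardware to serve more concurrent requests.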

In practice, Nemotron 3 Super is a specialized tool for agentic reasoning.

It currently holds the number one position on the DeepResearch Bench, a benchmark that measures an AI’s ability to perform comprehensive, multi-step research on large sets of documents.

| Benchmark | Nemotron 3 Super | Qwen3.5-122B-A10B | GPT-OSS-120B |
|---|---|---|---|
| General knowledge | | | |
| MMLU-Pro | 83.73 | 86.70 | 81.00 |
| Reasoning | | | |
| AIME25 (no tools) | 90.21 | 90.36 | 92.50 |
| HMMT February 25 (no tools) | 93.67 | 91.40 | 90.00 |
| HMMT February 25 (with tools) | 94.73 | 89.55 | — |
| GPQA (no tools) | 79.23 | 86.60 | 80.10 |
| GPQA (with tools) | 82.70 | 80.09 | — |
| LiveCodeBench (v5, 2024-07 to 2024-12) | 81.19 | 78.93 | 88.00 |
| SciCode (subtask) | 42.05 | 42.00 | 39.00 |
| HLE (no tools) | 18.26 | 25.30 | 14.90 |
| HLE (with tools) | 22.82 | 19.0 | — |
| Agentic | | | |
| Terminal-Bench (hard subset) | 25.78 | 26.80 | 24.00 |
| Terminal-Bench 2.0 (core) | 31.00 | 37.50 | 18.70 |
| SWE-Bench (OpenHands) | 60.47 | 66.40 | 41.9 |
| SWE-Bench (open source) | 59.20 | 67.40 | — |
| SWE-Bench (Codex) | 53.73 | 61.20 | — |
| Multilingual SWE-Bench (OpenHands) | 45.78 | 30.80 | — |
| Tau2-Bench: Airline | 56.25 | 66.0 | 49.2 |
| Tau2-Bench: Retail | 62.83 | 62.6 | 67.80 |
| Tau2-Bench: Telecom | 64.36 | 95.00 | 66.00 |
| Tau2-Bench: Average | 61.15 | 74.53 | 61.0 |
| BrowseComp (with search) | 31.28 | 33.89 | — |
| BIRD-Bench | 41.80 | 38.25 | — |
| Chat and instruction following | | | |
| IFBench (prompt) | 72.56 | 73.77 | 68.32 |
| Scale AI MultiChallenge | 55.23 | 61.50 | 58.29 |
| Arena-Hard-V2 | 73.88 | 75.15 | 90.26 |
| Long context | | | |
| AA-CSF | 58.31 | 66.90 | 51.00 |
| RULER @ 256k | 96.30 | 96.74 | 52.30 |
| RULER @ 512k | 95.67 | 95.95 | 46.70 |
| RULER @ 1M | 91.75 | 91.33 | 22.30 |
| Multilingual | | | |
| MMLU-ProX (average) | 79.36 | 85.06 | 76.59 |
| WMT24++ (en→xx) | 86.67 | 87.84 | 88.89 |

(— indicates a score not reported in the source.)

It also demonstrates significant throughput benefits, achieving up to 2.2x the throughput of gpt-oss-120B and 7.5x that of Qwen3.5-122B in high-volume serving environments.

Custom ‘open’ license: commercial use but with important caveats

The launch of Nemotron 3 Super under the Nvidia Open Model License Agreement (updated October 2025) provides a permissive framework for enterprise adoption, although it comes with several "safeguard" clauses that differentiate it from pure open-source licenses such as MIT or Apache 2.0.

Key provisions for business users:

  • Commercial usability: The license explicitly states that the models are "commercially usable" and grants a perpetual, worldwide, royalty-free license to sell and distribute products based on the model.

  • Output ownership: Nvidia claims no ownership of the outputs generated by the model; responsibility for those outputs (and ownership of them) lies entirely with the user.

  • Derivative works: Companies are free to create and own "derivative models" (enhanced versions), provided they include the required attribution notice: "Licensed by Nvidia Corporation under the Nvidia Open Model License."

The "red lines":

The license includes two critical termination triggers that production teams must monitor:

  1. Safety guardrails: The license terminates automatically if a user ignores or circumvents the model's built-in "guardrails" (technical limitations or safety hyperparameters) without implementing a "substantially similar" replacement appropriate to the use case.

  2. Litigation trigger: If a user initiates copyright or patent litigation against Nvidia alleging that the model infringes their intellectual property, their license to use the model terminates immediately.

This structure allows Nvidia to foster a commercial ecosystem while protecting itself from "IP trolling" and ensuring that the model is not stripped of its safety features for malicious use.

“The team really cooked”

The release has generated quite a stir within the developer community. Chris Alexiuk, senior product research engineer at Nvidia, announced the launch on X under his handle @llm_wizard as a "SUPER DAY," emphasizing the model's speed and transparency. "The model is: FAST. The model is: SMART. The model is: THE MOST OPEN MODEL WE HAVE MADE YET," he posted, highlighting the release of not only the weights but also 10 billion tokens of training data and the training recipe.

Industry adoption reflects this enthusiasm:

  • Cloud and hardware: The model is being deployed as an Nvidia NIM microservice, allowing it to run on-premises via Dell AI Factory or HPE, as well as on Google Cloud and Oracle, with AWS and Azure coming soon.

  • Production agents: Companies like CodeRabbit (software development) and Greptile are integrating the model to handle large-scale codebase analysis, while industry leaders like Siemens and Palantir are deploying it to automate complex workflows in manufacturing and cybersecurity.

As noted by Kari Briski, vice president of AI software at Nvidia: "As companies move beyond chatbots and adopt multi-agent applications, they are encountering… an explosion of context."

Nemotron 3 Super is Nvidia’s response to that explosion: a model that provides the "intellectual capacity" of a 120B-parameter system with the operational efficiency of a much smaller specialist. For the company, the message is clear: the "thinking tax" is finally coming down.


