
(AsiaGameHub) – Nvidia has unveiled Nemotron 3 Super, a new open-access AI model engineered to run faster and process extremely long prompts. The company is targeting the model at developers building AI agents, a use case in which costs can rise sharply as models work through multi-step reasoning tasks.
Good to Know
- According to Nvidia, Nemotron 3 Super delivers up to 7.5 times the throughput of Qwen3.5 122B A10B.
- This model supports context windows as large as 1 million tokens.
- Nvidia has made both the model and its associated training resources openly accessible to the public.
Engineered for Speed and Lengthy Input Processing
Nemotron 3 Super does not activate its full set of parameters every time it generates a response. Instead, it adopts a Mixture of Experts (MoE) design, in which only a subset of the model is activated for each individual task. Nvidia states this design helps cut inference costs and makes the model more practical for AI agents, which typically consume large volumes of tokens during operation.
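Nvidia has not published Nemotron 3 Super's routing code, but the general MoE mechanism the paragraph describes can be sketched as a top-k gate: score every expert, keep only the best k, and renormalize their weights. The expert count and k below are illustrative, not the model's actual configuration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(gate_logits, k=2):
    """Keep the k experts with the highest gate scores and renormalize
    their weights to sum to 1; all other experts stay inactive."""
    probs = softmax(gate_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return {i: probs[i] / total for i in top}

# One gate logit per expert; only the top-k experts run for this token.
weights = route_top_k([1.2, -0.3, 2.5, 0.1, -1.0, 0.7], k=2)
print(weights)  # only 2 of the 6 experts receive nonzero weight
```

Because only the selected experts execute, compute per token scales with k rather than with the total expert count, which is the cost saving Nvidia is pointing to.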
Across its 88 total layers, the model combines Mamba and Transformer layers: the Mamba (state-space) layers let it process very long inputs efficiently, while the Transformer (attention) layers preserve its response accuracy. Nvidia says this configuration gives the model a native context window of up to 1 million tokens.
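The article does not say how the two layer types are interleaved across the 88 layers. As a rough illustration only (the one-attention-layer-per-eight ratio below is an assumption, not a published figure), a hybrid stack can be described as a layer schedule:

```python
def build_layer_schedule(n_layers=88, attention_every=8):
    """Hypothetical hybrid stack: mostly Mamba (linear-time state-space)
    layers, with a full-attention Transformer layer interleaved at a
    fixed interval. The real interleaving pattern is not public."""
    return [
        "transformer" if (i + 1) % attention_every == 0 else "mamba"
        for i in range(n_layers)
    ]

schedule = build_layer_schedule()
print(len(schedule), schedule.count("mamba"), schedule.count("transformer"))
# 88 77 11
```

The point of such a mix is that the dominant Mamba layers keep per-token cost roughly linear in context length, while the sparse attention layers retain global token-to-token interactions.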
Nvidia has also integrated a routing system called LatentMoE. It directs each task to a small subset of expert modules inside the model instead of activating the entire system. Per Nvidia’s claims, this enables higher levels of specialization without driving up inference costs in the way standard MoE systems do.
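Nvidia has not detailed LatentMoE's internals in this announcement. One plausible reading of the name, shown here purely as a sketch (the projection shapes and values are invented for illustration), is that the router scores experts in a compressed latent space, so routing cost scales with the small latent width rather than the full hidden width:

```python
def matvec(M, v):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def latent_route(hidden, down_proj, gate, k=2):
    """Sketch of routing in a latent space: compress the hidden state
    first, then score experts on the compressed vector and keep the
    top-k expert indices."""
    latent = matvec(down_proj, hidden)   # hidden_dim -> latent_dim
    scores = matvec(gate, latent)        # latent_dim -> n_experts
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

hidden = [1.0, 0.5, -0.5, 2.0]
down_proj = [[1, 0, 0, 0], [0, 0, 0, 1]]           # 4 -> 2
gate = [[1, 0], [0, 1], [1, 1], [-1, 0]]           # 2 -> 4 experts
print(latent_route(hidden, down_proj, gate, k=2))  # [2, 1]
```

Whether or not this matches Nvidia's actual design, it illustrates the claim in the paragraph: each input is dispatched to a small subset of experts rather than the whole system.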
The company says that, under its specified test setup, Nemotron 3 Super delivers 2.2 times the throughput of GPT OSS 120B and 7.5 times the throughput of Qwen3.5 122B A10B. Nvidia also notes it offers over 5 times the throughput and up to double the accuracy of the previous Nemotron Super version.
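To see what those multiples mean for serving economics, under the idealized assumption that cost per token is simply inversely proportional to throughput on the same hardware (real deployments also depend on batching, memory, and utilization):

```python
def relative_cost(speedup):
    """Idealized model: at equal hardware cost, serving cost per token
    scales as the inverse of throughput."""
    return 1.0 / speedup

for name, x in [("vs GPT OSS 120B", 2.2), ("vs Qwen3.5 122B A10B", 7.5)]:
    print(f"{name}: {relative_cost(x):.0%} of the baseline cost per token")
# vs GPT OSS 120B: 45% of the baseline cost per token
# vs Qwen3.5 122B A10B: 13% of the baseline cost per token
```

Under that assumption, a 7.5x throughput advantage would cut per-token serving cost by roughly 87% relative to that baseline.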
The model was trained on 25 trillion tokens, followed by an additional training phase using 51 billion tokens to extend its context length to 1 million tokens. Nvidia then applied supervised fine-tuning and reinforcement learning techniques to improve its overall performance.
Benchmark results for the model were also strong. Nvidia reports scores of 83.73 on MMLU Pro, 90.21 on AIME25, 60.47 on SWE Bench with OpenHands, 85.6% on PinchBench, and 91.64 on RULER 1M. The model also powers Nvidia AI Q, a research agent that claimed the top position on the Deepresearch Bench leaderboard.
Nvidia trained the model using NVFP4, a format developed specifically for Blackwell GPUs. When running on B200 hardware, the company says inference speeds can reach up to 4 times faster than FP8 format running on H100 GPUs, with no reported drop in accuracy.
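NVFP4 is a 4-bit floating-point format with shared per-block scaling. The toy sketch below shows the core idea using the E2M1 magnitude grid; it simplifies the real format (which uses fixed 16-element blocks and quantizes the scale itself to FP8), so treat it as an illustration, not Nvidia's implementation.

```python
# Representable magnitudes of a 4-bit E2M1 float (sign handled separately).
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block(xs):
    """Toy NVFP4-style quantization: one shared scale per block maps the
    block's largest magnitude onto the top grid value (6.0), then each
    element snaps to the nearest representable E2M1 magnitude."""
    scale = max(abs(x) for x in xs) / 6.0 or 1.0  # avoid 0 for all-zero blocks
    def snap(x):
        mag = min(E2M1, key=lambda g: abs(abs(x) / scale - g))
        return (mag if x >= 0 else -mag) * scale
    return [snap(x) for x in xs], scale

vals, scale = quantize_block([0.9, -0.2, 0.06, 1.2])
print(vals, scale)
```

Storing 4-bit values plus one shared scale per block is what halves memory traffic relative to FP8 and roughly quarters it relative to FP16, which is where the claimed inference speedups on Blackwell hardware come from.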
Nemotron 3 Super is available under the Nvidia Nemotron Open Model License. Developers can access its checkpoints in BF16, FP8, and NVFP4 formats on Hugging Face. Nvidia also supports inference via Nvidia NIM, build.nvidia.com, Perplexity, OpenRouter, Together AI, Google Cloud, AWS, Azure, CoreWeave, Dell Enterprise Hub, and HPE. Additional guides and implementation resources are available through NeMo.
This article is provided by a third-party. AsiaGameHub (https://asiagamehub.com/) makes no warranties regarding its content.