Key Highlights:
- NVIDIA launches the Nemotron 3 family of MoE models (Nano, Super, Ultra), which delivers superb accuracy and efficiency for building scalable agentic AI systems using open models.
- Nemotron 3 Nano achieves up to 4× higher throughput than Nemotron 2 Nano, thanks to a hybrid mixture-of-experts architecture optimized for multi-agent workloads.
- NVIDIA pairs the models with open datasets, reinforcement learning environments, and libraries, enabling developers to train, customize, and deploy specialized AI agents more efficiently.
NVIDIA has long supplied the infrastructure and GPUs that let other AI companies run data centers efficiently. Starting today, NVIDIA is also doubling down on open, agentic AI with the launch of Nemotron 3: a new family of open models, datasets, and tools designed to power large-scale multi-agent systems across different industries.
NVIDIA Nemotron 3 lineup targets efficiency and scale across AI workloads
According to NVIDIA, the Nemotron 3 family includes Nano, Super, and Ultra models, each tailored for different workloads, from lightweight assistants to complex reasoning engines. NVIDIA says Nemotron 3 Nano comes with a hybrid latent mixture-of-experts architecture that reduces inference costs while maintaining high accuracy.
“Open innovation is the foundation of AI progress,” said Jensen Huang, founder and CEO of NVIDIA. “With Nemotron, we’re transforming advanced AI into an open platform that gives developers the transparency and efficiency they need to build agentic systems at scale.”
Despite its name, Nemotron 3 Nano is the standout model of the Nemotron 3 family. The company says it delivers up to 4× higher throughput than Nemotron 2 Nano and significantly cuts reasoning-token generation, ultimately lowering costs for real-world deployments. With a 1-million-token context window, Nano can take on real-world tasks like code debugging, summarization, retrieval, and AI assistant workflows. Independent benchmarks from Artificial Analysis rank it among the most efficient open models in its class.
“Super” & “Ultra” set to release in early 2026
For heavier workloads, Nemotron 3 Super is aimed at collaborative, low-latency agent systems, while NVIDIA positions Nemotron 3 Ultra as a deep-reasoning engine for research and strategic planning. Both Super and Ultra use NVIDIA's 4-bit NVFP4 format on Blackwell GPUs, which reduces memory requirements without compromising accuracy. The company notes that Super and Ultra are expected in the first half of 2026.
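The memory savings from 4-bit weights are easy to estimate: relative to a 16-bit format like BF16, NVFP4 stores each weight in roughly a quarter of the space (ignoring the small per-block scaling-factor overhead). A back-of-envelope sketch using the roughly 500-billion-parameter figure the article quotes for Ultra:

```python
# Back-of-envelope weight-memory estimate: 4-bit vs. 16-bit storage.
# Ignores scaling-factor overhead, activations, and the KV cache.

def weight_gib(n_params, bits_per_param):
    """Weight memory in GiB for a given parameter count and precision."""
    return n_params * bits_per_param / 8 / 2**30

n_params = 500_000_000_000  # ~500B parameters, the article's Ultra figure

bf16 = weight_gib(n_params, 16)   # 16-bit baseline
nvfp4 = weight_gib(n_params, 4)   # 4-bit NVFP4-style storage

print(round(bf16))    # 16-bit footprint in GiB → 931
print(round(nvfp4))   # 4-bit footprint in GiB → 233
print(bf16 / nvfp4)   # → 4.0
```

At 16 bits, weights alone would need nearly a terabyte of memory; 4-bit storage brings that within reach of a much smaller GPU footprint.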
Quick Summary about Nemotron 3 family of MoE models:
- Nemotron 3 Nano is a small, 30-billion-parameter model that “activates up to 3 billion parameters at a time for targeted, highly efficient tasks.”
- Nemotron 3 Super is a high-accuracy reasoning model with “approximately 100 billion parameters and up to 10 billion active per token, for multi-agent applications.”
- Nemotron 3 Ultra is a large reasoning engine with “about 500 billion parameters and up to 50 billion active per token, for complex AI applications.”
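The active-parameter figures above follow from how mixture-of-experts routing works: a gating function scores the experts and only the top-k run per token, so active parameters are a small fraction of the total. A toy sketch (illustrative sizes only, not the actual Nemotron architecture):

```python
# Toy top-k mixture-of-experts routing: only k of n experts run per token,
# so active parameters per token are far fewer than total parameters.

def top_k_experts(scores, k):
    """Return indices of the k highest-scoring experts."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Hypothetical even split into 10 experts, routing each token to 1,
# loosely mirroring the Nano ratio (30B total, up to 3B active).
n_experts = 10
params_per_expert = 3_000_000_000
gate_scores = [0.1, 0.05, 0.7, 0.02, 0.01, 0.03, 0.04, 0.02, 0.02, 0.01]

active = top_k_experts(gate_scores, k=1)
active_params = len(active) * params_per_expert
total_params = n_experts * params_per_expert

print(active)                        # → [2]  (highest-scoring expert)
print(active_params / total_params)  # → 0.1  (10% of weights run per token)
```

Real MoE layers learn the gate jointly with the experts and route per layer, but the economics are the same: inference cost scales with active parameters, not total parameters.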
Many companies have started to adopt the new models
NVIDIA further notes that major companies, including Accenture, ServiceNow, Oracle, Palantir, Siemens, and Zoom, are already integrating Nemotron into production workflows. The idea is to give developers open, transparent models that can scale from dozens to hundreds of collaborating agents without blowing up compute budgets.
"Perplexity is built on the idea that human curiosity will be amplified by accurate AI built into exceptional tools, like AI assistants. With our agent router, we can direct workloads to the best fine-tuned open models, like Nemotron 3 Ultra, or leverage leading proprietary models when tasks benefit from their unique capabilities — ensuring our AI assistants operate with exceptional speed, efficiency and scale," said Aravind Srinivas, CEO of Perplexity.
Apart from these three models, NVIDIA has also released three trillion tokens of training and reinforcement learning data. Additionally, new open tools like NeMo Gym, NeMo RL, and NeMo Evaluator have also been announced. These are designed to help developers fine-tune agents safely, test behavior, and specialize models for real workflows. Those interested can get them from GitHub and Hugging Face.
“Nemotron 3 Nano is available as an NVIDIA NIM microservice for secure, scalable deployment anywhere on NVIDIA-accelerated infrastructure for maximum privacy and control,” the company notes.
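NIM microservices generally expose an OpenAI-compatible HTTP API, so calling a deployed Nano instance amounts to a standard chat-completions request. A minimal sketch; the endpoint URL and model identifier below are placeholders, not confirmed values:

```python
import json
from urllib import request

# Placeholder endpoint for a locally deployed NIM microservice.
NIM_URL = "http://localhost:8000/v1/chat/completions"

# Placeholder model identifier; check the deployed NIM's model listing.
payload = {
    "model": "nvidia/nemotron-3-nano",
    "messages": [{"role": "user", "content": "Summarize this log file."}],
    "max_tokens": 256,
}

req = request.Request(
    NIM_URL,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# resp = request.urlopen(req)  # uncomment against a running NIM instance
print(json.loads(req.data)["model"])
```

Because the interface mirrors the OpenAI chat-completions shape, existing client code can typically be pointed at a self-hosted NIM by changing only the base URL and model name.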
Source: https://www.timesofai.com/
