On March 11, 2026, NVIDIA officially released Nemotron 3 Super, a 120-billion-parameter open hybrid Mamba-Transformer Mixture-of-Experts model designed specifically for agentic AI workloads.
The model features several architectural innovations: Latent MoE, which delivers 4x expert utilization at fixed computational cost; multi-token prediction, which provides built-in speculative decoding; and a hybrid Mamba-Transformer backbone that combines Mamba's sequence efficiency with the Transformer's precision on reasoning. Trained natively in NVFP4 precision on NVIDIA's Blackwell architecture, Nemotron 3 Super activates only 12 billion parameters per token while maintaining a 1-million-token context window, delivering over 5x higher throughput than its predecessor.
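To make the sparse-activation idea concrete, here is a minimal, generic top-k MoE routing sketch in PyTorch. It shows how a router can score every expert but execute only a few per token; it is not Nemotron's Latent MoE implementation, and the layer sizes, class name, and choice of top_k=2 are all hypothetical stand-ins.

```python
# Toy top-k Mixture-of-Experts routing (illustrative only, not Nemotron's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=16, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # scores every expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):
        # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)  # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e            # tokens that routed slot k to expert e
                if mask.any():                   # only the chosen experts do any work
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE()
tokens = torch.randn(8, 512)
print(moe(tokens).shape)  # torch.Size([8, 512]); each token used 2 of 16 experts
```

Because only the top-k experts run for each token, the parameters touched per token are a small fraction of the total, which is the same budget arithmetic behind activating roughly 12 billion of 120 billion parameters.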
This release matters because it directly addresses the “thinking tax” in multi-agent systems, where excessive token generation drives up cost and latency. By achieving a PinchBench score of 85.6%, the highest among open models in its class, Nemotron 3 Super demonstrates that specialized architectural choices can significantly improve agentic reasoning without a proportional increase in computational overhead.
The impact extends to developers building autonomous agent applications, who now have access to an open model with enterprise-friendly licensing that can be deployed on infrastructure ranging from workstations to cloud platforms. With availability through NVIDIA NIM, Hugging Face, and major inference providers including Baseten, Cloudflare, and Together AI, the model lowers the barrier to implementing sophisticated multi-agent workflows in software development, cybersecurity triage, and complex reasoning tasks.
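For deployment, NVIDIA NIM exposes an OpenAI-compatible API, so the model can be called with standard client libraries. The sketch below assumes NIM's hosted endpoint; the model identifier is a hypothetical placeholder and should be checked against the actual catalog entry.

```python
# Minimal sketch: querying the model through an OpenAI-compatible endpoint.
# NVIDIA's hosted NIM API follows this pattern; the model id below is a
# hypothetical placeholder, not a confirmed identifier for Nemotron 3 Super.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # NIM's OpenAI-compatible API
    api_key="YOUR_NVIDIA_API_KEY",
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-super",  # hypothetical id; check the model catalog
    messages=[
        {"role": "system", "content": "You are a software-triage agent."},
        {"role": "user", "content": "Summarize the failure mode in this stack trace: ..."},
    ],
    temperature=0.2,
    max_tokens=512,
)
print(response.choices[0].message.content)
```

The same client code should generally work against the other listed providers by swapping `base_url` and the model id, since most of them also serve OpenAI-compatible endpoints.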

[…] However, as of late April 2026, the arrival of Kimi K2.6—a trillion-parameter, open-weight Mixture-of-Experts (MoE) model—has fundamentally challenged this hierarchy. By achieving state-of-the-art results on […]
[…] does not reinvent the architecture. It carries forward the same Mixture-of-Experts design that Moonshot has refined across five major releases since July 2025: 1 trillion total […]
[…] executing up to 4,000 coordinated steps in a single run. The model runs on a 1-trillion-parameter Mixture-of-Experts (MoE) backbone that activates only 32 billion parameters per token, keeping inference costs comparable to […]
[…] April 20, 2026, Moonshot AI released Kimi K2.6, a trillion-parameter open-weight Mixture-of-Experts model that immediately distinguished itself not through benchmark numbers, but through a single, […]