Nvidia’s best model is now live

After pre-announcing Nemotron 3 Ultra, a 550-billion-parameter open-weight mixture-of-experts model, at Computex, Nvidia on Thursday released the model on platforms like Hugging Face, ModelScope, OpenRouter (with a free endpoint), and build.nvidia.com. The new model uses the same latent mixture-of-experts technique and Mamba 2 architecture as the other models in the Nemotron 3 family, bringing the number of active parameters down to 55 billion. It can support context windows of up to 1 million tokens.

As Nvidia notes, the new model has been tuned to power long-running agents that need to plan, call tools, and iterate over complex tasks. For this, the model needs to be not just smart enough but also fast enough. Indeed, Nvidia is emphasizing speed with this release, noting that it is significantly faster than its previous generation of models.

Given the current concerns around token costs, what may matter more here is that Nvidia also claims the model could save users up to 30% compared to similarly powerful models.

While it is the fastest model among its direct competitors like Kimi-K2.6, Qwen-3.5, and GLM-5.1, and the best U.S. open-weight model yet, it does still trail the best of these Chinese models on most benchmarks, even if only by a few points.

And while Nvidia calls this a frontier model, the benchmarks don’t quite tell this story. On GDPval, which tests how well a model performs real-world, economically valuable tasks, Nemotron 3 Ultra scores 47.9% in its NVFP4 variant, which uses Nvidia’s new quantization-aware pre-training technique. By comparison, OpenAI’s GPT-5.5 scores 84.9%.

Benchmarks don’t always capture a model’s strengths, though, and Nvidia notes that the model can handle “the orchestration and hardest reasoning calls in an autonomous workflow: architectural decisions in long-running coding sessions, synthesis across hundreds of research sources and verification across thousands of interdependent constraints.”

The model was trained on a curated dataset of 14.8 trillion tokens, enabling it to support 12 languages (including English, French, Spanish, Italian, German, Japanese, Korean, Hindi, Brazilian Portuguese, and Chinese) and 43 programming languages.

Nvidia is making the weights, datasets, and training recipes available. The model is available under the OpenMDW-1.1 license.

The post Nvidia’s best model is now live appeared first on The New Stack.