/AI3d ago

NVIDIA releases Nemotron 3 Ultra, a 550B parameter open-weight hybrid Mamba2-Transformer MoE model for agentic workloads

AI Judge changed title after evaluation, original title: "NVIDIA releases Nemotron 3 Ultra, a 550-billion parameter open-weight hybrid Mamba2-Transformer MoE model"

Story Overview

NVIDIA has put out Nemotron 3 Ultra, a 550B-parameter open-weight model with only 55B active parameters that mixes Mamba2 layers and Transformer attention inside a Mixture-of-Experts setup. The release targets long-running agent workflows in coding, research, and enterprise settings, with support for up to 1M context and deployment on-premise, in the cloud, or at the edge. Weights and training details are available now under the OpenMDW 1.1 license on Hugging Face.

--0--
Original post
Bryan Catanzaro@ctnzr#434inAI

During the past 6 months, Nemotron has grown from 24 to 48 on the AAI, and we're just getting started.

NVIDIA Nemotron 3 Ultra is now live!

Frontier accuracy, 5X greater speed, 30% lower cost.

Deploy however you need - on-premise, on the cloud, or at the edge.

Model is live on HuggingFace under the OpenMDW 1.1 license.

https://www.youtube.com/watch?v=D8LIIvQVGS4

5:42 AM · Jun 4, 2026 · 2.5K Views

Speed and cost numbers rest on NVIDIA's own agent benchmarks

The company states up to 5x higher inference throughput and 30 percent lower cost per task than other open frontier models, backed by charts comparing it on SWE-Bench, Terminal-Bench, and similar suites. Independent confirmation is still absent, so the practical gains for any specific workload remain to be measured by users running the model themselves.

Linear scaling from Mamba layers could matter most for extended agent runs

Replacing most attention with Mamba is presented as the route to handling million-token contexts without quadratic blow-up, which fits the emphasis on long-running agents rather than short chat turns. Whether that architectural choice holds accuracy across diverse tasks is one of the open questions the open weights now let others test directly.

Sentiment

Positive users praise NVIDIA's Nemotron 3 Ultra hybrid MoE releases for their scale, speed, and open availability, while negative users dismiss the models as poor quality or impractical due to size and performance issues.

Pos
79.6%
Neg
20.4%
249 comments with sentiment.
Cluster Engagement
-
Views
-
Comments
-
Reposts
-
Bookmarks
Expand data
Posts from X
Most Activity
Most Activity
VIEWS1.1MBOOKMARKS1.2KLIKES3.3KRETWEETS437REPLIES170
NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3dViews 1.1MLikes 3.3KBookmarks 1.2K
NVIDIA@nvidia

Introducing NVIDIA Nemotron 3 Ultra.

A frontier smart open model built for long-running agents that need to plan, reason, use tools and keep working across complex coding, research and enterprise workflows.

Up to 5x faster inference and up to 30% lower cost for agentic tasks.

Learn more: https://nvda.ws/4x9nGps

3dViews 193.1KLikes 2.1KBookmarks 422
wh@nrehiew_

For the visual learners

wh@nrehiew_

This paper prompted me to do a review of NVFP4 pre-training, given that NVIDIA seems to be pushing support for it especially on Blackwells.

Much of the content will come from "Pretraining Large Language Models with NVFP4" and the Nemotron 3 Super paper 🧵

3dViews 33.6KLikes 541Bookmarks 562
Nathan Lambert@natolambert

We have another 65 page frontier model report from Nvidia to read @eliebakouch @stochasticchasm and gang

3dViews 44.1KLikes 611Bookmarks 342
kache@yacineMTB

open weights and open *data* thank you for everything

NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3dViews 70.5KLikes 1KBookmarks 204
Pavlo Molchanov@PavloMolchanov

Nemotron 3 Ultra (550B-A55B) is here - our strongest open-weight model and full training recipe to date.

Heavy emphasis on real-world inference efficiency for long-context agentic workloads.

Everything is open 🤗: base, post-trained, reward checkpoints, NVFP4 quantized versions, training data, and recipes.

Key technical highlights ‼️: - 550B total / 55B active parameters - Hybrid Mamba2-Transformer (~4:1 Mamba:Attention) - Pretrained in NVFP4 on 20T tokens - LatentMoE architecture - Two-stage MOPD post-training - Native MTP

Technical details in the thread 👇

3dViews 31.8KLikes 426Bookmarks 119
Unsloth AI@UnslothAI

You can now run NVIDIA Nemotron 3 Ultra, a new 550B open model.

Nemotron-3-Ultra-550B-A55B is NVIDIA's largest LLM yet, with 1M context, frontier coding & chat.

Run 2-bit on 200GB RAM, 3-bit on 256GB, 8-bit on 600GB.

GGUF: https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Ultra-550B-A55B-GGUF Guide: https://unsloth.ai/docs/models/nemotron-3-ultra

NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3dViews 31.9KLikes 355Bookmarks 97

Very excited to share Nemotron 3 Ultra, 550B total with 55B active MoE hybrid Mamba-Attention model post-trained for agentic applications! This model delivers frontier-level agentic accuracy while being fast and open with weights, training software, and data available for commercial use. 1/4

3dViews 22.7KLikes 356Bookmarks 97
NVIDIA AI@NVIDIAAI

@openclaw @NousResearch @LangChain As always, Nemotron 3 Ultra is fully open.

This includes model weights, synthetic data, and post-training recipes. Available now on @huggingface → https://nvda.ws/4v1iBhi

3dViews 23.1KLikes 274Bookmarks 108
stochasm@stochasticchasm

alright, let's do this one more time

Nathan Lambert@natolambert

We have another 65 page frontier model report from Nvidia to read @eliebakouch @stochasticchasm and gang

3dViews 18.7KLikes 200Bookmarks 82
Chris 🇨🇦@llm_wizard

NEMOTRON 3 ULTRA IS LIVE. OUR BEST MODEL YET. PUNCHING IN THE SAME BALLPARK AS THE OPEN FRONTIER BAYBEEEEE.

RECIPES? CHECK. COOKBOOKS? CHECK. TECH REPORT? CHECK. DATA? CHECK. ENVS? CHECK.

3dViews 15.3KLikes 271Bookmarks 52
wh@nrehiew_

This paper prompted me to do a review of NVFP4 pre-training, given that NVIDIA seems to be pushing support for it especially on Blackwells.

Much of the content will come from "Pretraining Large Language Models with NVFP4" and the Nemotron 3 Super paper 🧵

wh@nrehiew_

Nemotron 3 Ultra is NVIDIA's best model yet and comes with a really great tech report. It focuses mainly on the NVFP4 recipe, and there is a ton of detailed work that went into their Multi-teacher On-Policy Distillation (MOPD) pipeline.

A thread of my notes.

3dViews 35KLikes 75Bookmarks 74
Chubby♨️@kimmonismus

1/ NVIDIA shipped Nemotron 3 Ultra today, a fully open 550B model with 55B active params, with the weights, training data, and complete recipe all released openly. That alone is rare at this scale.

The headline however actually is speed. Ultra is a hybrid Mamba-Attention MoE, an architecture built for fast decoding and a light memory footprint over long contexts, and NVIDIA clocks it at roughly 6x (!) the throughput of comparable open models on long-output agent workloads while holding the same accuracy.

That's a serious engineering result, and it's aimed exactly where the industry is heading: autonomous agents that run long, multi-turn tasks where throughput per GPU is what actually costs money.

It was pre-trained in 4-bit (NVFP4) across 20T tokens, the largest stable run of its kind shown to date. And the post-training introduces MOPD, where ten-plus specialist teacher models distill their skills into the student on its own rollouts, sometimes pushing it past the teachers themselves.

The interesting aspect:This is a frontier-class model you can fully reproduce.

NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3dViews 15.8KLikes 216Bookmarks 43
wh@nrehiew_

Nemotron 3 Ultra is NVIDIA's best model yet and comes with a really great tech report. It focuses mainly on the NVFP4 recipe, and there is a ton of detailed work that went into their Multi-teacher On-Policy Distillation (MOPD) pipeline.

A thread of my notes.

NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3dViews 10.8KLikes 93Bookmarks 60
Prime Intellect@PrimeIntellect

NVIDIA Nemotron 3 Ultra is here

We have Day‑0 support for Nemotron 3 Ultra in prime-rl and Lab.

Specialize Nemotron 3 Ultra for your use case.

https://www.primeintellect.ai/blog/nemotron-3

3dViews 17.3KLikes 198Bookmarks 34
vLLM@vllm_project

🚀 Day-0 support for NVIDIA Nemotron 3 Ultra on vLLM!

Ready to be served with the latest vLLM stable release, the new open frontier reasoning model is built for long-running autonomous agents: 🧠 550B total / 55B active — Hybrid Transformer-Mamba MoE 📚 Up to 1M token context ⚡ NVFP4 + BF16 🛠️ Tool calling, coding, deep research, orchestration

Read our detailed model launch blog and recipes! https://recipes.vllm.ai/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3dViews 15.7KLikes 150Bookmarks 32
merve@mervenoyann

NVIDIA Nemotron Ultra is here 😍

> 55B/550B a hybrid MoE  🦖 with 1M context window > supports MTP speculative decoding 💨 > day-0 supported in transformers

sits in the most attractive quadrant per performance/efficiency in AA Index 🔥

3dViews 14.8KLikes 151Bookmarks 30
kwindla@kwindla

http://x.com/i/article/2062636698337726464

3dViews 4.5KLikes 53Bookmarks 36
Fireworks AI@FireworksAI_HQ

NVIDIA Nemotron 3 Ultra is on Fireworks, day zero.

Nemotron Ultra is an open model for frontier reasoning and orchestration in long-running autonomous agents.

Think use cases like coding agents, deep research, and complex enterprise workflows.

Read on: https://fireworks.ai/blog/nemotron-3-ultra

3dViews 5.5KLikes 114Bookmarks 23
Modal@modal

With today's launch of Nemotron 3 Ultra, @nvidia continues to expand its investment in open-source AI. Their flagship frontier-reasoning model, built for long-running autonomous agents, is available Day 0 on Modal.

- 550B with 55B active parameters - Hybrid Transformer-Mamba MoE architecture - 1M context - Up to 5x faster inference - Up to 30% lower cost

NVIDIA@nvidia

Introducing NVIDIA Nemotron 3 Ultra.

A frontier smart open model built for long-running agents that need to plan, reason, use tools and keep working across complex coding, research and enterprise workflows.

Up to 5x faster inference and up to 30% lower cost for agentic tasks.

Learn more: https://nvda.ws/4x9nGps

3dViews 12KLikes 128Bookmarks 16
Load more posts