/AI · 3d ago

NVIDIA releases Nemotron 3 Ultra, a 550B parameter open-weight hybrid Mamba2-Transformer MoE model for agentic workloads

AI Judge changed title after evaluation, original title: "NVIDIA releases Nemotron 3 Ultra, a 550-billion parameter open-weight hybrid Mamba2-Transformer MoE model"

Story Overview

NVIDIA has released Nemotron 3 Ultra, a 550B-parameter open-weight model with 55B active parameters that combines Mamba2 layers and Transformer attention in a Mixture-of-Experts design. The release targets long-running agent workflows in coding, research, and enterprise settings, with a context window of up to 1M tokens and deployment on-premise, in the cloud, or at the edge. Weights and training details are available now on Hugging Face under the OpenMDW 1.1 license.
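To make the 550B total / 55B active split concrete, here is a minimal arithmetic sketch. It assumes the common rule of thumb that a forward pass costs roughly 2 FLOPs per active parameter per token; that rule and the function names are illustrative assumptions, not figures from NVIDIA.

```python
# Parameter counts as stated in the overview.
TOTAL_PARAMS = 550_000_000_000
ACTIVE_PARAMS = 55_000_000_000


def active_fraction(active: int = ACTIVE_PARAMS, total: int = TOTAL_PARAMS) -> float:
    """Share of the weights that take part in processing a single token."""
    return active / total


def forward_flops_per_token(active_params: int = ACTIVE_PARAMS) -> int:
    """Rough forward-pass cost per token, assuming ~2 FLOPs per active parameter.

    This is a rule-of-thumb estimate (an assumption here), not a measured value.
    """
    return 2 * active_params


if __name__ == "__main__":
    print(f"active share of weights: {active_fraction():.0%}")
    print(f"approx. forward FLOPs per token (MoE): {forward_flops_per_token():.2e}")
    print(f"approx. forward FLOPs per token (if dense): {forward_flops_per_token(TOTAL_PARAMS):.2e}")
```

Under these assumptions, per-token compute tracks the 55B active parameters while memory still has to hold all 550B, which is why the model can be fast to run yet large to host.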

Original post

Bryan Catanzaro@ctnzr · #434 in AI

During the past 6 months, Nemotron has grown from 24 to 48 on the AAI, and we're just getting started.

Bryan Catanzaro@ctnzr

NVIDIA Nemotron 3 Ultra is now live!

Frontier accuracy, 5X greater speed, 30% lower cost.

Deploy however you need - on-premise, on the cloud, or at the edge.

Model is live on HuggingFace under the OpenMDW 1.1 license.

https://www.youtube.com/watch?v=D8LIIvQVGS4

5:42 AM · Jun 4, 2026 · 2.5K Views

Speed and cost numbers rest on NVIDIA's own agent benchmarks

NVIDIA claims up to 5x higher inference throughput and up to 30 percent lower cost per task than other open frontier models, supported by its own charts on SWE-Bench, Terminal-Bench, and similar suites. Independent confirmation is still absent, so the practical gains for any specific workload will have to be measured by users running the model themselves.

Linear scaling from Mamba layers could matter most for extended agent runs

Replacing most attention layers with Mamba2 layers is presented as the way to handle million-token contexts without the quadratic cost of attention, which fits the emphasis on long-running agents rather than short chat turns. Whether that architectural choice preserves accuracy across diverse tasks is an open question that the released weights now let others test directly.
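The scaling argument can be illustrated with a rough operation count. This is only a sketch: the `d_model` and `d_state` values are hypothetical placeholders, not Nemotron 3 Ultra's real dimensions, and the formulas are textbook approximations for one attention layer versus one linear-time state-space scan.

```python
def attention_ops(n_tokens: int, d_model: int) -> int:
    """Approximate multiply-adds for one self-attention layer over a sequence.

    Counts the QK^T score product and the weighted sum over values,
    each about n_tokens * n_tokens * d_model, so cost grows quadratically.
    """
    return 2 * n_tokens * n_tokens * d_model


def scan_ops(n_tokens: int, d_model: int, d_state: int) -> int:
    """Approximate multiply-adds for one linear-time state-space scan.

    Counts one state update and one readout per token, each about
    d_model * d_state, so cost grows linearly with sequence length.
    """
    return 2 * n_tokens * d_model * d_state


def attention_to_scan_ratio(n_tokens: int, d_model: int = 4096, d_state: int = 128) -> float:
    """How many times more work attention does than the scan at this length.

    d_model and d_state defaults are hypothetical, chosen only for illustration.
    """
    return attention_ops(n_tokens, d_model) / scan_ops(n_tokens, d_model, d_state)


if __name__ == "__main__":
    for n in (8_192, 131_072, 1_000_000):
        print(f"{n:>9} tokens: attention/scan ratio = {attention_to_scan_ratio(n):.1f}")
```

With these formulas the ratio reduces to `n_tokens / d_state`, so the gap widens in direct proportion to context length, which is the regime long-running agents operate in.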

Sentiment

Positive commenters praise NVIDIA's Nemotron 3 Ultra hybrid MoE release for its scale, speed, and open availability, while negative commenters dismiss the model as poor quality or impractical because of its size and performance issues.

Positive: 79.6% · Negative: 20.4% (249 comments with sentiment)

Posts from X

Most Activity

Views 1.1M · Bookmarks 1.2K · Likes 3.3K · Retweets 437 · Replies 170

NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3d1.1M3.3K1.2K

NVIDIA@nvidia

Introducing NVIDIA Nemotron 3 Ultra.

A frontier smart open model built for long-running agents that need to plan, reason, use tools and keep working across complex coding, research and enterprise workflows.

Up to 5x faster inference and up to 30% lower cost for agentic tasks.

Learn more: https://nvda.ws/4x9nGps

3d193.1K2.1K422

wh@nrehiew_

For the visual learners

wh@nrehiew_

This paper prompted me to do a review of NVFP4 pre-training, given that NVIDIA seems to be pushing support for it especially on Blackwells.

Much of the content will come from "Pretraining Large Language Models with NVFP4" and the Nemotron 3 Super paper 🧵

3d33.6K541562

Nathan Lambert@natolambert

We have another 65 page frontier model report from Nvidia to read @eliebakouch @stochasticchasm and gang

3d44.1K611342

kache@yacineMTB

open weights and open *data* thank you for everything

NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3d70.5K1K204

Pavlo Molchanov@PavloMolchanov

Nemotron 3 Ultra (550B-A55B) is here - our strongest open-weight model and full training recipe to date.

Heavy emphasis on real-world inference efficiency for long-context agentic workloads.

Everything is open 🤗: base, post-trained, reward checkpoints, NVFP4 quantized versions, training data, and recipes.

Key technical highlights ‼️:
- 550B total / 55B active parameters
- Hybrid Mamba2-Transformer (~4:1 Mamba:Attention)
- Pretrained in NVFP4 on 20T tokens
- LatentMoE architecture
- Two-stage MOPD post-training
- Native MTP

Technical details in the thread 👇

3d31.8K426119

Unsloth AI@UnslothAI

You can now run NVIDIA Nemotron 3 Ultra, a new 550B open model.

Nemotron-3-Ultra-550B-A55B is NVIDIA's largest LLM yet, with 1M context, frontier coding & chat.

Run 2-bit on 200GB RAM, 3-bit on 256GB, 8-bit on 600GB.

GGUF: https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Ultra-550B-A55B-GGUF
Guide: https://unsloth.ai/docs/models/nemotron-3-ultra

NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3d31.9K35597
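The RAM figures quoted in the Unsloth AI post (2-bit on 200GB, 3-bit on 256GB, 8-bit on 600GB) can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes every one of the 550B parameters is stored at the nominal bit width and that 1 GB means 10^9 bytes; real GGUF quantizations typically mix precisions across layers, so treat the result as a lower bound rather than the actual file size.

```python
# Total parameter count as stated in the post.
TOTAL_PARAMS = 550_000_000_000

# RAM figures quoted in the post, keyed by nominal bits per weight.
QUOTED_RAM_GB = {2: 200, 3: 256, 8: 600}


def weight_gb(bits_per_param: float, total_params: int = TOTAL_PARAMS) -> float:
    """Raw weight storage in GB (10^9 bytes) at a uniform bit width.

    Assumes no per-block scales, no higher-precision layers, and no runtime
    overhead, so the true footprint will be larger.
    """
    return total_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    for bits, quoted in QUOTED_RAM_GB.items():
        raw = weight_gb(bits)
        print(f"{bits}-bit: ~{raw:.1f} GB of raw weights, "
              f"{quoted - raw:.1f} GB below the quoted {quoted} GB")
```

In each case the raw weight estimate falls below the quoted figure, which is consistent with the remainder being needed for quantization metadata, higher-precision layers, context state, and runtime overhead.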

Oleksii Kuchaiev@kuchaev

Very excited to share Nemotron 3 Ultra, 550B total with 55B active MoE hybrid Mamba-Attention model post-trained for agentic applications! This model delivers frontier-level agentic accuracy while being fast and open with weights, training software, and data available for commercial use. 1/4

3d22.7K35697

NVIDIA AI@NVIDIAAI

@openclaw @NousResearch @LangChain As always, Nemotron 3 Ultra is fully open.

This includes model weights, synthetic data, and post-training recipes. Available now on @huggingface → https://nvda.ws/4v1iBhi

3d23.1K274108

stochasm@stochasticchasm

alright, let's do this one more time

Nathan Lambert@natolambert

We have another 65 page frontier model report from Nvidia to read @eliebakouch @stochasticchasm and gang

3d18.7K20082

Chris 🇨🇦@llm_wizard

NEMOTRON 3 ULTRA IS LIVE. OUR BEST MODEL YET. PUNCHING IN THE SAME BALLPARK AS THE OPEN FRONTIER BAYBEEEEE.

RECIPES? CHECK. COOKBOOKS? CHECK. TECH REPORT? CHECK. DATA? CHECK. ENVS? CHECK.

3d15.3K27152

wh@nrehiew_

This paper prompted me to do a review of NVFP4 pre-training, given that NVIDIA seems to be pushing support for it especially on Blackwells.

Much of the content will come from "Pretraining Large Language Models with NVFP4" and the Nemotron 3 Super paper 🧵

wh@nrehiew_

Nemotron 3 Ultra is NVIDIA's best model yet and comes with a really great tech report. It focuses mainly on the NVFP4 recipe, and there is a ton of detailed work that went into their Multi-teacher On-Policy Distillation (MOPD) pipeline.

A thread of my notes.

3d35K7574

Chubby♨️@kimmonismus

1/ NVIDIA shipped Nemotron 3 Ultra today, a fully open 550B model with 55B active params, with the weights, training data, and complete recipe all released openly. That alone is rare at this scale.

The headline however actually is speed. Ultra is a hybrid Mamba-Attention MoE, an architecture built for fast decoding and a light memory footprint over long contexts, and NVIDIA clocks it at roughly 6x (!) the throughput of comparable open models on long-output agent workloads while holding the same accuracy.

That's a serious engineering result, and it's aimed exactly where the industry is heading: autonomous agents that run long, multi-turn tasks where throughput per GPU is what actually costs money.

It was pre-trained in 4-bit (NVFP4) across 20T tokens, the largest stable run of its kind shown to date. And the post-training introduces MOPD, where ten-plus specialist teacher models distill their skills into the student on its own rollouts, sometimes pushing it past the teachers themselves.

The interesting aspect: This is a frontier-class model you can fully reproduce.

NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3d15.8K21643

wh@nrehiew_

A thread of my notes.

NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3d10.8K9360

Prime Intellect@PrimeIntellect

NVIDIA Nemotron 3 Ultra is here

We have Day‑0 support for Nemotron 3 Ultra in prime-rl and Lab.

Specialize Nemotron 3 Ultra for your use case.

https://www.primeintellect.ai/blog/nemotron-3

3d17.3K19834

vLLM@vllm_project

🚀 Day-0 support for NVIDIA Nemotron 3 Ultra on vLLM!

Ready to be served with the latest vLLM stable release, the new open frontier reasoning model is built for long-running autonomous agents:
🧠 550B total / 55B active, Hybrid Transformer-Mamba MoE
📚 Up to 1M token context
⚡ NVFP4 + BF16
🛠️ Tool calling, coding, deep research, orchestration

Read our detailed model launch blog and recipes! https://recipes.vllm.ai/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

NVIDIA AI@NVIDIAAI

Today we're shipping Nemotron 3 Ultra.

A 550B MoE frontier-intelligence open model built for long-running agents.

It delivers 5x faster inference and lowers the cost of complex agentic tasks by up to 30% versus other open frontier models.

3d15.7K15032

merve@mervenoyann

NVIDIA Nemotron Ultra is here 😍

> 55B/550B a hybrid MoE 🦖 with 1M context window
> supports MTP speculative decoding 💨
> day-0 supported in transformers

sits in the most attractive quadrant per performance/efficiency in AA Index 🔥

3d14.8K15130

kwindla@kwindla

http://x.com/i/article/2062636698337726464

3d4.5K5336

Fireworks AI@FireworksAI_HQ

NVIDIA Nemotron 3 Ultra is on Fireworks, day zero.

Nemotron Ultra is an open model for frontier reasoning and orchestration in long-running autonomous agents.

Think use cases like coding agents, deep research, and complex enterprise workflows.

Read on: https://fireworks.ai/blog/nemotron-3-ultra

3d5.5K11423

Modal@modal

With today's launch of Nemotron 3 Ultra, @nvidia continues to expand its investment in open-source AI. Their flagship frontier-reasoning model, built for long-running autonomous agents, is available Day 0 on Modal.

- 550B with 55B active parameters
- Hybrid Transformer-Mamba MoE architecture
- 1M context
- Up to 5x faster inference
- Up to 30% lower cost

NVIDIA@nvidia

Introducing NVIDIA Nemotron 3 Ultra.

A frontier smart open model built for long-running agents that need to plan, reason, use tools and keep working across complex coding, research and enterprise workflows.

Up to 5x faster inference and up to 30% lower cost for agentic tasks.

Learn more: https://nvda.ws/4x9nGps

3d12K12816