
NVIDIA Nemotron 3 Super 120B-A12B-NVFP4

NVIDIA Nemotron‑3‑Super‑120B‑A12B‑NVFP4 is a 120B (12B active) LatentMoE hybrid Mamba‑Transformer model with up to 1M‑token context, NVFP4 efficiency, and state‑of‑the‑art performance on agentic reasoning, coding, and long‑horizon planning tasks.
Custom Model
Chat

How to Get Started

Step 1

Sign up and get your API key. You can use it with unlimited tokens for 30 days.

Step 2

Paste the URL of the Hugging Face repository (e.g. https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4).

Step 3

Choose the GPU machine to deploy on.

That’s all! You’re ready to use the model in a few minutes, without infrastructure complexity.
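Once deployed, models like this are typically served through an OpenAI-compatible chat completions endpoint. The sketch below builds such a request with the standard library only; the base URL and API key are placeholders, not values from this page, so substitute your deployment's actual endpoint and key.

```python
import json
import urllib.request

# Placeholder values -- replace with your deployment's endpoint and API key.
BASE_URL = "https://api.example-inference.ai/v1"
API_KEY = "YOUR_API_KEY"
MODEL = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4"


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request (constructed, not sent)."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("Summarize this incident log in three bullet points.")
print(req.full_url)  # endpoint the request targets
```

With the request in hand, `urllib.request.urlopen(req)` (or any HTTP client) would send it and return the JSON completion.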


Applications & Use Cases

  • Agentic workflows for IT ticket triage, incident response, and operations automation across large enterprises.
  • Multi-agent reasoning systems that coordinate several specialized tools or agents for complex, multi-step tasks and planning.
  • Long-context RAG over millions of tokens of logs, documents, and knowledge bases using the up-to-1M-token window for cross-document reasoning.
  • Advanced coding and DevOps copilots that handle large monorepos, multi-file refactors, and infrastructure-as-code with explicit reasoning traces.
  • High-volume conversational assistants and helpdesk copilots where NVFP4 training and LatentMoE routing reduce serving cost without sacrificing quality.
  • Tool- and API-calling orchestrators for data pipelines, analytics dashboards, and workflow automation that rely on reliable function calling and planning.
  • Research and evaluation setups that benchmark or distill frontier-scale reasoning into smaller models using Nemotron 3 Super as an open reference teacher.
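For the tool- and API-calling orchestration described above, requests usually follow the widely used OpenAI-style tool schema. A minimal sketch of what a tool definition and request body could look like; the `get_ticket_status` tool and its parameters are illustrative assumptions, not part of this page.

```python
import json

# Illustrative tool schema (hypothetical IT-ticket lookup tool) in the
# common OpenAI function-calling format.
get_ticket_status = {
    "type": "function",
    "function": {
        "name": "get_ticket_status",
        "description": "Look up the current status of an IT ticket by ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticket_id": {
                    "type": "string",
                    "description": "Ticket identifier, e.g. INC-1234",
                }
            },
            "required": ["ticket_id"],
        },
    },
}

# Request body the orchestrator would POST to the chat completions endpoint.
payload = {
    "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
    "messages": [{"role": "user", "content": "Is ticket INC-1234 resolved?"}],
    "tools": [get_ticket_status],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

body = json.dumps(payload)
```

When the model decides to call the tool, the response carries a `tool_calls` entry whose arguments the orchestrator executes before feeding the result back as a `tool` message.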