How to Get Started
Step 1
Sign up and get your API key, which comes with unlimited tokens for 30 days.
Step 2
Paste the URL of the Hugging Face repository (e.g., https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4).
Step 3
Choose the GPU machine to deploy on.
That’s all! You’re ready to use the model in a few minutes, without any infrastructure complexity.
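Once deployed, the model is typically reachable over an HTTP endpoint. The sketch below builds an OpenAI-style chat-completions request; the endpoint URL, header scheme, and model identifier are placeholder assumptions, not documented values — substitute whatever your deployment dashboard shows.

```python
import json

# Hypothetical values -- replace with the endpoint and API key from your dashboard.
ENDPOINT = "https://your-deployment.example.com/v1/chat/completions"  # assumption
API_KEY = "YOUR_API_KEY"  # from Step 1

headers = {
    "Authorization": f"Bearer {API_KEY}",  # common bearer-token scheme (assumption)
    "Content-Type": "application/json",
}
payload = {
    # Repository pasted in Step 2, used here as the model identifier.
    "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
    "messages": [{"role": "user", "content": "Summarize MoE routing in two sentences."}],
    "max_tokens": 256,
}

# To actually send the request (requires the `requests` package):
#   import requests
#   r = requests.post(ENDPOINT, headers=headers, data=json.dumps(payload), timeout=60)
#   print(r.json()["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```

If the deployment exposes an OpenAI-compatible API, existing client libraries can usually be pointed at it by overriding only the base URL and key.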

Applications & Use Cases
- Multimodal chat assistants that combine strong language, coding, and visual reasoning for customer support, analytics, and internal copilots.
- Long‑context RAG systems over technical documentation, codebases, and mixed text‑image corpora, exploiting the 262k–1M context window for cross‑document reasoning.
- Coding and agentic copilots that use the 3B‑active MoE efficiency plus tool calling to drive complex software, data, and operations workflows.
- Evaluation and teacher models for distillation, where Qwen3.5‑35B‑A3B’s high MMLU‑Pro, GPQA Diamond, and SWE‑bench scores provide a strong open reference.
- Single‑GPU or cost‑sensitive deployments (for example 8–16 GB with quantization) that still target near‑frontier performance in multilingual, multimodal applications.
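The single-GPU bullet can be sanity-checked with a back-of-envelope estimate of weight memory alone (KV cache and activations add more on top). The 35B parameter count and bit widths below are illustrative assumptions, not measured figures.

```python
def weight_mem_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate GPU memory needed for model weights alone, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Illustrative: a 35B-parameter checkpoint at common quantization widths.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_mem_gb(35e9, bits):.1f} GB")
```

Even at 4 bits the full weight set exceeds the low end of that range, so fitting into 8–16 GB in practice also leans on the MoE structure (only ~3B parameters active per token) together with techniques such as offloading inactive experts to CPU memory.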