How to Get Started
Step 1
Sign up and get your API key, which comes with unlimited tokens for 30 days.
Step 2
Paste the URL of the Hugging Face repository (e.g., https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4).
Step 3
Choose the GPU machine to deploy on.
That’s all! You’re ready to use the model in a few minutes, without any infrastructure complexity.
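Once deployed, the model is typically reachable over an HTTP endpoint. The sketch below builds an OpenAI-style chat-completions request; the endpoint URL, header scheme, and model identifier are placeholder assumptions, not documented values — substitute whatever your deployment dashboard shows.

```python
import json

# Hypothetical values -- replace with the endpoint and API key from your dashboard.
ENDPOINT = "https://your-deployment.example.com/v1/chat/completions"  # assumption
API_KEY = "YOUR_API_KEY"  # from Step 1

headers = {
    "Authorization": f"Bearer {API_KEY}",  # common bearer-token scheme (assumption)
    "Content-Type": "application/json",
}
payload = {
    # Repository pasted in Step 2, used here as the model identifier.
    "model": "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4",
    "messages": [{"role": "user", "content": "Summarize MoE routing in two sentences."}],
    "max_tokens": 256,
}

# To actually send the request (requires the `requests` package):
#   import requests
#   r = requests.post(ENDPOINT, headers=headers, data=json.dumps(payload), timeout=60)
#   print(r.json()["choices"][0]["message"]["content"])
print(json.dumps(payload, indent=2))
```

If the deployment exposes an OpenAI-compatible API, existing client libraries can usually be pointed at it by overriding only the base URL and key.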

Applications & Use Cases
- Multimodal chat assistants that combine strong language, coding, and visual reasoning for customer support, analytics, and internal copilots.
- Long‑context RAG systems over technical documentation, codebases, and mixed text‑image corpora, exploiting the 262k–1M context window for cross‑document reasoning.
- Coding and agentic copilots that use the 3B‑active MoE efficiency plus tool calling to drive complex software, data, and operations workflows.
- Evaluation and teacher models for distillation, where Qwen3.5‑35B‑A3B’s high MMLU‑Pro, GPQA Diamond, and SWE‑bench scores provide a strong open reference.
- Single‑GPU or cost‑sensitive deployments (for example 8–16 GB with quantization) that still target near‑frontier performance in multilingual, multimodal applications.
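The single-GPU bullet can be sanity-checked with a back-of-envelope estimate of weight memory alone (KV cache and activations add more on top). The 35B parameter count and bit widths below are illustrative assumptions, not measured figures.

```python
def weight_mem_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate GPU memory needed for model weights alone, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Illustrative: a 35B-parameter checkpoint at common quantization widths.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_mem_gb(35e9, bits):.1f} GB")
```

Even at 4 bits the full weight set exceeds the low end of that range, so fitting into 8–16 GB in practice also leans on the MoE structure (only ~3B parameters active per token) together with techniques such as offloading inactive experts to CPU memory.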