You can use Qwen‑Image‑2512 as a high‑quality image generator for production apps, design tools, and creative workflows that need photorealistic people, complex scenes, and reliable typography in English and Chinese.…
Models
You can begin working with any model using a single line of code. For more advanced tasks, you can fine-tune existing models or run your own custom code.
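As a minimal sketch of that single-call pattern, here is a stdlib-only example that builds an OpenAI-compatible chat-completions request. The endpoint URL and API-key value are placeholders, not the platform's documented values; substitute your own, and note the model ID is just one entry from the catalog below.

```python
# Sketch: one request to an OpenAI-compatible chat-completions endpoint.
# The URL and API key below are placeholders; replace with your account's values.
import json
import urllib.request

payload = {
    "model": "openai/gpt-oss-20b",  # any catalog model ID works here
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}
req = urllib.request.Request(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment once a real endpoint and key are set
```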
DeepSeek‑OCR‑2 is a 3B‑parameter, Apache‑2.0 vision–language model with DeepEncoder V2, delivering SOTA document OCR and layout understanding using up to 20× fewer tokens and supporting industrial‑scale PDF ingestion.
Core Models
Get started in 2 minutes: no cold boots, low latency, and free for 30 days.
Apertus‑70B‑2509 is a 70B-parameter, fully open multilingual transformer from the Swiss AI Initiative, trained on 15T compliant tokens and supporting 1,800+ languages with competitive open‑weight benchmark performance.
gpt-oss-20b is a 21B-parameter open-weight MoE reasoning model from OpenAI with ~4B active parameters, a 128k context window, and native support for chain-of-thought, tools, and structured outputs under Apache 2.0.
Llama-3.1-8B-Instruct is an 8B-parameter multilingual chat and instruction-following model from Meta with a 128k context window, strong tool usage, and efficient performance for real-time assistants.
Mistral-Small-4-119B-2603 is a 119B-parameter multimodal MoE model with only 6.5B active parameters per token, delivering top-tier reasoning, coding, and vision performance with a 256k-token context window.
Mistral-Small-3.2-24B-Instruct-2506 is a 24B-parameter instruction-tuned model from Mistral that improves instruction following, reduces repetition, and offers robust function calling for production-grade assistants.
Llama 3.3 70B Instruct is Meta’s multilingual, instruction-tuned 70B text model for chat, coding, reasoning, and tool-enabled assistants. It supports 128K context, eight officially supported languages, and commercial use under…
Qwen3.5-9B is a 9B-parameter, open-weight multimodal foundation model from Alibaba Cloud that delivers strong reasoning, coding, and vision-language performance with a 262K-token native context window.
Qwen3-Coder-Next is an open-weight coding model built for coding agents and local development, combining 80B total parameters with just 3B active parameters for efficient deployment. It supports 256K context and…
The model can be used with sentence-transformers or Hugging Face Transformers, with both integration paths documented on the official model card.
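A sketch of the sentence-transformers integration path mentioned above. The model ID is left as a placeholder (take the exact ID from the official model card), and the heavy call is commented out; the cosine-similarity helper shows how two returned embeddings are typically compared.

```python
# Sketch of the sentence-transformers path; the model ID is a placeholder.
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("your-model-id")          # see the model card
# emb = model.encode(["query text", "candidate passage"])
# print(cosine(list(emb[0]), list(emb[1])))
```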
Qwen3-Reranker-4B is a 4B text reranking model built to improve retrieval precision in multilingual and code search workflows. It supports 32K context, 100+ languages, and instruction-aware ranking, making it well…
The model supports seamless switching between a “thinking mode” for complex math, code, and logic and a “non‑thinking mode” for efficient dialogue.
gpt-oss-120b is OpenAI’s flagship open-weight Mixture-of-Experts language model with about 117B parameters and 5.1B active per token, optimized for high‑reasoning, agentic production workloads on a single 80GB GPU and released…
Qwen3.5-122B-A10B is a powerful open-weight Mixture-of-Experts (MoE) model from Alibaba's Qwen team, featuring 122 billion total parameters with only 10 billion active per token for efficient performance.
Custom Models
Choose any vLLM-compatible model from Hugging Face and deploy it on our GPUs.
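As a sketch of how such a deployment is usually launched, the helper below builds a `vllm serve` command for a Hugging Face model ID (the model name and port are illustrative; any vLLM-compatible checkpoint works):

```python
# Sketch: constructing the vLLM serve command for a Hugging Face model ID.
# Model name and port are illustrative examples, not required values.
import subprocess

def serve_command(model: str, port: int = 8000) -> list[str]:
    """Build the `vllm serve` command for an OpenAI-compatible server."""
    return ["vllm", "serve", model, "--port", str(port)]

cmd = serve_command("mistralai/Mistral-Small-3.2-24B-Instruct-2506")
# subprocess.run(cmd)  # launches the server (requires vLLM and a suitable GPU)
```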
Holo3‑35B‑A3B is a 35B (3B active) open‑weight multimodal MoE model from H Company, optimized for computer‑use agents that read screens, understand UIs, and plan reliable multi‑step actions with a 64k…
chandra‑ocr‑2 is a 4B‑parameter, layout‑aware OCR model from Datalab that converts complex documents into structured Markdown/HTML/JSON across 90+ languages, achieving SOTA olmOCR scores with 2× the throughput of Chandra 1.
Z‑Image‑Turbo is a 6B‑parameter, ultra‑fast text‑to‑image model from Tongyi‑MAI that generates photorealistic, bilingual images in under a second using just eight diffusion steps, even on 16 GB consumer GPUs.
Qwen3.5‑35B‑A3B is a 35B (3B active) sparse‑MoE multimodal model from Alibaba that offers 262k–1M context, strong reasoning and vision performance, and Apache‑2.0 open weights tuned for efficient single‑GPU deployment.
GLM‑OCR is a 0.9B‑parameter multimodal OCR model that uses a CogViT vision encoder and GLM decoder with Multi‑Token Prediction to deliver state‑of‑the‑art document parsing accuracy while remaining small enough for…
Nemotron‑Cascade‑2‑30B‑A3B is a 30B (3B active) hybrid Mamba–Transformer MoE model from NVIDIA with 262k+ context and gold‑medal IMO/IOI 2025 performance, tuned for dense reasoning and agentic coding on a single…
NVIDIA Nemotron‑3‑Super‑120B‑A12B‑NVFP4 is a 120B (12B active) LatentMoE hybrid Mamba‑Transformer model with up to 1M‑token context, NVFP4 efficiency, and state‑of‑the‑art performance on agentic reasoning, coding, and long‑horizon planning tasks.