You can use Qwen‑Image‑2512 as a high‑quality image generator for production apps, design tools, and creative workflows that need photorealistic people, complex scenes, and reliable typography in English and Chinese.…
Models
You can begin working with any model using a single line of code. For more advanced tasks, you can fine-tune existing models or run your own custom code.
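As a minimal sketch of that single-call pattern, here is a stdlib-only example that builds an OpenAI-compatible chat-completions request. The endpoint URL and API-key value are placeholders, not the platform's documented values; substitute your own, and note the model ID is just one entry from the catalog below.

```python
# Sketch: one request to an OpenAI-compatible chat-completions endpoint.
# The URL and API key below are placeholders; replace with your account's values.
import json
import urllib.request

payload = {
    "model": "openai/gpt-oss-20b",  # any catalog model ID works here
    "messages": [{"role": "user", "content": "Say hello in one word."}],
}
req = urllib.request.Request(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer YOUR_API_KEY",  # placeholder key
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment once a real endpoint and key are set
```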
DeepSeek‑OCR‑2 is a 3B‑parameter, Apache‑2.0 vision–language model with DeepEncoder V2, delivering SOTA document OCR and layout understanding using up to 20× fewer tokens and supporting industrial‑scale PDF ingestion.
Core Models
Get started in 2 minutes: no cold boots, low latency, and free for 30 days.
Apertus‑70B‑2509 is a 70B-parameter, fully open multilingual transformer from the Swiss AI Initiative, trained on 15T compliant tokens and supporting 1,800+ languages with competitive open‑weight benchmark performance.
gpt-oss-20b is a 21B-parameter open-weight MoE reasoning model from OpenAI with ~4B active parameters, a 128k context window, and native support for chain-of-thought, tools, and structured outputs under Apache 2.0.
Llama-3.1-8B-Instruct is an 8B-parameter multilingual chat and instruction-following model from Meta with a 128k context window, strong tool usage, and efficient performance for real-time assistants.
Mistral-Small-4-119B-2603 is a 119B-parameter multimodal MoE model with only 6.5B active parameters per token, delivering top-tier reasoning, coding, and vision performance with a 256k-token context window.
Mistral-Small-3.2-24B-Instruct-2506 is a 24B-parameter instruction-tuned model from Mistral that improves instruction following, reduces repetition, and offers robust function calling for production-grade assistants.
Llama 3.3 70B Instruct is Meta’s multilingual, instruction-tuned 70B text model for chat, coding, reasoning, and tool-enabled assistants. It supports 128K context, eight officially supported languages, and commercial use under…
Qwen3.5-9B is a 9B-parameter, open-weight multimodal foundation model from Alibaba Cloud that delivers strong reasoning, coding, and vision-language performance with a 262K-token native context window.
Qwen3-Coder-Next is an open-weight coding model built for coding agents and local development, combining 80B total parameters with just 3B active parameters for efficient deployment. It supports 256K context and…
The model can be used with sentence-transformers or Hugging Face Transformers, with both integration paths documented on the official model card.
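A sketch of the sentence-transformers integration path mentioned above. The model ID is left as a placeholder (take the exact ID from the official model card), and the heavy call is commented out; the cosine-similarity helper shows how two returned embeddings are typically compared.

```python
# Sketch of the sentence-transformers path; the model ID is a placeholder.
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("your-model-id")          # see the model card
# emb = model.encode(["query text", "candidate passage"])
# print(cosine(list(emb[0]), list(emb[1])))
```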
Qwen3-Reranker-4B is a 4B text reranking model built to improve retrieval precision in multilingual and code search workflows. It supports 32K context, 100+ languages, and instruction-aware ranking, making it well…
The model supports seamless switching between a “thinking mode” for complex math, code, and logic and a “non‑thinking mode” for efficient dialogue.
gpt-oss-120b is OpenAI’s flagship open-weight Mixture-of-Experts language model with about 117B parameters and 5.1B active per token, optimized for high‑reasoning, agentic production workloads on a single 80GB GPU and released…
Qwen3.5-122B-A10B is a powerful open-weight Mixture-of-Experts (MoE) model from Alibaba's Qwen team, featuring 122 billion total parameters with only 10 billion active per token for efficient performance.
Custom Models
Choose any vLLM-compatible model from Hugging Face and deploy it on our GPUs.
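As a sketch of how such a deployment is usually launched, the helper below builds a `vllm serve` command for a Hugging Face model ID (the model name and port are illustrative; any vLLM-compatible checkpoint works):

```python
# Sketch: constructing the vLLM serve command for a Hugging Face model ID.
# Model name and port are illustrative examples, not required values.
import subprocess

def serve_command(model: str, port: int = 8000) -> list[str]:
    """Build the `vllm serve` command for an OpenAI-compatible server."""
    return ["vllm", "serve", model, "--port", str(port)]

cmd = serve_command("mistralai/Mistral-Small-3.2-24B-Instruct-2506")
# subprocess.run(cmd)  # launches the server (requires vLLM and a suitable GPU)
```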
Holo3‑35B‑A3B is a 35B (3B active) open‑weight multimodal MoE model from H Company, optimized for computer‑use agents that read screens, understand UIs, and plan reliable multi‑step actions with a 64k…
chandra‑ocr‑2 is a 4B‑parameter, layout‑aware OCR model from Datalab that converts complex documents into structured Markdown/HTML/JSON across 90+ languages, achieving SOTA olmOCR scores with 2× the throughput of Chandra 1.
Z‑Image‑Turbo is a 6B‑parameter, ultra‑fast text‑to‑image model from Tongyi‑MAI that generates photorealistic, bilingual images in under a second using just eight diffusion steps, even on 16 GB consumer GPUs.
Qwen3.5‑35B‑A3B is a 35B (3B active) sparse‑MoE multimodal model from Alibaba that offers 262k–1M context, strong reasoning and vision performance, and Apache‑2.0 open weights tuned for efficient single‑GPU deployment.
GLM‑OCR is a 0.9B‑parameter multimodal OCR model that uses a CogViT vision encoder and GLM decoder with Multi‑Token Prediction to deliver state‑of‑the‑art document parsing accuracy while remaining small enough for…
Nemotron‑Cascade‑2‑30B‑A3B is a 30B (3B active) hybrid Mamba–Transformer MoE model from NVIDIA with 262k+ context and gold‑medal IMO/IOI 2025 performance, tuned for dense reasoning and agentic coding on a single…
NVIDIA Nemotron‑3‑Super‑120B‑A12B‑NVFP4 is a 120B (12B active) LatentMoE hybrid Mamba‑Transformer model with up to 1M‑token context, NVFP4 efficiency, and state‑of‑the‑art performance on agentic reasoning, coding, and long‑horizon planning tasks.