Featured

DeepSeek‑OCR‑2 is a 3B‑parameter, Apache‑2.0 vision–language model with DeepEncoder V2, delivering SOTA document OCR and layout understanding using up to 20× fewer tokens and supporting industrial‑scale PDF ingestion.

Provider DeepSeek
Release Date January 27, 2026
Total params 3B
Architecture vision–language OCR
License Apache 2.0
Core Model OCR Vision

Core Models

Use them in 2 minutes: no cold boots, low latency, and free for 30 days

16 models available

You can use Qwen‑Image‑2512 as a high‑quality image generator for production apps, design tools, and creative workflows that need photorealistic people, complex scenes, and reliable typography in English and Chinese.…

Chat Vision
View model

Apertus‑70B‑2509 is a 70B-parameter, fully open multilingual transformer from the Swiss AI Initiative, trained on 15T compliant tokens and supporting 1,800+ languages with competitive open‑weight benchmark performance.

Chat
View model

gpt-oss-20b is a 21B-parameter open-weight MoE reasoning model from OpenAI with ~4B active parameters, a 128k context window, and native support for chain-of-thought, tools, and structured outputs under Apache 2.0.

Chat Reasoning
View model

Llama-3.1-8B-Instruct is an 8B-parameter multilingual chat and instruction-following model from Meta with a 128k context window, strong tool usage, and efficient performance for real-time assistants.

Chat
View model

Mistral-Small-4-119B-2603 is a 119B-parameter multimodal MoE model with only 6.5B active parameters per token, delivering top-tier reasoning, coding, and vision performance with a 256k-token context window.

Chat Reasoning Vision
View model

Mistral-Small-3.2-24B-Instruct-2506 is a 24B-parameter instruction-tuned model from Mistral that improves instruction following, reduces repetition, and offers robust function calling for production-grade assistants.

Chat Tools Vision
View model

Llama 3.3 70B Instruct is Meta’s multilingual, instruction-tuned 70B text model for chat, coding, reasoning, and tool-enabled assistants. It supports 128K context, eight officially supported languages, and commercial use under…

Chat
View model

Qwen3.5-9B is a 9B-parameter, open-weight multimodal foundation model from Alibaba Cloud that delivers strong reasoning, coding, and vision-language performance with a 262K-token native context window.

Chat Reasoning
View model

Qwen3-Coder-Next is an open-weight coding model built for coding agents and local development, combining 80B total parameters with just 3B active parameters for efficient deployment. It supports 256K context and…

Chat Reasoning Tools
View model

The model can be used with sentence-transformers or Hugging Face Transformers, with both integration paths documented on the official model card.

Embedding Tools Vision
View model
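As a rough sketch of how an embedding model like this is used downstream (the vectors and document names below are illustrative stand-ins for real `encode()` outputs, not taken from the model card): retrieval compares the query embedding to each document embedding by cosine similarity and ranks by score.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for the model's embedding outputs.
query = [0.1, 0.9, 0.2]
docs = {
    "doc_a": [0.1, 0.8, 0.3],
    "doc_b": [0.9, 0.1, 0.0],
}

# Rank documents by similarity to the query, most similar first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
print(ranked)
```

With real embeddings the only change is producing the vectors with the model (e.g. via sentence-transformers, as the card notes); the ranking step is identical.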

Qwen3-Reranker-4B is a 4B text reranking model built to improve retrieval precision in multilingual and code search workflows. It supports 32K context, 100+ languages, and instruction-aware ranking, making it well…

Rerank
View model
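The rerank step this card describes slots into a retrieval pipeline as a reordering pass: the model scores each (query, passage) pair and the candidates are re-sorted by that score. A minimal sketch, with a token-overlap stub standing in for the model's relevance scores (the scorer and passages are illustrative, not from the model card):

```python
def rerank(query, passages, score_fn, top_k=3):
    # Higher score = more relevant; keep only the top_k passages.
    scored = sorted(passages, key=lambda p: score_fn(query, p), reverse=True)
    return scored[:top_k]

def overlap_score(query, passage):
    # Stub scorer: fraction of query tokens found in the passage,
    # standing in for a real reranker's learned relevance score.
    q = set(query.lower().split())
    return len(q & set(passage.lower().split())) / max(len(q), 1)

passages = [
    "Rust ownership rules prevent data races.",
    "The weather is sunny today.",
    "Ownership and borrowing are core Rust concepts.",
]
print(rerank("rust ownership", passages, overlap_score, top_k=2))
```

Swapping `overlap_score` for real model scores is the only change needed to use an actual reranker here.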

Supports seamless switching between a “thinking mode” for complex math, code, and logic and a “non‑thinking mode” for efficient dialogue

Chat Tools Video Understanding Vision
View model
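In practice the thinking/non-thinking switch is usually exposed as a per-request flag. A sketch of building an OpenAI-compatible chat-completions payload with that toggle — `chat_template_kwargs` with `enable_thinking` follows the vLLM serving convention for Qwen-style models; the model id is a placeholder, and the exact field names may differ on other serving stacks:

```python
def chat_payload(messages, thinking):
    # Build a chat-completions request body; `enable_thinking` toggles
    # the model's reasoning mode per request (vLLM convention).
    return {
        "model": "qwen-example",  # placeholder model id, not from this page
        "messages": messages,
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

# Thinking mode for a hard problem, non-thinking for quick dialogue.
hard = chat_payload([{"role": "user", "content": "Prove sqrt(2) is irrational."}], thinking=True)
quick = chat_payload([{"role": "user", "content": "Hi!"}], thinking=False)
print(hard["chat_template_kwargs"], quick["chat_template_kwargs"])
```

The payloads would then be POSTed to the provider's `/v1/chat/completions` endpoint as JSON.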

gpt-oss-120b is OpenAI’s flagship open-weight Mixture-of-Experts language model with about 117B parameters and 5.1B active per token, optimized for high‑reasoning, agentic production workloads on a single 80GB GPU and released…

Chat Reasoning Tools
View model

Qwen3.5-122B-A10B is a powerful open-weight Mixture-of-Experts (MoE) model from Alibaba's Qwen team, featuring 122 billion total parameters with only 10 billion active per token for efficient performance.

Chat Reasoning Tools Video Understanding Vision
View model

Custom Models

Choose any vLLM-compatible model from Hugging Face and deploy it on our GPUs

Holo3‑35B‑A3B is a 35B (3B active) open‑weight multimodal MoE model from H Company, optimized for computer‑use agents that read screens, understand UIs, and plan reliable multi‑step actions with a 64k…

Chat Reasoning Vision
View model

chandra‑ocr‑2 is a 4B‑parameter, layout‑aware OCR model from Datalab that converts complex documents into structured Markdown/HTML/JSON across 90+ languages, achieving SOTA olmOCR scores with 2× the throughput of Chandra 1.

OCR Vision
View model

Z‑Image‑Turbo is a 6B‑parameter, ultra‑fast text‑to‑image model from Tongyi‑MAI that generates photorealistic, bilingual images in under a second using just eight diffusion steps, even on 16 GB consumer GPUs.

Image
View model

Qwen3.5‑35B‑A3B is a 35B (3B active) sparse‑MoE multimodal model from Alibaba that offers 262k–1M context, strong reasoning and vision performance, and Apache‑2.0 open weights tuned for efficient single‑GPU deployment.

Chat Reasoning Vision
View model

GLM‑OCR is a 0.9B‑parameter multimodal OCR model that uses a CogViT vision encoder and GLM decoder with Multi‑Token Prediction to deliver state‑of‑the‑art document parsing accuracy while remaining small enough for…

Chat Vision
View model

Nemotron‑Cascade‑2‑30B‑A3B is a 30B (3B active) hybrid Mamba–Transformer MoE model from NVIDIA with 262k+ context and gold‑medal IMO/IOI 2025 performance, tuned for dense reasoning and agentic coding on a single…

Chat Reasoning
View model

NVIDIA Nemotron‑3‑Super‑120B‑A12B‑NVFP4 is a 120B (12B active) LatentMoE hybrid Mamba‑Transformer model with up to 1M‑token context, NVFP4 efficiency, and state‑of‑the‑art performance on agentic reasoning, coding, and long‑horizon planning tasks.

Chat Reasoning Vision
View model