
Mistral Small 4 119B

Mistral-Small-4-119B-2603 is a 119B-parameter multimodal MoE model with only 6.5B active parameters per token, delivering top-tier reasoning, coding, and vision performance with a 256k-token context window.
Core Model
Chat

You can use Mistral-Small-4-119B-2603 as a unified backbone for high-end assistants, reasoning agents, and multimodal applications that need strong performance at lower serving costs, thanks to MoE routing that activates only 6.5B parameters per token. It supports text, vision, and tool-augmented workflows in a single model, making it suitable for complex enterprise copilots, agentic systems, and long-context document or video understanding pipelines.
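Because the model is multimodal, the same chat endpoint can accept image inputs alongside text. A minimal sketch of a vision request payload, assuming the API follows the OpenAI-compatible `image_url` content-part format (the image URL here is a placeholder, not a real asset):

```python
# Sketch of a multimodal chat payload, assuming an OpenAI-compatible
# content format; the image URL is a placeholder for illustration.
def build_vision_payload(question: str, image_url: str) -> dict:
    return {
        "model": "mistral-small-4-119b",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }

payload = build_vision_payload(
    "What chart type is shown and what is its main trend?",
    "https://example.com/report-page.png",
)
print(payload["messages"][0]["content"][0]["text"])
```

The payload can then be POSTed to the same `/v1/chat/completions` endpoint shown below, with the same headers.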

How to get started

pip install requests

import requests


api_url = "https://api.regolo.ai/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_REGOLO_KEY"
}
data = {
    "model": "mistral-small-4-119b",
    "messages": [
        {
            "role": "user",
            "content": "If a train travels 60 km/h for 2 hours and then 80 km/h for 1.5 hours, what is the total distance covered?"
        }
    ],
    "reasoning_effort": "high"
}

response = requests.post(api_url, headers=headers, json=data)
print(response.json())
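The JSON returned by the endpoint follows the usual chat-completions shape, so the assistant's reply sits under `choices[0].message.content`. A small helper for pulling it out, shown against a hard-coded sample response rather than a live call:

```python
# Extract the assistant reply from a chat-completions style response.
# The sample dict below is illustrative, not actual API output.
def extract_reply(response_json: dict) -> str:
    return response_json["choices"][0]["message"]["content"]

sample = {
    "choices": [
        {"message": {"role": "assistant", "content": "Total distance: 240 km."}}
    ]
}
print(extract_reply(sample))  # → Total distance: 240 km.
```

In the live example above, you would call `extract_reply(response.json())` instead of printing the raw JSON.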

Applications & Use Cases

  • Multimodal chat assistants that combine text, images, and complex reasoning for customer support, analysis, and executive copilots.
  • Advanced coding and debugging copilots that handle large repositories, multi-file refactors, and long code reviews using the 256k-token context.
  • Enterprise research agents for legal, financial, and technical domains, performing deep retrieval-augmented analysis over very large document corpora.
  • Tool and function-calling orchestration layers that drive workflows such as CRM automation, incident response, and operational decision support.
  • Vision-language applications such as report OCR, UI understanding, product search, and image-grounded Q&A using the model’s unified multimodal encoder.
  • Long-horizon planning and reasoning agents for logistics, strategy simulations, and complex what-if analysis where the 256k context and MoE capacity matter.
  • Cost-efficient “near frontier” deployments where you want frontier-level quality but with only 6.5B active parameters per token, reducing GPU requirements compared with dense 100B+ models.
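For the tool and function-calling orchestration use case above, a request sketch assuming the API accepts the OpenAI-compatible `tools` schema; `get_ticket_status` is a hypothetical function used only for illustration:

```python
# Sketch of a tool-calling request payload, assuming an OpenAI-compatible
# "tools" schema; get_ticket_status is a hypothetical example function.
def build_tool_payload(user_message: str) -> dict:
    return {
        "model": "mistral-small-4-119b",
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_ticket_status",
                    "description": "Look up the status of a support ticket by ID.",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "ticket_id": {"type": "string"}
                        },
                        "required": ["ticket_id"],
                    },
                },
            }
        ],
        "tool_choice": "auto",
    }

payload = build_tool_payload("What's the status of ticket INC-1234?")
print(payload["tools"][0]["function"]["name"])  # → get_ticket_status
```

When the model decides to call the tool, the response carries a `tool_calls` entry instead of plain text; your orchestration layer executes the function and feeds the result back as a `tool` role message.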