
Qwen3.5 122B, Qwen3.5 9B, and Mistral Small 4 119B Are Now Available on Regolo

We have added three new open models to Regolo: Qwen3.5 122B, Qwen3.5 9B, and Mistral Small 4 119B. Together, they give us a stronger spread across high-end multimodal reasoning, efficient long-context inference, and production-ready coding and agent workflows.

This is not a teaser release. These models are already available, and the practical question is simple: which one should we choose for which workload, and how should we call it through Regolo in production? Regolo’s API supports model discovery through GET /models and chat inference through POST /v1/chat/completions with a Bearer token, so the integration path is straightforward.
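Before anything else, it is worth listing what the catalog actually exposes. The sketch below assumes an OpenAI-style catalog response (`{"data": [{"id": ...}, ...]}`); check the key names against Regolo’s actual /models output before relying on them:

```python
import requests

API_BASE = "https://api.regolo.ai"  # base URL used elsewhere in this post


def extract_model_ids(payload: dict) -> list[str]:
    """Pull model identifiers out of an OpenAI-style catalog response.

    Assumes the response looks like {"data": [{"id": "..."}, ...]};
    verify the key names against Regolo's real /models output.
    """
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(api_key: str) -> list[str]:
    """Fetch the model catalog from Regolo and return the model identifiers."""
    resp = requests.get(
        f"{API_BASE}/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return extract_model_ids(resp.json())
```

Splitting the pure extraction from the network call keeps the parsing logic easy to unit-test without hitting the API.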

All benchmark numbers below come from the published model cards and provider documentation, so we should treat them as strong directional evidence, not as a substitute for workload-specific evaluation on our own data.

Which model should we use for which job?

The fastest way to think about this release is to map each model to a primary operating mode. Qwen3.5 122B is the strongest choice when we want the highest open-weight multimodal reasoning quality and broad agent capability. Qwen3.5 9B is the efficient option when we care about cost and throughput but still want surprisingly strong multimodal performance. Mistral Small 4 119B is the best fit when we want a general-purpose production model for coding, agents, and reasoning with a stronger speed story.

| Model | What it is best for | Benchmarks that matter |
|---|---|---|
| Qwen3.5 122B | High-end enterprise copilots, multimodal document analysis, long-context knowledge work, tool-using agents, and advanced research or policy workflows. | 122B total parameters with 10B activated; 262,144 native context, extensible to 1,010,000 tokens. MMLU-Pro 86.7, GPQA Diamond 86.6, SWE-bench Verified 72.0, Terminal Bench 2 49.4, BFCL-V4 72.2, TAU2-Bench 79.5, OmniDocBench 1.5 89.8, OCRBench 92.1, MMMU-Pro 76.9, VideoMME 87.3. |
| Qwen3.5 9B | Cost-sensitive multimodal apps, document understanding at scale, fast assistants, batch enrichment, and teams that want strong quality without jumping straight to a very large model. | 9B parameters; 262,144 native context, extensible to 1,010,000 tokens. MMLU-Pro 82.5, GPQA Diamond 81.7, AA-LCR 63.0, TAU2-Bench 79.1, OmniDocBench 1.5 87.7, OCRBench 89.2, MMMU-Pro 70.1, VideoMME 84.5. |
| Mistral Small 4 119B | Coding copilots, general assistants, agentic workflows, document extraction, and reasoning-heavy production services where speed and output efficiency matter. | 119B total parameters with 6.5B active per token; 256k context, multimodal input, function calling, configurable reasoning effort. Mistral reports 40% lower end-to-end completion time and 3x higher requests per second than Mistral Small 3; NVIDIA’s model card reports AA-LCR 0.72 at 1.6K output characters and says the model matches or beats GPT-OSS 120B on AA-LCR, LiveCodeBench, and AIME 2025 while producing shorter outputs. |
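The mapping above can be sketched as a small routing helper. The model slugs below are assumptions, not confirmed Regolo identifiers; resolve the exact names from GET /models before relying on them:

```python
# Hypothetical workload-to-model routing based on the comparison table above.
# The slugs are placeholders; confirm the exact names via Regolo's /models catalog.
MODEL_FOR_WORKLOAD = {
    "multimodal_reasoning": "qwen3.5-122b",   # highest open-weight quality
    "high_volume": "qwen3.5-9b",              # cost/throughput sweet spot
    "coding_agents": "mistral-small-4-119b",  # speed-oriented coding and agents
}


def pick_model(workload: str) -> str:
    """Return the suggested model slug for a workload category."""
    try:
        return MODEL_FOR_WORKLOAD[workload]
    except KeyError:
        raise ValueError(f"unknown workload: {workload!r}")
```

Centralizing the choice in one table makes it trivial to swap a model later without touching call sites.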

Qwen3.5 122B (HF link) is the model we should reach for when failure is expensive and the task mixes reasoning, long context, and multimodal evidence. Its published profile is unusually broad: it is strong in knowledge and STEM, strong in agent benchmarks, and also strong in document and OCR-heavy vision tasks. In other words, it is a good fit for internal knowledge assistants, compliance review, multimodal research workflows, and advanced document copilots where one model needs to do many jobs well.

Qwen3.5 9B (HF link) is the most interesting model in the release from a deployment economics perspective. It keeps the same native 262k context window as the 122B model and still posts strong language, document, and visual results for its size, which makes it a credible default for high-volume applications where latency and cost matter more than squeezing out the very last bit of reasoning quality. In practical terms, this is the model we would try first for support triage, enterprise search assistants, document extraction, and multimodal back-office automation.

Mistral Small 4 119B (HF link) is the model we should test when the workload looks like software delivery, tool use, or agent orchestration. Its published positioning is explicit: one unified model for instruct, reasoning, and developer-style tasks, with native function calling, multimodal input, and a much stronger serving-efficiency narrative than many models in the same class. That combination makes it attractive for coding assistants, PR-review bots, agent backends, and general-purpose enterprise assistants that need to stay responsive under load.
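Since function calling is central to Mistral Small 4’s pitch, a request payload with a tool definition might look like the sketch below. It assumes the OpenAI-compatible `tools` schema, and both the model slug and the `get_ticket_status` tool are hypothetical:

```python
# Hypothetical function-calling payload. The "tools" shape follows the
# OpenAI-compatible convention, which is an assumption here; the model
# slug and tool name are placeholders for illustration only.
def build_tool_call_request(model: str, user_prompt: str) -> dict:
    """Assemble a chat-completions payload that advertises one tool."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_ticket_status",
                    "description": "Look up the status of a support ticket.",
                    "parameters": {
                        "type": "object",
                        "properties": {"ticket_id": {"type": "string"}},
                        "required": ["ticket_id"],
                    },
                },
            }
        ],
    }


payload = build_tool_call_request(
    "mistral-small-4-119b", "What's the status of ticket ABC-123?"
)
```

The payload then goes to POST /v1/chat/completions exactly like a plain chat request; the model decides whether to answer directly or emit a tool call.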

How do we call these models on Regolo?

The integration pattern is the same across all three models: discover the exact model name from Regolo, then call the chat completions endpoint with a Bearer token. Regolo’s documentation shows both the requests pattern and the /v1/chat/completions endpoint structure, so we can keep the client thin and portable.

To keep the script runnable across different workspaces, the example below resolves the exact Regolo model name from /models first, instead of assuming a hardcoded slug. That matters because model naming can vary slightly between catalogs, while the calling pattern stays stable.

pip install requests
import requests

API_BASE = "https://api.regolo.ai"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_REGOLO_KEY",
}

# Resolve the exact model name from the catalog instead of hardcoding a slug
# (assumes an OpenAI-style catalog shape: {"data": [{"id": ...}, ...]}).
catalog = requests.get(f"{API_BASE}/models", headers=headers, timeout=30)
catalog.raise_for_status()
model_ids = [entry["id"] for entry in catalog.json().get("data", [])]

# Pick the first catalog entry that looks like Qwen3.5 122B.
model = next(m for m in model_ids if "qwen3.5" in m.lower() and "122" in m)

data = {
    "model": model,
    "messages": [
        {
            "role": "user",
            "content": (
                "If a train travels 60 km/h for 2 hours and then 80 km/h "
                "for 1.5 hours, what is the total distance covered?"
            ),
        }
    ],
    "reasoning_effort": "low",
}

response = requests.post(
    f"{API_BASE}/v1/chat/completions", headers=headers, json=data, timeout=120
)
response.raise_for_status()
print(response.json())
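Printing the raw JSON is fine for a smoke test, but in production we usually want just the assistant text, with a guard for error payloads. A minimal extraction helper, assuming the OpenAI-compatible response shape (`choices[0].message.content`), might look like:

```python
def extract_reply(response_json: dict) -> str:
    """Return the assistant's text from an OpenAI-style chat completion.

    Assumes the response carries choices[0].message.content on success;
    raises RuntimeError when the payload carries an error object instead.
    """
    if "error" in response_json:
        raise RuntimeError(f"API error: {response_json['error']}")
    choices = response_json.get("choices", [])
    if not choices:
        raise RuntimeError("no choices in response")
    return choices[0]["message"]["content"]
```

Keeping this in one place means every call site gets the same error handling instead of ad-hoc `KeyError`s scattered through the codebase.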

🚀 Ready? Start your free trial today.



Built with ❤️ by the Regolo team. Questions? regolo.ai/contact or chat with us on Discord