gemma-4-31B

gemma‑4‑31B is a 30.7B‑parameter dense multimodal model from Google DeepMind with 256K context, native thinking mode, function calling, and text/image/video support across 140+ languages under Apache 2.0.
Core Model
Chat

How to Get Started

pip install requests

import requests

# Regolo chat completions endpoint
api_url = "https://api.regolo.ai/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_REGOLO_KEY",  # replace with your own API key
}
data = {
    "model": "gemma4-31b",
    "messages": [
        {
            "role": "user",
            "content": "What is the capital of Italy, and which region does it belong to?",
        }
    ],
    "reasoning_effort": "low",
}

# Send the request; the timeout keeps the call from hanging indefinitely
response = requests.post(api_url, headers=headers, json=data, timeout=60)
response.raise_for_status()  # surface HTTP errors such as an invalid key
print(response.json())

Output

{
  "id": "chatcmpl-a4988541-84b1-41a5-843f-06790a11f7fc",
  "created": 1769560420,
  "model": "hosted_vllm/gemma4-31b",
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The capital of Italy is Rome (Italian: Roma). Rome belongs to the Lazio region.",
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "completion_tokens": 62,
    "prompt_tokens": 45,
    "total_tokens": 107
  }
}
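
In practice you will usually want only the assistant text and the token counts, not the whole JSON body. The sketch below shows one way to pull those fields out, assuming the response always has the shape of the sample output above. The helper name `summarize_completion` is hypothetical and not part of any Regolo SDK.

```python
from typing import Any


def summarize_completion(body: dict[str, Any]) -> dict[str, Any]:
    """Pull the assistant text and token usage out of a chat completion body."""
    choice = body["choices"][0]
    usage = body.get("usage", {})
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens"),
        "completion_tokens": usage.get("completion_tokens"),
        "total_tokens": usage.get("total_tokens"),
    }


# Trimmed copy of the sample output shown above
sample = {
    "model": "hosted_vllm/gemma4-31b",
    "object": "chat.completion",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "The capital of Italy is Rome (Italian: Roma). "
                           "Rome belongs to the Lazio region.",
                "role": "assistant",
            },
        }
    ],
    "usage": {"completion_tokens": 62, "prompt_tokens": 45, "total_tokens": 107},
}

summary = summarize_completion(sample)
print(summary["text"])
print(summary["total_tokens"])
```

With a live call, `summarize_completion(response.json())` would be used the same way on the object returned by `requests.post`.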

Applications & Use Cases

  • Multimodal chat assistants for customer support, knowledge bases, and internal copilots that combine text, image, and video understanding in 140+ languages.
  • Reasoning and coding copilots that use thinking mode for step‑by‑step problem solving, mathematical proofs, and complex code generation or debugging.
  • Document intelligence pipelines for PDFs, forms, and scanned contracts, leveraging native OCR and handwriting recognition with 256K context for large documents.
  • Tool‑ and function‑calling agents that orchestrate APIs, databases, and multi‑step workflows inside enterprise automation or data retrieval backends.
  • Video understanding workflows for surveillance, education, or sports analytics, using up to 60‑second video inputs processed as frame sequences.
  • On‑device and workstation deployments where the 30.7B dense architecture fits a single high‑end GPU (≈17.4 GB at 4‑bit quantization) without MoE infrastructure overhead.
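
For the tool‑ and function‑calling use case in the list above, the following is a minimal sketch of how a request payload and a local dispatcher could look. It assumes the endpoint accepts the OpenAI‑style `tools` field and returns tool calls in the OpenAI format; this page does not confirm either. The tool `get_order_status` and the helpers `build_tool_request` and `dispatch_tool_call` are hypothetical, and the tool call at the end is a hand‑written stand‑in, not real model output.

```python
import json
from typing import Any


def build_tool_request(question: str) -> dict[str, Any]:
    """Build a chat completion payload that advertises one callable tool.

    Assumption: the endpoint accepts the OpenAI-style `tools` field.
    `get_order_status` is a hypothetical tool used only for illustration.
    """
    return {
        "model": "gemma4-31b",
        "messages": [{"role": "user", "content": question}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_order_status",
                    "description": "Look up the shipping status of an order.",
                    "parameters": {
                        "type": "object",
                        "properties": {"order_id": {"type": "string"}},
                        "required": ["order_id"],
                    },
                },
            }
        ],
    }


def get_order_status(order_id: str) -> str:
    """Hypothetical local implementation the agent would dispatch to."""
    return f"Order {order_id} is in transit."


def dispatch_tool_call(tool_call: dict[str, Any]) -> str:
    """Run the local function named in an OpenAI-style tool call."""
    registry = {"get_order_status": get_order_status}
    name = tool_call["function"]["name"]
    # In the OpenAI format, arguments arrive as a JSON-encoded string
    arguments = json.loads(tool_call["function"]["arguments"])
    return registry[name](**arguments)


payload = build_tool_request("Where is order A-1001?")

# Hand-written stand-in for a tool call the model might return (not real output)
fake_call = {
    "id": "call_0",
    "type": "function",
    "function": {"name": "get_order_status", "arguments": '{"order_id": "A-1001"}'},
}
result = dispatch_tool_call(fake_call)
print(result)
```

The payload would be sent with the same `requests.post` call used in the getting‑started example. Keeping the registry of callable functions explicit means the model can only trigger functions you have listed.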