GPT OSS 120B

gpt-oss-120b is OpenAI’s flagship open-weight Mixture-of-Experts language model, with about 117B total parameters of which 5.1B are active per token. It is optimized for high‑reasoning, agentic production workloads, fits on a single 80 GB GPU, and is released under the Apache 2.0 license. It offers a long ~131k‑token context window, configurable reasoning effort, and full chain‑of‑thought access for advanced debugging and control.

Core Model

Chat

gpt-oss-120b is OpenAI’s flagship open‑weight Mixture‑of‑Experts language model for production‑grade, high‑reasoning, general‑purpose use cases, and it still fits on a single 80 GB GPU (e.g., H100, MI300X). It targets advanced agentic workflows, including tool use, browsing, code execution and structured outputs, while remaining fully fine‑tunable and Apache‑2.0 licensed for commercial deployment.
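Tool use on a chat-completions endpoint is commonly expressed as a `tools` array of JSON-Schema function definitions; the model replies with `tool_calls` that your code executes, and the result goes back as a `tool` message in a follow-up request. The sketch below assumes the Regolo endpoint accepts this standard OpenAI-style `tools` / `tool_calls` format for gpt-oss-120b (this page does not confirm it). `get_train_distance`, `TOOLS`, `build_payload` and `run_tool_call` are hypothetical names used only for illustration.

```python
import json

# Hypothetical local tool: distance = speed x time, summed over (speed_kmh, hours) segments.
def get_train_distance(segments):
    return sum(speed * hours for speed, hours in segments)

# Registry mapping tool names the model may call to local Python functions.
TOOLS = {"get_train_distance": get_train_distance}

# OpenAI-style tool schema (assumption: the endpoint accepts this format).
TOOL_SPECS = [
    {
        "type": "function",
        "function": {
            "name": "get_train_distance",
            "description": "Total distance in km for a list of [speed_kmh, hours] segments.",
            "parameters": {
                "type": "object",
                "properties": {
                    "segments": {
                        "type": "array",
                        "items": {
                            "type": "array",
                            "items": {"type": "number"},
                            "minItems": 2,
                            "maxItems": 2,
                        },
                    }
                },
                "required": ["segments"],
            },
        },
    }
]

def build_payload(messages):
    """Chat request body that advertises the local tools to the model."""
    return {"model": "gpt-oss-120b", "messages": messages, "tools": TOOL_SPECS}

def run_tool_call(tool_call):
    """Execute one tool_calls entry and wrap the result as a 'tool' role message."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])  # arguments arrive as a JSON string
    result = TOOLS[name](**args)
    return {"role": "tool", "tool_call_id": tool_call["id"], "content": json.dumps(result)}

# A tool call shaped like the entries the model returns in message["tool_calls"]
example_call = {
    "id": "call_0",
    "type": "function",
    "function": {
        "name": "get_train_distance",
        "arguments": json.dumps({"segments": [[60, 2], [80, 1.5]]}),
    },
}
print(run_tool_call(example_call))
```

Keeping tools in a name-to-function registry means the dispatcher never evaluates model output as code; it only looks up functions you have explicitly exposed.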

Get Started

pip install requests

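import requests

# Chat completions endpoint and auth header (replace YOUR_REGOLO_KEY with your API key)
api_url = "https://api.regolo.ai/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer YOUR_REGOLO_KEY",
}
data = {
    "model": "gpt-oss-120b",
    "messages": [
        {
            "role": "user",
            "content": "If a train travels 60 km/h for 2 hours and then 80 km/h for 1.5 hours, what is the total distance covered?",
        }
    ],
    # How much the model reasons before answering: "low", "medium" or "high"
    "reasoning_effort": "medium",
}

# Send the request; fail loudly on HTTP errors (e.g. an invalid key) instead of printing an error body
response = requests.post(api_url, headers=headers, json=data, timeout=120)
response.raise_for_status()
print(response.json())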
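The `reasoning_effort` field in the request above is where the configurable reasoning effort surfaces. As a sketch, a hypothetical helper `make_request_body` can validate the level before sending. The accepted values `low`, `medium` and `high` are the levels documented for gpt-oss; lower effort generally means fewer reasoning tokens and lower latency, at some cost in accuracy on hard problems.

```python
# Reasoning levels documented for gpt-oss models.
REASONING_LEVELS = ("low", "medium", "high")

def make_request_body(prompt, effort="medium", model="gpt-oss-120b"):
    """Hypothetical helper: build a chat-completions body with a validated reasoning effort."""
    if effort not in REASONING_LEVELS:
        raise ValueError(f"reasoning_effort must be one of {REASONING_LEVELS}, got {effort!r}")
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,
    }

# Same question at each level; each body can be sent with
# requests.post(api_url, headers=headers, json=body, timeout=120)
bodies = [make_request_body("What is 60 * 2 + 80 * 1.5?", effort) for effort in REASONING_LEVELS]
for body in bodies:
    print(body["reasoning_effort"])
```

Comparing answers and `usage.completion_tokens` across the three levels is a simple way to pick the cheapest setting that is still accurate enough for a given workload.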

Output

{
  "id": "chatcmpl-a5eb9b058a3b0b01",
  "created": 1769558560,
  "model": "hosted_vllm/openai/gpt-oss-120b",
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "**Step‑by‑step calculation**\n\n| Segment | Speed (km/h) | Time (h) | Distance = Speed × Time (km) |\n|---------|--------------|----------|------------------------------|\n| 1       | 60           | 2        | \\(60 \\times 2 = 120\\) |\n| 2       | 80           | 1.5      | \\(80 \\times 1.5 = 120\\) |\n\n**Total distance**\n\n\\[\n120\\text{ km} + 120\\text{ km} = 240\\text{ km}\n\\]\n\n---\n\n**Answer:** The train covers **240 kilometres** in total.",
        "role": "assistant",
        "reasoning_content": "We need to compute distance = speed * time for each segment, sum them.\n\nFirst segment: 60 km/h * 2 h = 120 km.\nSecond: 80 km/h * 1.5 h = 80 * 1.5 = 120 km.\nTotal = 240 km.\n\nProvide answer."
      },
      "provider_specific_fields": {
        "stop_reason": null,
        "token_ids": null
      }
    }
  ],
  "usage": {
    "completion_tokens": 218,
    "prompt_tokens": 97,
    "total_tokens": 315
  }
}
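In the response above, the final answer and the model's chain of thought arrive in separate fields: `choices[0].message.content` and `choices[0].message.reasoning_content`. A minimal sketch for pulling them out of the parsed JSON (for example `response.json()` from the request above), together with token usage. The field names are taken from the sample output; `reasoning_content` is read with `.get()` because not every OpenAI-compatible server returns it. `summarize_completion` is a hypothetical helper name.

```python
def summarize_completion(resp):
    """Extract answer, reasoning trace and token usage from a chat.completion dict."""
    choice = resp["choices"][0]
    message = choice["message"]
    usage = resp.get("usage", {})
    return {
        "answer": message.get("content", ""),
        # Present in the sample output above; may be missing on other servers.
        "reasoning": message.get("reasoning_content"),
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
    }

# Trimmed copy of the sample response shown above
sample = {
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "**Answer:** The train covers **240 kilometres** in total.",
                "reasoning_content": "First segment: 60 km/h * 2 h = 120 km.",
            },
        }
    ],
    "usage": {"completion_tokens": 218, "prompt_tokens": 97, "total_tokens": 315},
}

summary = summarize_completion(sample)
print(summary["finish_reason"], summary["prompt_tokens"] + summary["completion_tokens"])  # prints: stop 315
```

Logging `reasoning` separately from `answer` keeps the chain of thought available for debugging without showing it to end users.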

Applications and Use Cases

Complex question answering and analysis where you need multi‑step reasoning, long context, and explicit rationales (for example, research copilots or strategy assistants).
Autonomous or semi‑autonomous agents that call tools, browse the web, and write or execute code as part of a workflow (for example, data pipelines, QA bots, or internal IT agents).
Large‑scale RAG systems that combine many documents, knowledge bases, or transcripts in a single long‑context query, taking advantage of the ~131k‑token window.
Advanced coding and debugging assistants that rely on chain‑of‑thought, tool calls, and execution traces for high‑stakes software engineering tasks.
Enterprise applications where on‑prem or VPC deployment, Apache‑2.0 licensing, and predictable GPU utilization on a single 80GB card are important requirements.
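For long-context RAG, a practical step is deciding how many retrieved passages fit in the window alongside the question and the space reserved for the answer. The sketch below greedily packs passages, in ranked order, under a token budget. Assumptions: 131,072 tokens approximates the ~131k window; token counts use a rough 4-characters-per-token heuristic rather than the model's real tokenizer; `RESERVED_FOR_OUTPUT`, `estimate_tokens` and `pack_context` are hypothetical. A real tokenizer should replace the heuristic in production.

```python
CONTEXT_WINDOW = 131_072      # assumption: approximates the ~131k-token window
RESERVED_FOR_OUTPUT = 8_192   # hypothetical budget kept free for reasoning + answer

def estimate_tokens(text):
    """Rough heuristic (~4 characters per token), not the model's tokenizer."""
    return max(1, len(text) // 4)

def pack_context(question, passages, window=CONTEXT_WINDOW, reserve=RESERVED_FOR_OUTPUT):
    """Greedily keep ranked passages until the estimated prompt budget is used up."""
    budget = window - reserve - estimate_tokens(question)
    kept = []
    for passage in passages:  # passages assumed sorted by relevance, best first
        cost = estimate_tokens(passage)
        if cost > budget:
            continue  # skip passages that no longer fit; a smaller one later might
        kept.append(passage)
        budget -= cost
    context = "\n\n---\n\n".join(kept)
    return [
        {"role": "system", "content": "Answer using only the documents below.\n\n" + context},
        {"role": "user", "content": question},
    ]

# Usage with a tiny window so the cut-off is visible
docs = ["a" * 400, "b" * 400, "c" * 40]   # ~100, ~100, ~10 estimated tokens
messages = pack_context("Which docs fit?", docs, window=200, reserve=20)
print(messages[0]["content"].count("---"))
```

With a 200-token window, the first passage fits, the second would overflow and is skipped, and the short third one still fits. The returned `messages` list can be used directly as the `messages` field of the request body in Get Started.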