faster-whisper-large-v3

faster-whisper-large-v3 is a CTranslate2 conversion of OpenAI’s Whisper large-v3. It provides high-accuracy multilingual speech-to-text with lower latency and VRAM usage than the original Whisper implementation, which suits both real-time and batch transcription.

Core Model

STT

How to Get Started

pip install requests

import requests
from pathlib import Path

api_url = "https://api.regolo.ai/v1/audio/transcriptions"

# For multipart uploads, do NOT set "Content-Type" yourself:
# requests generates it, including the multipart boundary.
headers = {
    "Authorization": "Bearer YOUR_REGOLO_KEY"
}
data = {
    "model": "faster-whisper-large-v3"
}

AUDIO_FILE = "/path/to/your/audio.ogg"

# Stop early if the audio file is missing.
audio_path = Path(AUDIO_FILE)
if not audio_path.is_file():
    print(f"Audio file does not exist: {audio_path}")
    raise SystemExit(1)

# Send the audio as a multipart form upload alongside the model name.
with audio_path.open("rb") as audio_file:
    files = {
        "file": (audio_path.name, audio_file, "application/octet-stream")
    }
    response = requests.post(api_url, headers=headers, data=data, files=files)

if response.status_code == 200:
    transcript_text = response.text
    print("=== Transcription ===")
    print(transcript_text)
    print("=====================")
else:
    print("Failed transcription request:")
    print("Status code:", response.status_code)
    print("Response body:", response.text)
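
The example above prints the raw response body. This page does not say whether the endpoint returns plain text or JSON, so the sketch below handles both. It assumes, without confirmation from this page, that a JSON response carries the transcript under a top-level "text" key, as in OpenAI-style transcription APIs. The helper name extract_transcript is hypothetical.

```python
import json


def extract_transcript(body: str) -> str:
    """Return the transcript contained in a response body.

    Assumption (not confirmed by this page): a JSON body carries the
    transcript under a top-level "text" key. Any other body is returned
    as-is, with surrounding whitespace stripped.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not JSON: treat the whole body as the transcript.
        return body.strip()
    if isinstance(payload, dict) and isinstance(payload.get("text"), str):
        return payload["text"].strip()
    # Valid JSON, but not the assumed shape: fall back to the raw body.
    return body.strip()


if __name__ == "__main__":
    print(extract_transcript('{"text": " hello world "}'))
    print(extract_transcript("plain transcript\n"))
```

In the request example, this would replace the direct use of response.text: transcript_text = extract_transcript(response.text).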

Applications & Use Cases

Real-time transcription, such as live captioning or voice interfaces, where low latency matters.
Batch transcription of recorded audio, such as meetings, interviews, or call archives.
Multilingual speech-to-text where a single model has to cover many languages.
Deployments with limited GPU memory, where the lower VRAM usage of the CTranslate2 conversion helps.