
Secure & Scalable Meeting Transcription with Python and Regolo (Zero Data Retention)

👉 Try Audio Transcription on Regolo for free

Transcribing sensitive board meetings or legal depositions poses a dilemma. You either spend hours running a slow local pipeline (ffmpeg plus a self-hosted transcription model) to keep data private, or you risk confidentiality by sending audio to public US-based APIs that may retain files for “service improvement.”
Security teams often block standard transcription APIs because they cannot guarantee that the audio (containing trade secrets or PII) won’t be logged. Meanwhile, processing a 2-hour video sequentially can take 30+ minutes, delaying critical workflows.

Transcribe 1 hour of audio in under 5 minutes. 100% GDPR-compliant, Zero Data Retention, and parallel-ready execution using Python.

Outcome

  • Speed: Parallel chunking reduces latency by 4x–10x compared to serial requests (1h audio processed in minutes).
  • Privacy: Regolo’s “Zero Data Retention” policy ensures audio and transcripts are discarded immediately after processing; they are never stored or used for training.
  • Quality: Uses faster-whisper-large-v3, delivering State-of-the-Art accuracy even on technical jargon and multi-speaker audio.

Prerequisites (Fast)

  • Regolo API Key: Get it from the dashboard and set it as REGOLO_API_KEY (a quick check is shown after this list).
  • Libraries: pip install openai pydub rich python-dotenv (plus ffmpeg installed on the system).
  • Audio: MP3, WAV, or MP4 file.
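If you keep the key in a local .env file, a quick sanity check (a minimal sketch; the filename and variable name simply follow this guide's conventions) confirms it is visible before you run anything:

# check_env.py -- verify the API key is loaded before making any requests
import os
from dotenv import load_dotenv

load_dotenv()  # reads a local .env file, if present
assert os.getenv("REGOLO_API_KEY"), "REGOLO_API_KEY is not set"
print("REGOLO_API_KEY found")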

Step-by-Step (Code Blocks)

1) Configure the OpenAI-Compatible Client

Connect to Regolo using the standard OpenAI SDK. This lets you switch backends instantly without rewriting logic.

import os
from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(
    api_key=os.getenv("REGOLO_API_KEY"),
    base_url="https://api.regolo.ai/v1"
)

# Test connection
print("Client configured for Regolo Zero-Retention API")Code language: Python (python)

Expected output: A configured client object ready to send requests to the European endpoint.
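Optionally, run a quick smoke test before uploading any audio. Assuming Regolo's OpenAI-compatible API exposes the standard models endpoint (an assumption worth checking against the docs), listing models verifies both the key and the base URL:

# Optional smoke test -- assumes the endpoint implements the standard /v1/models route
for model in client.models.list():
    print(model.id)

If this fails with an authentication error, re-check REGOLO_API_KEY; if it fails to connect, re-check base_url.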

2) Intelligent Chunking (The Secret Sauce)

Instead of sending one massive file (which risks timeouts), split the audio into 5-minute overlapping chunks. This enables parallel processing.

from pydub import AudioSegment

def chunk_audio(file_path, chunk_len_ms=300000, overlap_ms=2000):
    audio = AudioSegment.from_file(file_path)
    chunks = []
    for i, start in enumerate(range(0, len(audio), chunk_len_ms - overlap_ms)):
        end = min(start + chunk_len_ms, len(audio))
        chunk = audio[start:end]
        chunk_name = f"/tmp/chunk_{i}.mp3"
        chunk.export(chunk_name, format="mp3")
        chunks.append(chunk_name)
    return chunks

Expected output: A list of temporary file paths (/tmp/chunk_0.mp3, etc.) ready for the worker pool.
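The hard-coded /tmp paths only exist on Unix-like systems. A portable variant (a sketch using Python's standard tempfile module; the chunking logic is otherwise identical) looks like this:

import os
import tempfile
from pydub import AudioSegment

def chunk_audio_portable(file_path, chunk_len_ms=300000, overlap_ms=2000):
    audio = AudioSegment.from_file(file_path)
    # Create a dedicated temp directory that works on Windows, macOS, and Linux
    out_dir = tempfile.mkdtemp(prefix="regolo_chunks_")
    chunks = []
    for i, start in enumerate(range(0, len(audio), chunk_len_ms - overlap_ms)):
        path = os.path.join(out_dir, f"chunk_{i}.mp3")
        audio[start:start + chunk_len_ms].export(path, format="mp3")
        chunks.append(path)
    return chunks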

3) Parallel Transcription

Use a ThreadPoolExecutor to send multiple chunks to Regolo simultaneously. This drastically cuts total wait time.

from concurrent.futures import ThreadPoolExecutor

def transcribe_chunk(file_path):
    with open(file_path, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="faster-whisper-large-v3",
            file=audio_file,
            response_format="text"
        )
    return transcript

# chunks = chunk_audio("meeting.mp4")
# with ThreadPoolExecutor(max_workers=5) as executor:
#     results = list(executor.map(transcribe_chunk, chunks))

Expected output: A list of text strings, one per chunk. executor.map returns results in input order, so no reassembly is needed here; the production version below uses as_completed and re-orders results by index instead.

4) Stitch and Clean

Merge the text segments. (Optional: use a simple overlap check to remove words duplicated at chunk boundaries; a sketch follows below.)

final_transcript = "\n".join(results)
with open("meeting_transcript.txt", "w") as f:
    f.write(final_transcript)
print(f"Saved {len(final_transcript)} characters.")Code language: Python (python)

Expected output: A clean, complete text file of the entire meeting.
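Because each chunk overlaps the next by about 2 seconds, a few words can be repeated at the seams. A minimal word-level dedup sketch (a heuristic only: if the model phrases the overlap differently in the two chunks, nothing is trimmed):

def stitch(segments, max_overlap_words=15):
    # Join transcript segments, trimming words repeated across chunk boundaries
    merged = segments[0].split() if segments else []
    for seg in segments[1:]:
        words = seg.split()
        overlap = 0
        # Find the longest suffix of the merged text that matches a prefix of the next segment
        for k in range(min(max_overlap_words, len(words), len(merged)), 0, -1):
            if merged[-k:] == words[:k]:
                overlap = k
                break
        merged.extend(words[overlap:])
    return " ".join(merged)

final_transcript = stitch(results)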

Production-Ready Code

Here is the robust, “ship-it” version including retry logic, progress bars, and error handling.

import os
import logging
from concurrent.futures import ThreadPoolExecutor, as_completed
from pydub import AudioSegment
from openai import OpenAI
from dotenv import load_dotenv
from rich.progress import Progress

# Config
load_dotenv()
API_KEY = os.getenv("REGOLO_API_KEY")
BASE_URL = "https://api.regolo.ai/v1"
MODEL = "faster-whisper-large-v3"

client = OpenAI(api_key=API_KEY, base_url=BASE_URL)

def transcribe_safe(file_path, attempt=1):
    """Transcribe a single chunk, retrying up to 3 times before logging the failure."""
    try:
        with open(file_path, "rb") as f:
            return client.audio.transcriptions.create(
                model=MODEL, file=f, response_format="text"
            )
    except Exception as e:
        if attempt < 3:
            return transcribe_safe(file_path, attempt + 1)
        logging.error(f"Failed {file_path}: {e}")
        return ""

def process_meeting(input_file):
    # 1. Split
    audio = AudioSegment.from_file(input_file)
    chunk_len = 5 * 60 * 1000 # 5 mins
    chunks = []
    
    with Progress() as progress:
        task1 = progress.add_task("[cyan]Splitting...", total=len(audio))
        for i, start in enumerate(range(0, len(audio), chunk_len)):
            chunk = audio[start:start+chunk_len]
            path = f"temp_{i}.mp3"
            chunk.export(path, format="mp3")
            chunks.append((i, path))
            progress.update(task1, advance=chunk_len)

        # 2. Transcribe in Parallel
        results = [""] * len(chunks)
        task2 = progress.add_task("[green]Transcribing...", total=len(chunks))
        
        with ThreadPoolExecutor(max_workers=5) as ex:
            future_to_idx = {ex.submit(transcribe_safe, p): i for i, p in chunks}
            for future in as_completed(future_to_idx):
                idx = future_to_idx[future]
                results[idx] = future.result()
                progress.update(task2, advance=1)
                os.remove(chunks[idx][1]) # Cleanup

    return "\n".join(results)

if __name__ == "__main__":
    text = process_meeting("board_meeting.m4a")
    with open("transcript.txt", "w") as f:
        f.write(text)
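If you save the block above as transcribe.py (the filename is just a convention for this example), a thin CLI wrapper keeps it reusable across recordings:

# cli.py -- illustrative command-line wrapper; assumes the script above was saved as transcribe.py
import argparse
from transcribe import process_meeting

parser = argparse.ArgumentParser(description="Transcribe a meeting recording via Regolo")
parser.add_argument("input", help="Audio/video file (MP3, WAV, MP4, M4A)")
parser.add_argument("-o", "--output", default="transcript.txt", help="Where to write the transcript")
args = parser.parse_args()

with open(args.output, "w", encoding="utf-8") as f:
    f.write(process_meeting(args.input))
print(f"Transcript written to {args.output}")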

Benchmarks & Costs

Comparing Regolo against standard US alternatives for a 1-hour meeting file.

| Feature | Regolo (faster-whisper-large-v3) | Typical US Provider (Whisper) |
| --- | --- | --- |
| Data Privacy | Zero Data Retention (logged nowhere) | Retention defaults vary (often 30 days) |
| Latency (1h file) | ~3-5 mins (parallel chunks) | ~15-20 mins (queue/serial) |
| Cost Model | Pay-per-usage (competitive) | ~$0.36/hour ($0.006/min) |
| Region | Europe (Italy) | Usually US East |

👉 Try it on Regolo for free


Resources & Community

Official Documentation:

Related Guides:

Join the Community:


🚀 Ready to Deploy?

Get Free Regolo Credits →


Built with ❤️ by the Regolo team. Questions? support@regolo.ai