Secure and Private: Transcribing Meeting Recordings with Python and Regolo.ai

In today’s fast-paced world, effective communication is essential, especially in professional settings. Transcribing meeting recordings not only ensures that important discussions are documented and easily accessible but also helps maintain confidentiality and protect sensitive information. In this article, we will explore how to securely transcribe an audio recording of a meeting using Python. We will utilize the Regolo API for transcription and the Pydub library to handle audio files, all while prioritizing privacy with our secure solutions.

Overview of the Process

This script loads a meeting recording (any audio or video format FFmpeg can decode), converts it to mono 16 kHz audio, splits it into 30-second chunks with a short overlap, transcribes each chunk in parallel, and removes duplicate sentences that can arise during transcription. The result is a clean, concise transcript that captures the essence of the meeting without unnecessary repetition.
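The splitting step above boils down to plain index arithmetic. Here is a minimal sketch (function name and sample durations are illustrative, not part of the full script below) showing how overlapping chunk boundaries are computed:

```python
def chunk_bounds(total_ms, chunk_ms, overlap_ms):
    # Compute (start, end) millisecond windows with a small overlap,
    # so words straddling a chunk boundary appear in both chunks.
    bounds = []
    start = 0
    while start < total_ms:
        end = min(start + chunk_ms, total_ms)
        bounds.append((start, end))
        if end == total_ms:
            break
        start = end - overlap_ms
    return bounds

print(chunk_bounds(70_000, 30_000, 1_500))
# → [(0, 30000), (28500, 58500), (57000, 70000)]
```

Each chunk starts 1.5 seconds before the previous one ended, which is why the duplicate-sentence cleanup step exists: overlapping audio naturally produces overlapping text.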

Prerequisites

Before we dive into the code, make sure you have the following installed:

Python: Ensure you have Python installed on your machine. You can download it from python.org.
Pydub: This library is used for audio manipulation. You can install it with pip:

pip install pydub

FFmpeg: Pydub relies on FFmpeg to decode and encode audio, so install it on your system (for example, sudo apt-get install ffmpeg on Debian/Ubuntu).
Regolo.ai API Key: Sign up for an API key at regolo.ai; this key will allow you to access our Zero Data Retention transcription services.
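It is worth failing fast if the key is not configured, rather than hitting a confusing authentication error mid-run. A minimal sketch (the helper name and demo value here are illustrative):

```python
import os

def get_regolo_key():
    # Fail fast with a clear message instead of an opaque auth error later.
    key = os.getenv("REGOLO_API_KEY")
    if not key:
        raise RuntimeError("Missing REGOLO_API_KEY env var")
    return key

os.environ["REGOLO_API_KEY"] = "sk-example"  # demo value only; use your real key
print(get_regolo_key())
```

In practice you would put REGOLO_API_KEY in a .env file and load it with python-dotenv, as the full script below does.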

The Code

Here’s the complete code for transcribing a meeting recording:

# use this command for the needed libraries
# pip install openai pydub python-dotenv rich
# and install ffmpeg on the system (required by pydub):
# sudo apt-get update && sudo apt-get install -y ffmpeg

import openai
from pydub import AudioSegment
import os
import json
import re
import time, random
from concurrent.futures import ThreadPoolExecutor, as_completed
import tempfile
import argparse
import logging
from dotenv import load_dotenv
from rich.progress import Progress, BarColumn, TextColumn, MofNCompleteColumn, TimeElapsedColumn, TimeRemainingColumn, SpinnerColumn

INPUT_FILE = "/root/whisper/videoplayback.mp4"
CHUNK_DURATION_MS = 30 * 1000
CHUNKS_DIR = "/tmp/audio_chunks"
TRANSCRIPT_FILE = "/root/whisper/transcribe.txt"

load_dotenv()
openai.api_key = os.getenv("REGOLO_API_KEY")
if not openai.api_key:
    raise RuntimeError("Missing REGOLO_API_KEY env var")
openai.base_url = "https://api.regolo.ai/v1/"

def has_repeated_phrases(text, window_size=20, threshold=2):
    words = text.strip().split()
    if len(words) < window_size:
        return False
    tail = words[-window_size:]
    phrases = [' '.join(tail[i:i+5]) for i in range(len(tail) - 4)]
    count = {}
    for phrase in phrases:
        count[phrase] = count.get(phrase, 0) + 1
        if count[phrase] >= threshold:
            return True
    return False

def remove_repeated_sentences(text, chunk_index):
    sentences = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]
    result = []
    for i, s in enumerate(sentences):
        if i == 0 or s != sentences[i-1]:
            result.append(s)
        else:
            logging.info(f"Duplicate sentence removed in chunk {chunk_index+1}.")
    return '. '.join(result) + '.'

def convert_and_split_audio(input_path, chunk_ms, output_dir, overlap_ms=1500):
    os.makedirs(output_dir, exist_ok=True)
    audio = AudioSegment.from_file(input_path).set_channels(1).set_frame_rate(16000)
    chunks = []
    i = 0
    start = 0
    while start < len(audio):
        end = min(start + chunk_ms, len(audio))
        chunk = audio[start:end]
        chunk_path = os.path.join(output_dir, f"chunk_{i:05d}.wav")
        chunk.export(chunk_path, format="wav")
        chunks.append(chunk_path)
        i += 1
        if end == len(audio): break
        start = end - overlap_ms  # overlap
    return chunks

def transcribe_with_retry(file_path, chunk_index, total_chunks, language, max_attempts=5):
    transcript = ""
    for attempt in range(1, max_attempts+1):
        try:
            with open(file_path, "rb") as f:
                r = openai.audio.transcriptions.create(
                    model="faster-whisper-large-v3",
                    file=f,
                    language=language,
                    response_format="text"
                )
            transcript = normalize_response(r)
            if not has_repeated_phrases(transcript):
                return transcript
            logging.warning("Repetition detected. (Don't worry, we're fixing it.) Retrying block...")
        except Exception as e:
            logging.warning(f"API error on attempt {attempt}: {e}")
        time.sleep(min(2**attempt + random.random(), 20))  # exponential backoff with jitter
    # All attempts exhausted: fall back to stripping repeats from the last transcript.
    return remove_repeated_sentences(transcript, chunk_index)

def transcribe_all(chunk_files, language, max_workers=3):
    results = [None]*len(chunk_files)
    total_chunks = len(chunk_files)
    bar = BarColumn(bar_width=None, style="grey50", complete_style="green", finished_style="green")
    with Progress(SpinnerColumn(), TextColumn("[bold]Transcribing[/]"), bar, MofNCompleteColumn(), TextColumn("•"), TimeElapsedColumn(), TextColumn("ETA"), TimeRemainingColumn()) as progress:
        task_id = progress.add_task("Transcribing", total=total_chunks)
        with ThreadPoolExecutor(max_workers=max_workers) as ex:
            futs = {ex.submit(transcribe_with_retry, fp, idx, total_chunks, language): idx for idx, fp in enumerate(chunk_files)}
            for fut in as_completed(futs):
                idx = futs[fut]
                try:
                    results[idx] = fut.result()
                    progress.advance(task_id, 1)
                except Exception as e:
                    logging.error(f"Chunk {idx+1} failed: {e}")
                    results[idx] = ""
                    progress.advance(task_id, 1)
    return "\n".join(results)

def normalize_response(r):
    # The API may return a dict, a JSON string, or plain text; coerce all to text.
    if isinstance(r, dict) and "text" in r:
        return r["text"].strip()
    try:
        parsed = json.loads(r) if isinstance(r, str) else None
        if parsed and "text" in parsed:
            return parsed["text"].strip()
    except Exception:
        pass
    return str(r).strip()

if __name__ == "__main__":
    ap = argparse.ArgumentParser()
    ap.add_argument("--input", default=INPUT_FILE)
    ap.add_argument("--out", default=TRANSCRIPT_FILE)
    ap.add_argument("--chunk-sec", type=int, default=int(CHUNK_DURATION_MS/1000))
    ap.add_argument("--overlap-ms", type=int, default=1500)
    ap.add_argument("--workers", type=int, default=3)
    ap.add_argument("--language", default="en")
    args = ap.parse_args()

    logging.basicConfig(level=logging.INFO)
    for noisy in ("openai", "httpx", "httpcore"):
        logging.getLogger(noisy).setLevel(logging.WARNING)
    logging.info(f"Converting and splitting audio into {args.chunk_sec:.0f}s chunks...")

    with tempfile.TemporaryDirectory(prefix="audio_chunks_") as tmpdir:
        chunks = convert_and_split_audio(args.input, args.chunk_sec*1000, tmpdir, args.overlap_ms)
        logging.info(f"The audio has been split into {len(chunks)} chunks")
        text = transcribe_all(chunks, language=args.language, max_workers=args.workers)
        with open(args.out, "w", encoding="utf-8") as f:
            f.write(text.strip())
    logging.info(f"Transcription complete. Saved to: {args.out}")

Explanation of the Code

  1. Importing Libraries: The script begins by importing the necessary libraries: openai for accessing Regolo's transcription API (which is fully OpenAI-compatible), pydub for audio processing, rich for the progress bar, and standard-library modules (os, re, json, time, random, concurrent.futures) for file handling, retries, and parallelism.
  2. Configuration: The script defines defaults for the input file path, chunk duration, and transcript output path; each can be overridden with command-line flags (--input, --out, --chunk-sec, and so on). Set your regolo.ai API key as the REGOLO_API_KEY environment variable, for example in a .env file loaded by python-dotenv.
  3. Functions:
    • has_repeated_phrases: This function checks for repeated phrases in the transcribed text to avoid redundancy.
    • remove_repeated_sentences: This function removes duplicate sentences from the transcription, ensuring clarity and conciseness.
    • convert_and_split_audio: This function converts the recording to mono 16 kHz audio and splits it into 30-second chunks (configurable with --chunk-sec) with a short overlap, which makes large recordings easier to handle and avoids losing words at chunk boundaries.
    • transcribe_with_retry: This function transcribes each audio chunk, checks the result for repeated phrases, and retries with exponential backoff if necessary.
  4. Main Execution: The script converts the audio file into chunks, transcribes each chunk, and saves the final transcription to a text file.
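The duplicate-sentence cleanup can be exercised on its own with a toy transcript. This is a simplified version of the script's remove_repeated_sentences (the chunk-index logging is dropped here for brevity):

```python
import re

def remove_repeated_sentences(text):
    # Split on sentence-ending punctuation and drop consecutive duplicates.
    sentences = [s.strip() for s in re.split(r'[.!?]+', text) if s.strip()]
    result = []
    for i, s in enumerate(sentences):
        if i == 0 or s != sentences[i - 1]:
            result.append(s)
    return '. '.join(result) + '.'

print(remove_repeated_sentences(
    "We agreed on Friday. We agreed on Friday. Next steps follow."
))
# → We agreed on Friday. Next steps follow.
```

Note that only consecutive duplicates are removed, so a sentence legitimately repeated later in the meeting is preserved.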

Transcribing meeting recordings can significantly enhance productivity and ensure that important information is not lost. By using Python and some API calls, you can automate this process, making it easier to manage and access your meeting notes. With the provided script, you can efficiently convert audio recordings into clear, concise transcripts, ready for review and action.

This script is powered by Regolo.ai, which offers advanced AI inference capabilities for audio transcription and processing. If you’re looking to streamline your transcription workflow and improve the accuracy of your meeting notes, consider exploring Regolo’s services.