Co-authored by Matteo Mendula – AI VENTURE BUILDER LDT – http://aivb.ai/
Processing tenders and contracts manually is a bottleneck for every growing organization. While Large Language Models offer a solution, the operational overhead of hosting a 70B parameter model—GPU infrastructure, scaling, maintenance—is a barrier for many teams.
This is where API-based LLM services become valuable. Rather than managing infrastructure, developers can focus on building their applications while relying on a hosted service for the AI capabilities.
We offer exactly this: an API-based LLM service hosted in Italy, providing access to powerful open-source models like Llama and Mistral without the operational overhead.
In this article, we describe our experience building LLM-Document, a document analysis system that leverages the Regolo API.
What you’ll find, in short: the application validates uploaded documents, extracts structured information across multiple categories, and generates executive summaries, all powered by Regolo’s hosted Llama-3.3-70B-Instruct model.
Getting Started with Regolo API
One of the most pleasant surprises when working with Regolo was the simplicity of the initial setup. The API follows the OpenAI chat/completions standard, which means developers familiar with the OpenAI API can get started immediately with zero learning curve.
Minimal Configuration
Getting started requires just three environment variables:
REGOLO_API_KEY=your-api-key
REGOLO_API_BASE_URL=https://api.regolo.ai/v1
REGOLO_MODEL=Llama-3.3-70B-Instruct
The REGOLO_API_BASE_URL defaults to https://api.regolo.ai/v1, so technically only the API key and model selection are required. We chose Llama-3.3-70B-Instruct as our default model for its strong instruction-following capabilities and reliable structured output generation.
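These defaults can be captured in a small config loader. The sketch below is our own illustration (`loadConfig` and `RegoloConfig` are hypothetical names, not part of the project), applying the documented fallbacks for the base URL and model:

```typescript
// Sketch of a config loader applying the defaults described above.
// loadConfig and RegoloConfig are illustrative names, not project API.
interface RegoloConfig {
  apiKey: string;
  baseUrl: string;
  model: string;
}

function loadConfig(env: Record<string, string | undefined>): RegoloConfig {
  // Only the API key is strictly required
  if (!env.REGOLO_API_KEY) {
    throw new Error("REGOLO_API_KEY is required");
  }
  return {
    apiKey: env.REGOLO_API_KEY,
    // Base URL and model fall back to the documented defaults
    baseUrl: env.REGOLO_API_BASE_URL ?? "https://api.regolo.ai/v1",
    model: env.REGOLO_MODEL ?? "Llama-3.3-70B-Instruct",
  };
}
```

With only `REGOLO_API_KEY` set, the loader yields the defaults above; any of the three variables can still be overridden per environment.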
OpenAI-Compatible Interface
The API endpoint structure mirrors OpenAI’s format exactly:
POST https://api.regolo.ai/v1/chat/completions
This compatibility is a significant advantage. If you have existing code that works with OpenAI’s API, migrating to Regolo is often as simple as changing the base URL and API key. The request and response formats are identical.
This extends to the official OpenAI SDK as well. You can use import OpenAI from 'openai' and simply point it to Regolo’s endpoint:
// Using the official OpenAI SDK - just change baseURL
import OpenAI from 'openai';
const client = new OpenAI({
apiKey: process.env.REGOLO_API_KEY,
baseURL: 'https://api.regolo.ai/v1',
});
const response = await client.chat.completions.create({
model: 'Llama-3.3-70B-Instruct',
messages: [
{ role: 'system', content: 'You are a document analyst.' },
{ role: 'user', content: 'Analyze this document...' }
],
temperature: 0.3,
});
This means existing projects using the OpenAI SDK can migrate with minimal code changes—just update the configuration. For our implementation, we used the native fetch API:
// Request body structure
{
model: "Llama-3.3-70B-Instruct",
messages: [
{ role: "system", content: "..." },
{ role: "user", content: "..." }
],
temperature: 0.3
}
Model Selection Flexibility
Regolo offers multiple models, allowing you to choose based on your specific requirements, whether you’re prioritizing speed, cost, or capability. In our case, we opted for Llama-3.3-70B-Instruct because it provides an excellent balance of accuracy and response quality for document analysis tasks. The model selection is simply an environment variable, making it trivial to experiment with different models without code changes.
We also set a low temperature (0.3) to encourage more deterministic outputs, which is important when parsing structured responses from the model.
Building a Document Analysis Pipeline
With the API integration in place, we designed a three-phase pipeline for document processing. Each phase serves a specific purpose, and the architecture allows for graceful handling of invalid documents while maximizing the value extracted from valid ones.
The Three Phases
┌─────────────────────────────────────────────────────────────┐
│ Phase 1: VALIDATION (Automatic Gate Step) │
│ • Checks if document matches expected type │
│ • Returns: valid/invalid + explanation │
└────────────────────┬────────────────────────────────────────┘
│ (Only proceeds if valid)
↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 2: ANALYSIS (Automatic) │
│ • Extracts structured information (10 sections) │
│ • Returns: comprehensive markdown analysis │
└────────────────────┬────────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────────┐
│ Phase 3: SUMMARY (User-Triggered) │
│ • Generates executive summary from analysis │
│ • Returns: 6-section summary with Go/No-Go recommendation │
└─────────────────────────────────────────────────────────────┘
Phase 1: Validation acts as a gate.
Before investing compute resources in detailed analysis, we verify that the uploaded document is actually what we expect (e.g., a tender document). This prevents wasted processing and provides immediate feedback to users who upload incorrect files.
Phase 2: Analysis extracts comprehensive information across ten predefined sections:
document overview, key requirements, deadlines, evaluation criteria, budget information, technical specifications, compliance requirements, submission instructions, contact information, and risk factors.
Phase 3: Summary condenses the detailed analysis into an executive-level overview.
This phase is triggered manually by the user, allowing them to review the full analysis first before requesting a summary.
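The gate-then-analyze flow can be sketched as follows. Here `validate` and `analyze` stand in for the Regolo-backed calls; the function names and result shapes are our own illustration, not the project’s actual API:

```typescript
// Sketch of the three-phase flow: validate gates the document before any
// expensive analysis runs. (Phase 3, the user-triggered summary, is omitted.)
interface ValidationResult {
  valid: boolean;
  message: string;
}

async function runPipeline(
  text: string,
  validate: (t: string) => Promise<ValidationResult>,
  analyze: (t: string) => Promise<string>,
): Promise<{ status: "rejected" | "analyzed"; detail: string }> {
  // Phase 1: gate — reject before spending tokens on a full analysis
  const v = await validate(text);
  if (!v.valid) {
    return { status: "rejected", detail: v.message };
  }
  // Phase 2: structured extraction across the predefined sections
  return { status: "analyzed", detail: await analyze(text) };
}
```

An invalid upload short-circuits immediately with the model’s explanation, which is exactly the feedback shown to the user.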
Domain Filters for Specialization
We implemented a filter system that allows domain-specific analysis. For example, a “Pharmaceutical” filter adds specialized validation criteria (checking for FDA compliance and GMP standards) and additional analysis sections relevant to healthcare procurement. These filters modify the prompts sent to Regolo without requiring code changes, just JSON configuration updates.
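A filter of this kind can be modeled as a small config object merged into the base prompt. The field names below are illustrative assumptions, not the project’s actual JSON schema:

```typescript
// Hypothetical shape of a domain filter; the real JSON schema is the
// project's own, so treat these field names as illustrative.
interface DomainFilter {
  name: string;
  extraValidationCriteria: string[];
  extraSections: string[];
}

// Append the filter's criteria and sections to a base prompt.
function applyFilter(basePrompt: string, filter?: DomainFilter): string {
  if (!filter) return basePrompt;
  return [
    basePrompt,
    `Additional validation criteria: ${filter.extraValidationCriteria.join("; ")}`,
    `Also include these analysis sections: ${filter.extraSections.join(", ")}`,
  ].join("\n");
}
```

Because the filter lives in configuration, adding a new domain is a JSON edit rather than a deployment.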
Architectural Patterns
The reliability of the Regolo API enabled us to implement robust architectural patterns:
Retry with Exponential Backoff: Network issues happen. Our implementation automatically retries failed requests with increasing delays (1s → 2s → 4s), ensuring transient failures don’t break the user experience.
Configurable Prompt Templates: All prompts are loaded from external JSON configuration files. This separation allows prompt engineering and tuning without touching application code, a significant advantage for iterative improvement.
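A minimal version of such externalized prompts might look like this; the `{{placeholder}}` convention and the helper name are our assumptions for illustration, not the project’s actual template format:

```typescript
// Sketch: prompts stored as JSON templates with {{placeholders}},
// filled in at call time. Unknown placeholders resolve to "".
function renderTemplate(
  template: string,
  vars: Record<string, string>,
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_, key) => vars[key] ?? "");
}
```

Prompt text then lives entirely in configuration, and tuning a prompt never requires a code change.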
Code Deep-Dive: API Integration
Let’s examine the core integration code. The following TypeScript implementation shows how we interact with the Regolo API:
Basic API Call
async function callRegoloAPI(
messages: Array<{ role: string; content: string }>,
retryCount = 0
): Promise<RegoloResponse> {
const response = await fetch(
`${REGOLO_API_BASE_URL}/chat/completions`,
{
method: "POST",
headers: {
Authorization: `Bearer ${REGOLO_API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
model: REGOLO_MODEL,
messages,
temperature: REGOLO_TEMPERATURE,
}),
}
);
if (!response.ok) {
const errorText = await response.text();
throw new Error(`Regolo API error (${response.status}): ${errorText}`);
}
return await response.json();
}
The response structure includes useful metadata:
interface RegoloResponse {
id: string;
model: string;
choices: Array<{
message: { role: string; content: string };
finish_reason: string;
}>;
usage?: {
prompt_tokens: number;
completion_tokens: number;
total_tokens: number;
};
}
The usage field is particularly valuable for monitoring costs and understanding token consumption patterns.
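The usage numbers can be turned directly into a per-request cost estimate. The helper below is our own sketch (prices per 1M tokens are passed in explicitly, using the euro figures from the pricing comparison later in this article):

```typescript
// Estimate per-request cost in EUR from the usage field of a response.
// Prices are per 1M tokens, matching how providers publish them.
function estimateCostEUR(
  usage: { prompt_tokens: number; completion_tokens: number },
  inputPricePerM: number,
  outputPricePerM: number,
): number {
  return (
    (usage.prompt_tokens * inputPricePerM +
      usage.completion_tokens * outputPricePerM) /
    1_000_000
  );
}
```

Logging this value per request makes it easy to watch how prompt changes affect spend over time.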
Robust JSON Parsing
LLMs occasionally format their responses inconsistently. We implemented a multi-strategy parser to handle various response formats:
function parseValidationJSON(content: string): { valid: boolean; message: string } {
// Strategy 1: Direct JSON parse
try {
const parsed = JSON.parse(content);
if (typeof parsed.valid === 'boolean' && typeof parsed.message === 'string') {
return parsed;
}
} catch { /* continue */ }
// Strategy 2: Extract from markdown code blocks
try {
const jsonMatch = content.match(/```(?:json)?\s*(\{[\s\S]*?\})\s*```/);
if (jsonMatch) {
const parsed = JSON.parse(jsonMatch[1]);
if (typeof parsed.valid === 'boolean') return parsed;
}
} catch { /* continue */ }
// Strategy 3: Find JSON object in mixed text
try {
const jsonObjectMatch = content.match(/\{[\s\S]*"valid"[\s\S]*\}/);
if (jsonObjectMatch) {
return JSON.parse(jsonObjectMatch[0]);
}
} catch { /* all strategies failed */ }
throw new Error("Unable to parse validation response");
}
This approach ensures resilience against minor variations in model output formatting.
Retry Logic with Exponential Backoff
// Invoked when a request fails; callRegoloAPI retries itself with increasing delays
if (retryCount < MAX_RETRIES) {
  const delay = RETRY_DELAY * Math.pow(2, retryCount); // 1s, 2s, 4s
  await new Promise((resolve) => setTimeout(resolve, delay));
  return callRegoloAPI(messages, retryCount + 1);
}
This pattern prevents cascading failures during temporary network issues or API rate limiting.
Results and Experience
After building and deploying the system, we can share several observations about working with the Regolo API.
Response Quality
The Llama-3.3-70B-Instruct model delivered consistently high-quality responses for our document analysis use case. Complex PDF and DOCX documents were understood accurately, with the model successfully extracting relevant information and organizing it according to our requested format. The structured output followed the template requirements reliably, which made downstream processing straightforward.
The model demonstrated good comprehension of domain-specific terminology across different document types. When using specialized filters (pharmaceutical, military, IT), the responses appropriately emphasized sector-relevant considerations.
Service Reliability
Throughout our development and testing, the Regolo API performed reliably. We did not experience unexpected downtime, and response latencies were consistent and acceptable for interactive use cases. The API handles rate limiting gracefully, returning clear error messages when limits are approached.
The token usage tracking included in every response proved valuable for monitoring costs and optimizing prompts. We could iterate on prompt design while keeping track of token consumption.
Developer Experience
The OpenAI-compatible interface significantly reduced our integration time. Existing knowledge and code patterns transferred directly. The Italian hosting ensures complete independence from US-based services, a critical factor for data sovereignty and AI sovereignty. European organizations can operate without reliance on American infrastructure, addressing both regulatory compliance and strategic autonomy concerns.
Changing models or adjusting parameters requires only environment variable updates-no code deployment necessary. This flexibility supported rapid experimentation during development.
Comparison with OpenAI
Since Regolo’s API is fully compatible with OpenAI’s interface, we ran parallel evaluations using GPT-4o and GPT-4o-mini to understand the trade-offs. Our comparison focused on practical deployment concerns: cost structure, data handling, model behavior, and operational flexibility.
Cost Structure
Here’s the concrete pricing breakdown for document analysis workloads:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Best For |
|---|---|---|---|---|
| Regolo Llama-3.3-70B | €0.60 | €2.70 | 128K | Primary analysis |
| Regolo Llama-3.1-8B | €0.05 | €0.25 | 128K | Validation/classification |
| Regolo DeepSeek-R1-70B | €0.60 | €2.70 | 128K | Complex reasoning |
| Regolo Qwen3-30B | €0.50 | €1.80 | 128K | Cost-optimized analysis |
| OpenAI GPT-4o | ~$2.50 (€2.30) | ~$10.00 (€9.20) | 128K | High-complexity reasoning |
| OpenAI GPT-4o-mini | ~$0.15 (€0.14) | ~$0.60 (€0.55) | 128K | High-volume processing |
Cost Analysis for Our Use Case:
For a typical 50-page tender document (15k input tokens, 2k output tokens):
- Regolo Llama-3.3-70B: €0.009 (input) + €0.0054 (output) = €0.0144 per document
- OpenAI GPT-4o: €0.0345 (input) + €0.0184 (output) = €0.0529 per document
- Cost savings: ~73% using Regolo’s Llama-3.3-70B vs GPT-4o
Interestingly, Regolo’s smaller models create additional optimization opportunities. We used Llama-3.1-8B (€0.05/1M input) for the Phase 1 validation step, since it’s a simple classification task, and reserved the 70B model for Phases 2-3. This hybrid approach brings our average cost to roughly €0.008 per document, about 85% lower than using GPT-4o for the entire pipeline.
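The routing behind that hybrid approach reduces to a small phase-to-model mapping. The sketch below uses the model names from the pricing table; the exact model identifiers exposed by the API may differ, so treat them as placeholders:

```typescript
// Route each pipeline phase to the cheapest model that handles it well.
// Model names follow the article's pricing table (actual API IDs may differ).
type Phase = "validation" | "analysis" | "summary";

function modelForPhase(phase: Phase): string {
  // Validation is a simple binary classification: the 8B model suffices.
  // Analysis and summary need the 70B model's extraction quality.
  return phase === "validation" ? "Llama-3.1-8B" : "Llama-3.3-70B-Instruct";
}
```

Because the model is just a request parameter, this routing lives in one function and is trivial to retune as prices or quality change.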
Model Ecosystem Beyond Text
Unlike basic LLM providers, Regolo offers a comprehensive AI stack that allowed us to extend LLM-Document without integrating multiple vendors:
Vision & OCR Capabilities
- DeepSeek OCR: €0.02 per request for scanned document text extraction
- Gemma-3-27b-it: €0.95/€5.50 per 1M tokens with vision capabilities (useful for analyzing document layouts, signatures, stamps)
- Qwen3-VL-32b: €0.50/€2.50 per 1M tokens for visual question answering on document images
In contrast, OpenAI’s vision capabilities are only available via GPT-4o at premium pricing. Regolo’s separate OCR endpoint (€0.02 flat rate) proved significantly cheaper for simple text extraction than using an LLM to read images.
Embeddings & RAG
- gte-Qwen2 & Qwen3-Embedding-8B: €0.001 per request (essentially free)
- Qwen3-Reranker-4B: €0.01 per query
We implemented a RAG pipeline for document similarity checking using Regolo’s embeddings at negligible cost, compared to OpenAI’s text-embedding-3-small at $0.02 per 1M tokens.
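A similarity check of this kind needs only an embeddings call plus cosine similarity. In the sketch below, the `/embeddings` path and request shape mirror the OpenAI convention that Regolo follows elsewhere, and the model name is taken from the list above; verify both against Regolo’s documentation before relying on them:

```typescript
// Fetch an embedding from an OpenAI-compatible /embeddings endpoint.
// Endpoint path, request shape, and model name are assumptions to verify.
async function embed(text: string): Promise<number[]> {
  const baseUrl = process.env.REGOLO_API_BASE_URL ?? "https://api.regolo.ai/v1";
  const res = await fetch(`${baseUrl}/embeddings`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.REGOLO_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model: "Qwen3-Embedding-8B", input: text }),
  });
  const data = (await res.json()) as { data: Array<{ embedding: number[] }> };
  return data.data[0].embedding;
}

// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Two documents can then be compared by embedding each and scoring `cosineSimilarity` between the vectors.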
Audio Processing
- Faster-Whisper-Large-v3: €0.00015 per second for meeting transcription
This allowed us to add voice memo analysis features to LLM-Document without adding another vendor integration.
Model Selection Flexibility
Regolo offers 16+ models across different architectures, whereas OpenAI’s API is limited to their proprietary GPT series (GPT-4o, o1, o3, etc.). This diversity enabled sophisticated cost-quality tradeoffs:
Our Three-Tier Architecture:
- Gate/Validation: Llama-3.1-8B (€0.05 input) – Cheap, fast binary classification
- Standard Analysis: Llama-3.3-70B (€0.60 input) – High-quality extraction
- Complex Reasoning: DeepSeek-R1-70B (€0.60 input) – Chain-of-thought for ambiguous contract clauses
With OpenAI, switching between GPT-4o-mini and GPT-4o is the only option. With Regolo, we could fine-tune cost vs. capability across eight different parameter scales (8B to 120B).
Quantization Options
Regolo discloses quantization levels (FP16, Q4_K_M, AWQ, MXFP4), which matters for deterministic outputs. We observed that FP16 Llama-3.3-70B produces more consistent JSON parsing than quantized alternatives, worth the slight premium over compressed models.
When to Choose Which
Choose Regolo when:
- Processing high volumes (73% cost savings at scale)
- You need EU data residency without enterprise negotiations
- You want multi-modal capabilities (OCR, vision, audio) from one provider
- You need embedding/RAG infrastructure at near-zero cost
- You prefer open-weight models for auditability (you know exactly which Llama/Mistral/Qwen checkpoint is running)
Choose OpenAI when:
- You need frontier reasoning capabilities (o1/o3 series for complex logical deduction)
- You require guaranteed 99.9%+ uptime SLAs (OpenAI’s enterprise tier has stricter guarantees)
- You’re already deeply integrated with OpenAI’s ecosystem (Assistants API, fine-tuning, etc.)
- You need the absolute lowest latency (GPT-4o-mini is faster than Regolo’s smallest models)
Hybrid Architecture:
Our production system actually uses both strategically:
- Regolo for EU data residency, OCR, embeddings, and 90% of document analysis (cost efficiency)
- OpenAI GPT-4o only for edge cases requiring complex multi-hop reasoning across 200+ page documents where the marginal quality improvement justifies the 4x cost increase
The OpenAI-compatible interface makes this fallback mechanism trivial: literally a configuration change, not a code rewrite.
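Since both providers speak the same chat/completions protocol, the switch reduces to swapping a base URL, key, and model. A minimal sketch of that routing (the decision flag and provider shape are ours; the real criteria in the project are more involved):

```typescript
// Provider selection as pure configuration: both endpoints accept the
// same request format, so only baseUrl/apiKey/model differ.
interface Provider {
  baseUrl: string;
  apiKey: string;
  model: string;
}

function pickProvider(
  needsFrontierReasoning: boolean,
  regolo: Provider,
  openai: Provider,
): Provider {
  // Default to Regolo; escalate only for the rare complex-reasoning cases.
  return needsFrontierReasoning ? openai : regolo;
}
```

The request-building code never changes; it simply reads `baseUrl`, `apiKey`, and `model` from whichever provider was picked.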
Our experience integrating Regolo AI into the LLM-Document project has been positive. The combination of a familiar API interface, reliable service, and the availability of powerful open-source models makes Regolo a practical choice for building LLM-powered applications.
For organizations considering similar projects, we recommend Regolo particularly in these scenarios:
- Rapid prototyping: The OpenAI-compatible API allows quick integration, letting teams focus on application logic rather than infrastructure.
- Data and AI sovereignty: Italian-hosted infrastructure means complete independence from US providers, addressing both regulatory compliance and strategic autonomy for European organizations.
- Open-source model preference: Access to Llama, Mistral, and other open models without self-hosting complexity.
The patterns we implemented (retry logic, multi-strategy parsing, configurable prompts) are transferable to any LLM API integration. The key insight is that a reliable API enables robust application architecture, and Regolo delivered on that front.
The LLM-Document project is available for enterprise licensing. We hope this article provides useful guidance for others building similar document analysis systems.