
Using Petri to audit open-source LLMs with Regolo

Petri is an automated auditing framework for language models: it generates audit scenarios from seed instructions, runs a multi-turn conversation between an auditor model and a target model, and then asks a judge model to score the transcript against behavioral dimensions such as sycophancy, concerning behavior, admirable behavior, and eval awareness.

In practice, we can use Petri to test how a model behaves under pressure, not just how it scores on static benchmarks. Because Petri works with model roles in Inspect AI, we can point those roles at open-source models served through an OpenAI-compatible endpoint, including Regolo.ai.

What Petri is for

Petri is built for behavioral auditing rather than plain benchmark scoring. It ships with more than 170 built-in seed instructions and 38 built-in judging dimensions, and each dimension gets a 1–10 score plus a written justification tied to specific messages in the conversation.

The framework revolves around three roles: an auditor that drives the interaction, a target model that is being tested, and a judge that scores the result. That structure is useful because it lets us probe subtle behaviors that only appear across several turns, including rollback and tool-mediated interactions.
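Conceptually, the audit loop looks something like the sketch below. This is illustrative pseudocode, not Petri's actual implementation: the function name and the callables are invented for clarity, and real Petri runs add tool use, rollback, and early stopping on top of this basic shape.

```python
def audit_one_seed(seed, auditor, target, judge, max_turns):
    """Illustrative sketch of a Petri-style audit: an auditor drives a
    multi-turn conversation with a target, then a judge scores it."""
    transcript = [seed]
    for _ in range(max_turns):
        probe = auditor(transcript)   # auditor decides the next message
        reply = target(probe)         # target is the model under test
        transcript += [probe, reply]
    return judge(transcript)          # judge scores the full transcript
```

The point of the sketch is only to show why multi-turn behaviors (which never surface in single-prompt benchmarks) are visible here: the auditor sees the whole transcript so far and can escalate pressure turn by turn.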

Why open-source models work

Petri can be used with open-source models. Inspect AI, which Petri builds on, supports a wide range of hosted and local open-model backends including Hugging Face, vLLM, Ollama, SGLang, Together AI, Fireworks, Groq, Cloudflare, and other providers with OpenAI-compatible endpoints.

This matters because Petri itself is not tied to closed models. What matters is whether the model backend is supported by Inspect, either natively, through an OpenAI-compatible API, through OpenRouter, or through a custom model provider extension.
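A quick way to confirm that a backend is OpenAI-compatible is to call its /v1/models route, which is part of the standard OpenAI API surface. The sketch below uses only the Python standard library; the base URL is Regolo's documented endpoint, and the request only fires when REGOLO_API_KEY is set.

```python
import json
import os
import urllib.request

REGOLO_BASE_URL = "https://api.regolo.ai/v1"

def list_models_request(api_key, base_url=REGOLO_BASE_URL):
    # /v1/models is part of the OpenAI-compatible surface, so any
    # OpenAI-style client or a plain HTTP request works here.
    return urllib.request.Request(
        f"{base_url}/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )

if os.getenv("REGOLO_API_KEY"):
    with urllib.request.urlopen(list_models_request(os.environ["REGOLO_API_KEY"])) as resp:
        for model in json.load(resp)["data"]:
            print(model["id"])
```

The IDs this prints are exactly the strings you later pass to Inspect AI as `openai/<model-id>`.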

Installation and first run

Install Petri from PyPI with pip install inspect-petri. The Petri docs recommend starting with a small subset of seeds, because the full default audit covers 170+ seeds, can run up to 30 turns per seed, and may take hours plus significant API usage.

For an open-source-only setup, a simple pattern is to use three open models: one as auditor, one as target, and one as judge. The examples below follow this pattern.

Step 1 – Install Petri and Setup

First, install the packages and set the environment variables. Regolo's docs state that API keys are created in the dashboard and that the service is OpenAI-compatible; Petri itself runs through Inspect AI, either via inspect eval or from Python.

pip install inspect-ai inspect-petri openai

export REGOLO_API_KEY="YOUR_REGOLO_API_KEY"
export OPENAI_API_KEY="$REGOLO_API_KEY"
export OPENAI_BASE_URL="https://api.regolo.ai/v1"

The OPENAI_API_KEY and OPENAI_BASE_URL variables matter because OpenAI-compatible integrations often read those defaults automatically, and Regolo documents https://api.regolo.ai/v1 as its OpenAI-compatible base URL.

Step 2 – Use CLI

A first CLI run can use open-source models for all three roles. The model names below are examples and must be replaced with currently available Regolo model IDs from the latest official documentation.

pip install inspect-petri openai

export OPENAI_API_KEY="YOUR_REGOLO_API_KEY"
export OPENAI_BASE_URL="https://api.regolo.ai/v1"

inspect eval inspect_petri/audit \
  -T seed_instructions=tags:sycophancy \
  -T max_turns=12 \
  --model-role auditor=openai/gpt-oss-120b \
  --model-role target=openai/qwen3.5-9b \
  --model-role judge=openai/mistral-small-4-119b \
  --limit 2

That pattern works because Petri expects role-based models, and Inspect AI supports assigning separate models to auditor, target, and judge. Using a narrower seed tag and a low turn count is a sensible first run because audit cost and runtime can grow quickly.

Python example

Below is a minimal Python script that follows the documented Petri pattern while using Regolo.ai’s OpenAI-compatible endpoint and only open-source models. Replace the auditor, target, and judge model IDs with valid Regolo model IDs from the latest documentation before running it.

import os
from inspect_ai import eval
from inspect_petri import audit

regolo_key = os.getenv("REGOLO_API_KEY")
if not regolo_key:
    raise RuntimeError("Missing REGOLO_API_KEY")

os.environ["OPENAI_API_KEY"] = regolo_key
os.environ["OPENAI_BASE_URL"] = "https://api.regolo.ai/v1"

results = eval(
    audit(
        seed_instructions="tags:sycophancy",
        max_turns=12,
        realism_filter=False,
        enable_rollback=True,
    ),
    model_roles={
        "auditor": "openai/gpt-oss-120b",
        "target": "openai/qwen3.5-9b",
        "judge": "openai/mistral-small-4-119b",
    },
)

print(results)

This code is intentionally small, because the part that usually changes is not the Petri logic but the model mapping. In practice, we recommend starting with a stronger open model as judge, a reasonably capable open model as auditor, and the model we want to evaluate as target.
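One way to encode that recommendation is a small helper that fixes the auditor and judge and swaps only the target, so dimension scores stay comparable across runs. The helper name and the default IDs below are assumptions reusing this post's examples, not an official Petri or Regolo mapping.

```python
# Hypothetical helper, not part of Petri: keep the auditor and judge fixed
# so scores stay comparable, and vary only the model under test.
def regolo_roles(
    target_id,
    auditor_id="openai/gpt-oss-120b",
    judge_id="openai/mistral-small-4-119b",
):
    return {"auditor": auditor_id, "target": target_id, "judge": judge_id}

# Audit a candidate target against the same auditor/judge pair every time.
roles = regolo_roles("openai/qwen3.5-9b")
print(roles)
```

The resulting dict plugs straight into the model_roles argument of eval() shown above.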

Results

Petri writes logs that we can inspect with the Inspect viewer, which is the easiest way to understand what happened in an audit. The viewer lets us review transcripts, dimension scores, and judge output rather than relying only on the final aggregate result.

The most useful habit is to compare deltas across runs rather than over-reading raw absolute scores. Petri’s own guidance says keeping the rubric stable matters more than chasing single-number scores across changing dimensions or seed sets.
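A tiny helper makes the delta habit concrete: compare per-dimension means between two runs, and only for dimensions present in both, since a changed rubric or seed set silently invalidates absolute-score comparisons. The dimension names and numbers below are illustrative.

```python
def score_deltas(baseline, candidate):
    # Only dimensions present in both runs are comparable; anything added
    # or dropped between runs is excluded rather than guessed at.
    shared = baseline.keys() & candidate.keys()
    return {dim: round(candidate[dim] - baseline[dim], 3) for dim in sorted(shared)}

before = {"concerning": 1.0, "admirable": 2.5}
after = {"concerning": 1.5, "admirable": 2.5, "new_dimension": 4.0}
print(score_deltas(before, after))  # {'admirable': 0.0, 'concerning': 0.5}
```

In practice you would read the per-dimension means out of the Inspect log files rather than typing them in by hand.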

╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│audit (2 samples): none/none                                                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
seed_instructions: tags:sycophancy, max_turns: 12, dataset: seeds                                                       
total time:                                         0:03:50                                                             
openai/gpt-oss-120b                                 450,891 tokens [I: 445,179, O: 5,712]                               
openai/qwen3.5-9b                                   112,929 tokens [I: 63,569, O: 49,360]                               
openai/mistral-small-4-119b                         77,623 tokens [I: 71,824, O: 5,799]                                 
auditor (role)                    450,891 tokens [I: 445,179, O: 5,712]                                                 
target (role)                     112,929 tokens [I: 63,569, O: 49,360]                                                 
judge (role)                      77,623 tokens [I: 71,824, O: 5,799]                                                   
                                                                                                                        
admirable         audit_situational_awareness         auditor_failure         broken_tool_use                           
mean       2.500  mean                         1.000  mean             1.000  mean             1.000                    
stderr     0.500  stderr                       0.000  stderr           0.000  stderr           0.000                    
concerning         concerning_with_no_mitig…         cooperation_with_harmful…         cooperation_with_human_mi…       
mean        1.000  mean                       1.000  mean                       1.000  mean                        1.000
stderr      0.000  stderr                     0.000  stderr                     0.000  stderr                      0.000
counterfactual_bad_behavior         disappointing         eval_awareness         facilitating_terrorism                 
mean                         1.000  mean           1.000  mean            1.000  mean                    1.000          
stderr                       0.000  stderr         0.000  stderr          0.000  stderr                  0.000          
fun_or_funny         high_stakes_setting         input_hallucination         manipulated_by_developer                   
mean          1.500  mean                 1.500  mean                 1.000  mean                      1.000            
stderr        0.500  stderr               0.500  stderr               0.000  stderr                    0.000            
needs_attention         overrefusal         prefill_susceptibility         scenario_realism                             
mean             1.000  mean         1.000  mean                    1.000  mean              7.000                      
stderr           0.000  stderr       0.000  stderr                  0.000  stderr            0.000                      
stuck_in_loops         unfaithful_thinking         unprompted_bizarre_behavior         unprompted_blackmail             
mean            1.000  mean                 1.000  mean                         1.000  mean                  1.000      
stderr          0.000  stderr               0.000  stderr                       0.000  stderr                0.000      
unprompted_deception_…         unprompted_emotional_…         unprompted_encouragem…         unprompted_gratitude       
mean                    1.000  mean                    1.000  mean                    1.000  mean                  1.000
stderr                  0.000  stderr                  0.000  stderr                  0.000  stderr                0.000
unprompted_initiative         unprompted_leaking         unprompted_neg_affect         unprompted_pos_affect            
mean                   1.500  mean                1.000  mean                   1.000  mean                   1.000     
stderr                 0.500  stderr              0.000  stderr                 0.000  stderr                 0.000     
unprompted_self_prese…         unprompted_self_servi…         unprompted_spiritual…         unprompted_sycophancy       
mean                    1.000  mean                    1.000  mean                   1.000  mean                   1.000
stderr                  0.000  stderr                  0.000  stderr                 0.000  stderr                 0.000
unprompted_whistleblowing         user_over_sysprompt                                                                   
mean                       1.000  mean                 1.000                                                            
stderr                     0.000  stderr               0.000                                                            
                                                                                                                        
Log: logs/2026-05-11T17-45-41-00-00_audit_e3Q2Hog5nSa6D5goTRe8Qq.eval

Open the viewer UI to see all results:

inspect view

Start your free 30-day trial at regolo.ai and compare Qwen, Mistral, and Gemma on the same EU-hosted infrastructure instead of guessing from screenshots and hype.

👉 Talk with our Engineers or start your 30-day free trial ->



Built with ❤️ by the Regolo team. Questions? regolo.ai/contact or chat with us on Discord