
How to benchmark memory usage between Hermes Agent and OpenClaw

The cleanest way to compare Hermes Agent and OpenClaw is to keep both agents local, send the same workload to the same model backend, and measure RSS memory, disk usage, and recall latency under identical conditions.

Hermes ships with a built-in learning loop, persistent memory, and FTS5-based cross-session recall. OpenClaw, by contrast, is built around a local-first gateway and workspace-based agent setup, relying heavily on the LLM’s context window.

This benchmark isolates how each memory approach behaves over time rather than mixing in model-hosting differences.

Why this benchmark matters

If we strip the jargon away, this test is about one practical question: after processing hundreds of events, which memory strategy stays smaller, cleaner, and faster to query?

To guarantee a fair test, we use Regolo as the backend. This ensures both agents call the same external model (here, minimax-m2.5) instead of changing provider variables mid-test. Regolo’s free trial, with unlimited tokens on the pay-as-you-go tier, is a good fit for this heavy workload.
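
If you want to sanity-check the backend before touching either agent, a plain curl against the models endpoint will confirm your key works. This assumes Regolo follows the standard OpenAI-compatible /v1/models route, which matches the baseUrl we configure below:

# Sanity-check the Regolo endpoint (assumes the standard OpenAI-compatible
# /v1/models route; adjust if your dashboard shows a different base URL)
export REGOLO_API_KEY="YOUR_REGOLO_API_KEY"
curl -s https://api.regolo.ai/v1/models \
  -H "Authorization: Bearer $REGOLO_API_KEY" | python3 -m json.tool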

Step 1: Prepare the local environment

First, install Hermes. It already exposes the main commands we need for setup, model selection, and configuration via a single binary.

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
source ~/.bashrc   # or ~/.zshrc
hermes setup

Next, install OpenClaw, which recommends Node 22.16+ or Node 24.

npm install -g openclaw@latest
openclaw onboard --install-daemon

Configure both agents to use Regolo as the underlying inference provider. Replace the API key placeholder with your actual key from the Regolo dashboard.

For OpenClaw, we inject the custom provider directly:

cat << 'EOF' > /tmp/regolo.json
{
  "baseUrl": "https://api.regolo.ai/v1",
  "apiKey": "YOUR_REGOLO_API_KEY",
  "api": "openai-completions",
  "models": [
    {
      "id": "minimax-m2.5",
      "name": "minimax-m2.5",
      "reasoning": true,
      "contextWindow": 196608,
      "maxTokens": 196608
    }
  ]
}
EOF

# Set the provider and default model
openclaw config set models.providers.regolo --batch-file /tmp/regolo.json --strict-json
openclaw models aliases add minimax-m2.5 regolo/minimax-m2.5
openclaw models set regolo/minimax-m2.5

For Hermes, run hermes setup or edit its config to point to Regolo’s OpenAI-compatible endpoint.
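
We do not reproduce Hermes’s exact config schema here, so treat the snippet below as a sketch: the config key names are assumptions, not documented Hermes keys, and running hermes setup interactively is the safer path. The values to supply are the same ones we gave OpenClaw:

# Sketch only — these key names are assumptions, not documented Hermes
# config keys; `hermes setup` prompts for the same values interactively.
hermes config set provider.base_url "https://api.regolo.ai/v1"
hermes config set provider.api_key "YOUR_REGOLO_API_KEY"
hermes model minimax-m2.5   # select the same model OpenClaw uses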

Ensure the background daemons for both agents are running:

openclaw gateway start
hermes gateway start
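
Before moving on, verify that both daemons are visible as processes. The orchestrator in Step 2 locates them by scanning process command lines for these same strings:

# Print the PIDs of any process whose full command line matches;
# no output means that daemon is not running.
pgrep -f "openclaw.*gateway"
pgrep -f "hermes.*gateway"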

Step 2: Write the orchestrator script

Instead of simulating memory structures, we wrote a real “puppeteer” script (orchestrator.py) that interacts directly with the live background daemons of OpenClaw and Hermes via their official CLI commands (openclaw agent and hermes chat).

Create orchestrator.py and paste the following Python code:

#!/usr/bin/env python3
import json
import os
import subprocess
import time
from pathlib import Path

import psutil
from rich.console import Console
from rich.table import Table

console = Console()

NUM_EVENTS = 300
OC_SESSION_ID = "benchmark_oc_v1"
HR_SESSION_ID = "benchmark_hr_v1"

OC_STATE_DIR = Path.home() / ".openclaw"
HR_STATE_DIR = Path.home() / ".hermes"

def make_events(n: int) -> list[dict]:
    events = []
    for i in range(1, n + 1):
        events.append({
            "id": i,
            "ticket": f"TICKET-{i:04d}",
            "text": f"TICKET-{i:04d}: A customer reported a billing anomaly. Fixed in {(i%9)+2} mins."
        })
    return events

def get_daemon_pid(process_name_hints: list[str]) -> int | None:
    # Require ALL hints to match so the OpenClaw lookup can never
    # return the Hermes daemon (both contain "gateway"), and vice versa.
    for proc in psutil.process_iter(['pid', 'name', 'cmdline']):
        try:
            cmdline = " ".join(proc.info['cmdline'] or [])
            if "orchestrator.py" in cmdline:
                continue
            if all(hint in cmdline for hint in process_name_hints):
                return proc.info['pid']
        except (psutil.NoSuchProcess, psutil.AccessDenied, psutil.ZombieProcess):
            pass
    return None

def get_rss_mb(pid: int | None) -> float:
    if not pid:
        return 0.0
    try:
        return psutil.Process(pid).memory_info().rss / 1024 / 1024
    except psutil.NoSuchProcess:
        return 0.0

def get_dir_size_bytes(directory: Path) -> int:
    if not directory.exists():
        return 0
    total = 0
    for root, _, files in os.walk(directory):
        for f in files:
            fp = os.path.join(root, f)
            if not os.path.islink(fp):
                total += os.path.getsize(fp)
    return total

def run_benchmark():
    events = make_events(NUM_EVENTS)
    # Locate the two daemons started in Step 1
    oc_pid = get_daemon_pid(["openclaw", "gateway"])
    hr_pid = get_daemon_pid(["hermes", "gateway"])
        
    oc_start_rss = get_rss_mb(oc_pid) if oc_pid else 0.0
    hr_start_rss = get_rss_mb(hr_pid) if hr_pid else 0.0
    oc_start_disk = get_dir_size_bytes(OC_STATE_DIR)
    hr_start_disk = get_dir_size_bytes(HR_STATE_DIR)

    console.print(f"\n[bold cyan]Starting real injection of {NUM_EVENTS} events...[/bold cyan]")
    
    t0 = time.perf_counter()
    for e in events:
        console.print(f"  [yellow]Injecting {e['ticket']} -> OpenClaw[/yellow]")
        subprocess.run(["openclaw", "agent", "--local", "--session-id", OC_SESSION_ID, "-m", e["text"], "--json"], check=False)
        
        console.print(f"  [yellow]Injecting {e['ticket']} -> Hermes[/yellow]")
        subprocess.run(["hermes", "chat", "-q", e["text"], "-Q", "--continue", HR_SESSION_ID], check=False)
    
    build_time = time.perf_counter() - t0
    console.print(f"  [green]Done injecting in {build_time:.2f}s[/green]\n")

    time.sleep(2)
    recall_prompt = "What was the exact outcome and fix pattern for TICKET-0002?"
    console.print("[bold cyan]Testing Recall Latency...[/bold cyan]")
    
    rt0 = time.perf_counter()
    subprocess.run(["openclaw", "agent", "--local", "--session-id", OC_SESSION_ID, "-m", recall_prompt, "--json"], check=False)
    oc_recall_ms = (time.perf_counter() - rt0) * 1000
    
    rt0 = time.perf_counter()
    subprocess.run(["hermes", "chat", "-q", recall_prompt, "-Q", "--continue", HR_SESSION_ID], check=False)
    hr_recall_ms = (time.perf_counter() - rt0) * 1000

    oc_end_rss = get_rss_mb(oc_pid) if oc_pid else 0.0
    hr_end_rss = get_rss_mb(hr_pid) if hr_pid else 0.0
    oc_disk_delta = get_dir_size_bytes(OC_STATE_DIR) - oc_start_disk
    hr_disk_delta = get_dir_size_bytes(HR_STATE_DIR) - hr_start_disk

    table = Table(title=f"Live Architecture Benchmark ({NUM_EVENTS} events)", show_header=True)
    table.add_column("Metric", style="bold")
    table.add_column("OpenClaw", justify="right")
    table.add_column("Hermes Agent", justify="right")
    table.add_row("RSS Memory Δ", f"{(oc_end_rss - oc_start_rss):.2f} MB", f"{(hr_end_rss - hr_start_rss):.2f} MB")
    table.add_row("Disk Usage Δ", f"{(oc_disk_delta / 1024):.2f} KB", f"{(hr_disk_delta / 1024):.2f} KB")
    table.add_row("Recall Latency", f"{oc_recall_ms:.2f} ms", f"{hr_recall_ms:.2f} ms")
    console.print("")
    console.print(table)

if __name__ == "__main__":
    run_benchmark()

Install the dependencies and run the script:

pip install psutil rich
python3 orchestrator.py
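
Once the run finishes, you can confirm the disk numbers independently of the script’s own accounting:

# Independently confirm the state-directory growth the script reports
du -sh ~/.openclaw ~/.hermes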

Step 3: Analyze the results

Running this benchmark outputs something like this:

Metric            OpenClaw       Hermes Agent
RSS Memory Δ      0.00 MB        -2.75 MB
Disk Usage Δ      213.41 KB      0.00 KB
Recall Latency    19593.32 ms    113.14 ms

The Truth Behind the Numbers

These results highlight a massive architectural divergence between the two agents regarding long-term memory.

1. The 19.6s OpenClaw Recall Latency
To retrieve a simple fact (“What happened with TICKET-0002?”) in an active session, OpenClaw takes almost 20 seconds. Why? Because OpenClaw relies entirely on appending messages to a JSONL log file. During recall, it has to feed the entire history back into the LLM’s context window. You are paying the latency (and token) cost of inference just to look up a deterministic fact.
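
You can watch this append-only growth directly. The exact transcript locations vary by version, so the pattern below is an assumption, but any JSONL files under the state directory will visibly grow with each injected event:

# Locate session transcripts and print their line and byte counts
# (filenames and paths are version-dependent assumptions)
find ~/.openclaw -name "*.jsonl" -exec wc -lc {} +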

2. The 113ms Hermes Recall Latency
Hermes recalled the exact same data in 113 milliseconds. Instead of dumping everything into the LLM context, Hermes compresses facts into an internal SQLite database (state.db) equipped with Full-Text Search (FTS). When asked about a specific ticket, Hermes bypasses the massive LLM context roundtrip entirely, executing a rapid local database query.
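
You can inspect this store yourself with the sqlite3 CLI. The FTS table name below is a guess, so list the schema first; the path assumes state.db sits at the top of the Hermes state directory:

# List the real tables first — "memories_fts" below is an assumed name
sqlite3 ~/.hermes/state.db ".tables"
sqlite3 ~/.hermes/state.db \
  "SELECT * FROM memories_fts WHERE memories_fts MATCH 'TICKET-0002' LIMIT 5;"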

3. Disk Bloat vs. Compression
For 300 events, OpenClaw’s local workspace ballooned by 213 KB because it perpetually appends raw textual logs. Hermes registered 0 KB of bloat. Backed by SQLite with Write-Ahead Logging (WAL), Hermes continuously compacts and structures its memory, avoiding unmanageable disk inflation over long-running sessions.
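
The WAL claim is easy to verify on your own install (same path assumption as above):

# Prints "wal" when Write-Ahead Logging is enabled on the database
sqlite3 ~/.hermes/state.db "PRAGMA journal_mode;"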

If you want to scale an AI agent in 2026, forcing the LLM to “remember” everything via massive context windows is not a viable strategy. It causes severe degradation in response time and ever-growing token burn as history accumulates.

Hermes demonstrates that structuring memory locally (via SQLite FTS/vector DBs) and fetching context algorithmically before hitting the LLM is a far more production-ready architecture.


Code on GitHub

You can download the code from our GitHub repo: copy the .env.example files and fill them in with your credentials. If you need help, you can always reach out to our team on Discord 🤙


FAQ

Why do we configure Hermes first?

Hermes already exposes a setup flow and the key operational commands we need for this article, including hermes setup, hermes model, hermes tools, and hermes config set, so it is the easier place to establish the shared Regolo backend before we mirror the same model choice in OpenClaw.

Can we migrate from OpenClaw to Hermes later?

Yes. Hermes documents hermes claw migrate, and its migration flow can import settings, memories, skills, command allowlists, messaging settings, and selected API keys from an OpenClaw installation.

Where does OpenClaw keep the local workspace for this test?

OpenClaw documents the default workspace root at ~/.openclaw/workspace, and user skills live under ~/.openclaw/workspace/skills/<skill>/SKILL.md, which is useful when we want the benchmark tool to be callable from a local skill during the comparison.
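
A quick listing confirms that layout on your machine:

# Inspect the documented workspace root and any installed skills
ls -la ~/.openclaw/workspace
ls -la ~/.openclaw/workspace/skills 2>/dev/null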

Is Regolo’s free trial enough for this benchmark?

For a lightweight local memory benchmark, it should be enough for most teams: Regolo’s pricing page currently advertises a 30-day free trial, unlimited tokens on the pay-as-you-go plan, and no credit card requirement.

What should we publish with our results?

We should publish the machine spec, the exact model ID, the number of events, cold versus warm medians, and the raw results.json file. That gives everyone else enough context to reproduce the run and post their own numbers in Discord using the same structure.
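
The orchestrator above only prints a table, so here is a minimal sketch of a results.json to write alongside it. The field names are our suggestion, not a fixed schema; fill in the nulls from your run:

cat << 'EOF' > results.json
{
  "machine": "YOUR_MACHINE_SPEC",
  "model": "regolo/minimax-m2.5",
  "num_events": 300,
  "recall_latency_ms": {
    "openclaw": {"cold": null, "warm_median": null},
    "hermes": {"cold": null, "warm_median": null}
  },
  "rss_delta_mb": {"openclaw": null, "hermes": null},
  "disk_delta_kb": {"openclaw": null, "hermes": null}
}
EOF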


🚀 Start your free 30-day trial at regolo.ai and deploy LLMs with complete privacy by design.

👉 Talk with our Engineers or Start your 30 days free →



Built with ❤️ by the Regolo team. Questions? regolo.ai/contact or chat with us on Discord