
Qwen3-Reranker-4B on Regolo: Add a “critical brain” to your RAG in minutes

👉Try Qwen3-Reranker-4B on Regolo

Teams lose hours tuning retrieval because “top-k from vector search” is often relevant-ish, not truly useful—leading to noisy contexts, higher LLM costs, and inconsistent answers across languages and departments.
When your knowledge base is multilingual and full of near-duplicate chunks, recall alone is not enough: you need precision at the point where query and passage are judged together (not separately).

Run Qwen3-Reranker-4B in less than 10 minutes: a setup that is ready to scale, with predictable pricing and privacy-first architecture patterns you can apply from day one.

Outcome

  • Reduce “wrong context sent to LLM” by reranking 20–50 candidates down to top-5 before generation (precision@k focus).
  • Make costs predictable with a fixed €0.01 per rerank query (easy to forecast per request).
  • Support multilingual search experiences (100+ languages) without maintaining separate reranking stacks per locale.

Prerequisites (fast)

  • Regolo API key (store it as REGOLO_API_KEY).
  • An OpenAI-compatible integration mindset: the same base-URL-plus-model-name approach many tools already support.
  • Supported languages: 100+ (multilingual + cross-lingual retrieval).
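
Before anything else, a quick sanity check that the key is actually visible to your code (plain Python, nothing Regolo-specific):

import os

# Fail fast with a clear message instead of a 401 from the API later.
assert os.getenv("REGOLO_API_KEY"), "Set REGOLO_API_KEY in your environment first."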

Step-by-step (5 steps)

1) Set model + endpoint

Set the base endpoint and pick the model name.

BASE_URL = https://api.regolo.ai/v1
MODEL    = Qwen3-Reranker-4B

Expected output: you’re ready to call a rerank API that takes query + documents + top_n and returns a ranked list with scores.

2) Quick rerank via HTTP (curl)

Use the classic schema: model, query, documents, top_n.

curl -X POST https://api.regolo.ai/v1/rerank \
  -H "Authorization: Bearer $REGOLO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3-Reranker-4B",
    "query": "How do I turn on the lights?",
    "documents": [
      "Press the light switch on the wall.",
      "Use your phone to control the smart lights.",
      "Plug in the lamp and turn the knob."
    ],
    "top_n": 3
  }'

Expected output (shape):

{
  "results": [
    { "index": 0, "relevance_score": 0.92, "document": "..." },
    { "index": 1, "relevance_score": 0.61, "document": "..." },
    { "index": 2, "relevance_score": 0.14, "document": "..." }
  ]
}

Expected output: documents reordered by usefulness (not just similarity), with a score per passage.
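
One detail worth noting: each result carries an index into the documents array you sent, so you can always map a score back to your own copy of the passage (handy if a deployment omits the echoed document text). A minimal sketch, assuming response holds the parsed JSON from the call above:

documents = [
    "Press the light switch on the wall.",
    "Use your phone to control the smart lights.",
    "Plug in the lamp and turn the knob."
]

for result in response["results"]:
    original = documents[result["index"]]  # map the score back to your own copy
    print(result["relevance_score"], original)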

3) Python drop-in (RAG post-retrieval)

Retrieve candidates (BM25/embeddings), then rerank before calling your LLM.

import os, requests

API_KEY = os.getenv("REGOLO_API_KEY")
BASE = "https://api.regolo.ai/v1"

def regolo_rerank(query, passages, top_n=5):
    r = requests.post(
        f"{BASE}/rerank",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "Qwen3-Reranker-4B",
            "query": query,
            "documents": passages,
            "top_n": top_n
        },
        timeout=30
    )
    r.raise_for_status()
    data = r.json()
    ranked = sorted(data["results"], key=lambda x: x["relevance_score"], reverse=True)
    return [x["document"] for x in ranked[:top_n]]

# candidates = top-50 from your vector DB
# best = regolo_rerank(user_query, candidates, top_n=5)
# answer = llm(best)

Expected output: a clean top-n context list that is typically far less redundant than raw retriever output (especially with many similar chunks).
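
To close the loop, here is a sketch of the full retrieve → rerank → generate pipeline. search_candidates is a hypothetical stand-in for your own retriever, and the chat call assumes Regolo's OpenAI-compatible endpoint at the same base URL; the chat model name is a placeholder you would swap for one you actually run:

from openai import OpenAI

client = OpenAI(base_url=BASE, api_key=API_KEY)

def answer(user_query: str) -> str:
    # 1) Retrieve broadly (your own BM25 / vector search function).
    candidates = search_candidates(user_query, k=50)  # hypothetical retriever
    # 2) Rerank down to a tight, high-precision context.
    context = regolo_rerank(user_query, candidates, top_n=5)
    # 3) Generate from the reranked context only.
    system = "Answer using only this context:\n" + "\n---\n".join(context)
    resp = client.chat.completions.create(
        model="YOUR_CHAT_MODEL",  # placeholder: any chat model you run on Regolo
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user_query},
        ],
    )
    return resp.choices[0].message.content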

4) Plug into LangChain / LlamaIndex / n8n

  • LangChain: replace or augment a document compression/ranking step by calling /rerank and passing the reordered docs onward.
  • LlamaIndex: implement a custom rerank function that sends query + nodes and reorders nodes by returned score.
  • n8n/Flowise/Open WebUI: configure an OpenAI-compatible provider and add an HTTP node that hits /rerank, then feed its output into the chat flow.

Expected output: reranking becomes a single, isolated component you can turn on/off for A/B tests and quality measurement.
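
Framework APIs move quickly, so rather than pinning one integration, here is a framework-agnostic sketch of the pattern all three bullets share: extract the text from whatever objects your retriever returns, rerank the texts, and use the returned index to reorder the original objects with their metadata intact. Adapting it to a LangChain compressor or a LlamaIndex node postprocessor is mostly renaming; it reuses API_KEY, BASE and the requests import from step 3:

def rerank_objects(query, items, text_of, top_n=5):
    # items: retriever output (LangChain Documents, LlamaIndex nodes, dicts, ...)
    # text_of: function extracting the passage text from one item
    texts = [text_of(item) for item in items]
    r = requests.post(
        f"{BASE}/rerank",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"model": "Qwen3-Reranker-4B", "query": query,
              "documents": texts, "top_n": top_n},
        timeout=30,
    )
    r.raise_for_status()
    # Reorder the original objects by the returned index, keeping metadata intact.
    return [items[res["index"]] for res in r.json()["results"][:top_n]]

# e.g. for LangChain Documents: rerank_objects(q, docs, lambda d: d.page_content)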

5) Best-practice defaults

  • Start with retriever top-k = 20–50, rerank down to top-5/top-10.
  • Keep passages ~200–500 tokens to reduce ambiguity and repeated facts.
  • Measure precision@k / hit-rate@k and downstream answer exactness before/after reranking.

Expected output: predictable quality lift without changing the LLM, prompt, or index format first.
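
A minimal sketch of the before/after measurement suggested above, assuming you keep a small labeled set of queries where each retrieved chunk carries an id and relevant_ids is the set of chunk ids a human marked as relevant:

def hit_rate_at_k(result_ids, relevant_ids, k=5):
    # 1.0 if at least one relevant chunk appears in the top-k, else 0.0
    return float(any(rid in relevant_ids for rid in result_ids[:k]))

def precision_at_k(result_ids, relevant_ids, k=5):
    # fraction of the top-k chunks that are actually relevant
    top = result_ids[:k]
    return sum(rid in relevant_ids for rid in top) / len(top)

# Run both on raw retriever output and on reranked output for every labeled
# query, then compare the averages: that difference is your reranking lift.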

Production-ready (working code)

Below is a “ship-it” pattern: strict input controls + data minimization + observability hooks, so the reranker is deployable in regulated environments without rewriting later.

import os, time, hashlib, requests

API_KEY = os.environ["REGOLO_API_KEY"]
BASE = "https://api.regolo.ai/v1"
MODEL = "Qwen3-Reranker-4B"

def stable_request_id(query: str) -> str:
    # Hash the query so logs can correlate requests without storing raw user text.
    return hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]

def rerank_production(query: str, docs: list[str], top_n: int = 5) -> dict:
    t0 = time.time()
    req_id = stable_request_id(query)

    payload = {
        "model": MODEL,
        "query": query,
        "documents": docs,
        "top_n": top_n
    }

    resp = requests.post(
        f"{BASE}/rerank",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "X-Request-Id": req_id
        },
        json=payload,
        timeout=20
    )
    resp.raise_for_status()
    out = resp.json()

    metrics = {
        "request_id": req_id,
        "docs_in": len(docs),
        "top_n": top_n,
        "latency_ms": int((time.time() - t0) * 1000),
    }

    return {"output": out, "metrics": metrics}

Expected output: JSON results + a metrics object you can ship to logs/APM (latency, volume, correlation id) while keeping the component stateless and easy to scale horizontally.
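
For transient failures (timeouts, 429s, 5xx) you will likely want a retry layer around rerank_production. A minimal sketch with exponential backoff; the retry policy values are illustrative, tune them against your own SLOs:

import time
import requests

def rerank_with_retry(query, docs, top_n=5, attempts=3):
    for attempt in range(attempts):
        try:
            return rerank_production(query, docs, top_n)
        except requests.exceptions.RequestException:
            if attempt == attempts - 1:
                raise                 # out of retries: surface the error
            time.sleep(2 ** attempt)  # 1s, 2s, ... exponential backoff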

Benchmarks, costs & CTAs

Regolo exposes Qwen3-Reranker-4B at €0.01 per query, so cost scales linearly with rerank calls (not with tokens), which makes forecasting straightforward: 10,000 rerank calls a month cost €100, whatever the length of the passages.

| Option | Pricing unit | Price | Latency reference | Notes |
| --- | --- | --- | --- | --- |
| Regolo (Qwen3-Reranker-4B) | Per query | €0.01 / query | Measure in your region (export p50/p95) | Multilingual (100+), 32k context per model card |
| Typical US alternative (Cohere Rerank 3.5) | Per search | $2.00 / 1,000 searches | ~171.5 ms (small) / ~459.2 ms (large) in a published benchmark | Pricing is per search, not per token; run a domain A/B for quality deltas |
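
"Measure in your region" is easy to operationalize. A small sketch that samples the /rerank endpoint through the regolo_rerank helper from step 3 and reports rough p50/p95 latency (sample size and test payload are illustrative):

import statistics, time

def latency_profile(samples=50):
    timings_ms = []
    for _ in range(samples):
        t0 = time.time()
        regolo_rerank("How do I turn on the lights?",
                      ["doc a", "doc b", "doc c"], top_n=3)
        timings_ms.append((time.time() - t0) * 1000)
    timings_ms.sort()
    p50 = statistics.median(timings_ms)
    p95 = timings_ms[int(len(timings_ms) * 0.95) - 1]  # approximate p95
    return {"p50_ms": round(p50, 1), "p95_ms": round(p95, 1)}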

👉Try it on Regolo for free


Resources & Community

Official Documentation:

  • Regolo Platform – European LLM provider, Zero Data-Retention and 100% Green


🚀 Ready to Deploy?

Get Free Regolo Credits →


Built with ❤️ by the Regolo team. Questions? support@regolo.ai