👉Try Qwen3-Reranker-4B on Regolo
Teams lose hours tuning retrieval because "top-k from vector search" is often relevant-ish rather than truly useful, leading to noisy contexts, higher LLM costs, and inconsistent answers across languages and departments.
When your knowledge base is multilingual and full of near-duplicate chunks, recall alone is not enough: you need precision at the point where query and passage are judged together (not separately).
This guide gets Qwen3-Reranker-4B running in under 10 minutes, with ready-to-scale, predictably priced, privacy-first architecture patterns you can apply from day one.
Outcome
- Reduce “wrong context sent to LLM” by reranking 20–50 candidates down to top-5 before generation (precision@k focus).
- Make costs predictable with a fixed €0.01 per rerank query (easy to forecast per request).
- Support multilingual search experiences (100+ languages) without maintaining separate reranking stacks per locale.
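Because pricing is per query rather than per token, the cost forecast is a one-line calculation. A minimal sketch (the €0.01/query figure is from this guide; the volume numbers are illustrative):

```python
# Forecast monthly rerank spend from per-query pricing.
# Price per query comes from this guide; volumes below are illustrative.
PRICE_PER_QUERY_EUR = 0.01

def monthly_rerank_cost(queries_per_day: float, days: int = 30) -> float:
    """Cost scales with rerank calls, not tokens, so this is exact."""
    return queries_per_day * days * PRICE_PER_QUERY_EUR

print(monthly_rerank_cost(10_000))  # 10k queries/day -> 3000.0 EUR/month
```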
Prerequisites (fast)
- Regolo API key (store it as REGOLO_API_KEY).
- An OpenAI-compatible integration mindset: the same base-URL-plus-model-name pattern many tools already support.
- Supported languages: 100+ (multilingual + cross-lingual retrieval).
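Before calling anything, a fail-fast configuration check saves debugging time later. A minimal sketch (the constants match this guide; the `config_ok` helper is hypothetical):

```python
import os

# Constants from this guide; the helper below is an illustrative addition.
BASE_URL = "https://api.regolo.ai/v1"
MODEL = "Qwen3-Reranker-4B"

def config_ok() -> bool:
    """Return True when the API key is set and the endpoint looks sane."""
    return bool(os.getenv("REGOLO_API_KEY")) and BASE_URL.startswith("https://")
```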
Step-by-step (5 steps)
1) Set model + endpoint
Set the base endpoint and pick the model name.
BASE_URL = https://api.regolo.ai/v1
MODEL = Qwen3-Reranker-4B
Expected output: you’re ready to call a rerank API that takes query + documents + top_n and returns a ranked list with scores.
2) Quick rerank via HTTP (curl)
Use the classic schema: model, query, documents, top_n.
curl -X POST https://api.regolo.ai/v1/rerank \
-H "Authorization: Bearer $REGOLO_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen3-Reranker-4B",
"query": "How do I turn on the lights?",
"documents": [
"Press the light switch on the wall.",
"Use your phone to control the smart lights.",
"Plug in the lamp and turn the knob."
],
"top_n": 3
}'
Expected output (shape):
{
"results": [
{ "index": 0, "relevance_score": 0.92, "document": "..." },
{ "index": 1, "relevance_score": 0.61, "document": "..." },
{ "index": 2, "relevance_score": 0.14, "document": "..." }
]
}
Expected output: documents reordered by usefulness (not just similarity), with a score per passage.
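Note that each `index` in the response refers to the position in the `documents` array you sent, so you can map scores back onto your own passage objects. A minimal sketch assuming the response shape shown above (the `order_by_rerank` helper is hypothetical):

```python
def order_by_rerank(documents: list[str], results: list[dict]) -> list[tuple[str, float]]:
    """Pair each input passage with its relevance_score, best first.

    `results` is the "results" array from the /rerank response; each entry's
    "index" points back into the original `documents` list.
    """
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]

# Example using the response shape from the curl step above:
docs = ["wall switch", "smart lights app", "lamp knob"]
results = [
    {"index": 0, "relevance_score": 0.92},
    {"index": 1, "relevance_score": 0.61},
    {"index": 2, "relevance_score": 0.14},
]
print(order_by_rerank(docs, results)[0])  # ('wall switch', 0.92)
```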
3) Python drop-in (RAG post-retrieval)
Retrieve candidates (BM25/embeddings), then rerank before calling your LLM.
import os, requests

API_KEY = os.getenv("REGOLO_API_KEY")
BASE = "https://api.regolo.ai/v1"

def regolo_rerank(query, passages, top_n=5):
    r = requests.post(
        f"{BASE}/rerank",
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "Qwen3-Reranker-4B",
            "query": query,
            "documents": passages,
            "top_n": top_n,
        },
        timeout=30,
    )
    r.raise_for_status()
    data = r.json()
    # Sort defensively by score, even if the API already returns sorted results.
    ranked = sorted(data["results"], key=lambda x: x["relevance_score"], reverse=True)
    return [x["document"] for x in ranked[:top_n]]

# candidates = top-50 from your vector DB
# best = regolo_rerank(user_query, candidates, top_n=5)
# answer = llm(best)
Expected output: a clean top-n context list that is typically far less redundant than raw retriever output (especially with many similar chunks).
4) Plug into LangChain / LlamaIndex / n8n
- LangChain: replace or augment a document compression/ranking step by calling /rerank and passing the reordered docs onward.
- LlamaIndex: implement a custom rerank function that sends query + nodes and reorders nodes by returned score.
- n8n/Flowise/Open WebUI: configure an OpenAI-compatible provider and add an HTTP node that hits /rerank, then feed its output into the chat flow.
Expected output: reranking becomes a single, isolated component you can turn on/off for A/B tests and quality measurement.
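Whichever framework you use, the integration reduces to the same adapter: send query + texts, then reorder your own objects by the returned scores. A minimal framework-agnostic sketch with an injectable transport (the `transport` callable stands in for the real HTTP POST so the adapter stays unit-testable; all names here are hypothetical, not framework APIs):

```python
from typing import Callable

def rerank_adapter(
    transport: Callable[[dict], dict],  # e.g. wraps requests.post(...).json()
    query: str,
    nodes: list[dict],                  # framework objects with a "text" field
    top_n: int = 5,
) -> list[dict]:
    """Reorder `nodes` by reranker relevance, keeping only the top_n."""
    payload = {
        "model": "Qwen3-Reranker-4B",
        "query": query,
        "documents": [n["text"] for n in nodes],
        "top_n": top_n,
    }
    response = transport(payload)
    ranked = sorted(response["results"], key=lambda r: r["relevance_score"], reverse=True)
    return [nodes[r["index"]] for r in ranked[:top_n]]

# A fake transport for testing (a real one would POST to /rerank):
def fake_transport(payload: dict) -> dict:
    # Pretend the last document is the most relevant.
    n = len(payload["documents"])
    return {"results": [{"index": i, "relevance_score": i / n} for i in range(n)]}

nodes = [{"text": "a"}, {"text": "b"}, {"text": "c"}]
top = rerank_adapter(fake_transport, "q", nodes, top_n=2)
print([n["text"] for n in top])  # ['c', 'b']
```

Injecting the transport is what makes the reranker a "single, isolated component": swap in a recorded fake for A/B tests and the real HTTP call in production.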
5) Best-practice defaults
- Start with retriever top-k = 20–50, rerank down to top-5/top-10.
- Keep passages ~200–500 tokens to reduce ambiguity and repeated facts.
- Measure precision@k / hit-rate@k and downstream answer exactness before/after reranking.
Expected output: predictable quality lift without changing the LLM, prompt, or index format first.
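Precision@k is simple enough to implement inline for the before/after measurement. A minimal sketch (the relevance labels come from your own eval set; the function name is hypothetical):

```python
def precision_at_k(ranked_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the top-k ranked items that are actually relevant."""
    top = ranked_ids[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant_ids) / len(top)

# Before vs after reranking on one query of your eval set:
relevant = {"d2", "d7"}
print(precision_at_k(["d1", "d2", "d3", "d7", "d9"], relevant, k=5))  # 0.4
print(precision_at_k(["d2", "d7", "d1", "d3", "d9"], relevant, k=2))  # 1.0
```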
Production-ready (working code)
Below is a “ship-it” pattern: strict input controls + data minimization + observability hooks, so the reranker is deployable in regulated environments without rewriting later.
import os, time, hashlib, requests

API_KEY = os.environ["REGOLO_API_KEY"]
BASE = "https://api.regolo.ai/v1"
MODEL = "Qwen3-Reranker-4B"

def stable_request_id(query: str) -> str:
    # Deterministic correlation id: same query -> same id, no raw text in logs.
    return hashlib.sha256(query.encode("utf-8")).hexdigest()[:16]

def rerank_production(query: str, docs: list[str], top_n: int = 5) -> dict:
    t0 = time.time()
    req_id = stable_request_id(query)
    payload = {
        "model": MODEL,
        "query": query,
        "documents": docs,
        "top_n": top_n,
    }
    resp = requests.post(
        f"{BASE}/rerank",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
            "X-Request-Id": req_id,
        },
        json=payload,
        timeout=20,
    )
    resp.raise_for_status()
    out = resp.json()
    metrics = {
        "request_id": req_id,
        "docs_in": len(docs),
        "top_n": top_n,
        "latency_ms": int((time.time() - t0) * 1000),
    }
    return {"output": out, "metrics": metrics}
Expected output: JSON results + a metrics object you can ship to logs/APM (latency, volume, correlation id) while keeping the component stateless and easy to scale horizontally.
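In regulated or high-traffic environments you will also want retries with backoff around the HTTP call; the delay schedule is the easily testable part, sketched below (retry counts and base delay are illustrative choices, not Regolo recommendations):

```python
import random

def backoff_delays(retries: int = 3, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Exponential backoff schedule: base * 2^attempt, capped at `cap` seconds."""
    return [min(base * (2 ** attempt), cap) for attempt in range(retries)]

def backoff_with_jitter(retries: int = 3, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Full jitter: sleep a uniform random fraction of each capped delay."""
    return [random.uniform(0, d) for d in backoff_delays(retries, base, cap)]

print(backoff_delays(4))  # [0.5, 1.0, 2.0, 4.0]
```

Wrap the `requests.post` call in a loop over this schedule, retrying only on timeouts and 5xx/429 responses.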
Benchmarks, costs & CTAs
Regolo exposes Qwen3-Reranker-4B at €0.01 per query, so cost scales linearly with rerank calls (not with tokens), which makes forecasting straightforward.
| Option | Pricing unit | Price | Latency reference | Notes |
|---|---|---|---|---|
| Regolo (Qwen3-Reranker-4B) | Per query | €0.01 / query | Measure in your region (export p50/p95) | Multilingual (100+ languages); 32k context per model card |
| Typical US alternative (Cohere Rerank 3.5) | Per search | $2.00 / 1,000 searches | ~171.5 ms (small) / ~459.2 ms (large) in a published benchmark | Pricing is per search, not per token; run a domain A/B for quality deltas |
👉Try it on Regolo for free
Resources & Community
Official Documentation:
- Regolo Platform – European LLM provider, Zero Data-Retention and 100% Green
Join the Community:
- Regolo Discord – Share your automation builds
- CheshireCat GitHub – Contribute plugins
- Follow Us on X @regolo_ai – Show your integrations!
- Open discussion on our Subreddit Community
🚀 Ready to Deploy?
Get Free Regolo Credits →
Built with ❤️ by the Regolo team. Questions? support@regolo.ai