# Supercharge RAG with Qwen3 and LlamaIndex on Regolo

👉[Try Qwen3 on Regolo for free](https://dashboard.regolo.ai)

Building **Retrieval-Augmented Generation (RAG)** often feels like assembling a puzzle with mismatched pieces. Standard OpenAI pipelines are easy but expensive and opaque with your data. **Open-source models** (like Llama 3 or Qwen) promise control but require painful self-hosting or complex GPU management, leading to high operational overhead and slow time-to-first-token.
Teams need a middle ground: **the simplicity of the OpenAI SDK, the power of state-of-the-art open models like Qwen3, and zero data retention for compliance.**

**Deploy a private, high-performance RAG pipeline with Qwen3-8B and LlamaIndex in less than 10 minutes. Zero infra management, 100% GDPR-aligned.**

## **Outcome**

- **Unified Intelligence:** Qwen3-8B (LLM) and Qwen3-Embedding-8B (Embeddings) are designed to work together, delivering superior retrieval relevance compared to mixing disparate vendors.
- **Drop-in Compatibility:** **[Regolo](https://regolo.ai/)** exposes these models via an OpenAI-compatible endpoint, so you can switch your LlamaIndex **OpenAI** class to **OpenAILike** without rewriting your logic.
- **Full Observability:** See exactly how the model "thinks" (reasoning traces) and strip them for users—giving you debugging power without confusing your customers.

## **Prerequisites (Fast)**

- [**Regolo API Key**](https://dashboard.regolo.ai/): From your dashboard.
- **Python 3.10+**: And a virtual environment.
- **LlamaIndex**: `pip install llama-index llama-index-llms-openai-like llama-index-embeddings-openai-like`

## **Step-by-Step (Code Blocks)**

### **1) Configure the API Connection**

Point LlamaIndex to Regolo's endpoint. This "plugs in" the infrastructure.

```python
import os
from dotenv import load_dotenv

load_dotenv()
REGOLO_API_KEY = os.getenv("REGOLO_API_KEY")
REGOLO_ENDPOINT = "https://api.regolo.ai/v1"
```
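A missing or empty key tends to surface much later as a confusing 401 from the API. As a purely defensive sketch (the `require_env` helper is our own, not part of Regolo or LlamaIndex), you can fail fast at startup instead:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Missing environment variable {name!r}; "
            "set it in your .env file or shell before running the pipeline."
        )
    return value
```

With this in place, `REGOLO_API_KEY = require_env("REGOLO_API_KEY")` drops into the snippet above unchanged.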

### **2) Initialize Qwen3 Models (LLM & Embed)**

Define the "Brain" and the "Indexer". Note the `is_function_calling_model=False` flag, which tells LlamaIndex not to attempt tool calling and keeps the model focused on pure retrieval-augmented generation.

```python
from llama_index.llms.openai_like import OpenAILike
from llama_index.embeddings.openai_like import OpenAILikeEmbedding
from llama_index.core import Settings

# The Brain (LLM)
llm = OpenAILike(
    model="Qwen3-8B",
    api_base=REGOLO_ENDPOINT,
    api_key=REGOLO_API_KEY,
    context_window=8192,
    is_chat_model=True,
    is_function_calling_model=False
)

# The Indexer (Embedding)
embed_model = OpenAILikeEmbedding(
    model_name="Qwen3-Embedding-8B",
    api_base=REGOLO_ENDPOINT,
    api_key=REGOLO_API_KEY
)

# Set as global defaults
Settings.llm = llm
Settings.embed_model = embed_model
```

With this short setup, LlamaIndex knows which models to use: Qwen3-8B becomes the "brain" of the pipeline, generating answers from retrieved context, while Qwen3-Embedding-8B handles indexing and retrieval.
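Because `OpenAILike` speaks the standard OpenAI chat-completions protocol, the request that reaches `https://api.regolo.ai/v1/chat/completions` is an ordinary chat payload. A minimal sketch of that wire format (the `build_chat_payload` helper is illustrative, not part of LlamaIndex):

```python
def build_chat_payload(model: str, prompt: str, temperature: float = 0.2) -> dict:
    """Shape of the JSON body an OpenAI-compatible endpoint expects."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

payload = build_chat_payload("Qwen3-8B", "Summarize the safety policy.")
```

This is why switching providers is effectively a one-line change: only `api_base` and the model name differ, the payload stays the same.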

### **3) Load & Index Data**

Read your documents and turn them into a searchable vector index in one pass.

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Load local docs
documents = SimpleDirectoryReader("./data").load_data()

# Build index (Regolo handles the embedding generation)
index = VectorStoreIndex.from_documents(documents)
```

Expected output: A **VectorStoreIndex** object ready for querying.
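Retrieval quality often hinges on how documents are split. If your files are long, you may want to tune chunking before building the index; the values below are a starting point to experiment with, not recommended defaults:

```python
from llama_index.core import Settings

Settings.chunk_size = 512     # target tokens per chunk
Settings.chunk_overlap = 64   # overlap so answers spanning a boundary aren't cut off
```

Smaller chunks sharpen similarity matching; larger chunks give the LLM more surrounding context per hit.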

### **4) Query the Engine**

Create a query engine and ask questions.

```python
query_engine = index.as_query_engine(similarity_top_k=3)
response = query_engine.query("What are the key safety features mentioned?")
print(str(response))
```

Expected output: A natural language answer derived *only* from your documents.
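For debugging retrieval, it helps to see exactly which chunks the answer was grounded in. LlamaIndex exposes them as `response.source_nodes`, each carrying a similarity `score` and the underlying `node`; a small formatting sketch (the `format_sources` helper is our own):

```python
def format_sources(source_nodes, max_chars: int = 120) -> str:
    """Render retrieved chunks with their similarity scores for debugging."""
    lines = []
    for i, sn in enumerate(source_nodes, start=1):
        score = f"{sn.score:.3f}" if sn.score is not None else "n/a"
        text = sn.node.get_content().replace("\n", " ")[:max_chars]
        lines.append(f"[{i}] score={score} | {text}")
    return "\n".join(lines)

# In the pipeline: print(format_sources(response.source_nodes))
```

If a hallucinated answer slips through, this view usually shows whether retrieval or generation is to blame.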

### **5) Production Polish (Cleaning Reasoning)**

Qwen3-8B often outputs `<think>` tags. Use this helper to clean them for the end-user.

```python
import re

def clean_response(text: str) -> str:
    # Removes the internal <think>...</think> block
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)

clean_answer = clean_response(str(response))
print(f"User-facing Answer:\n{clean_answer}")
```
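The flip side of stripping `<think>` blocks is keeping them for your logs, since the reasoning trace is the observability win mentioned earlier. A small companion helper (our own, mirroring `clean_response`) splits one response into a user-facing answer and a trace:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", flags=re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (user_answer, reasoning_trace); trace is empty if no <think> block."""
    trace = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()
    return answer, trace

answer, trace = split_reasoning("<think>check doc 2</think>The brakes are dual-circuit.")
```

Log the trace, show the answer: you keep full debugging power without confusing your customers.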

## **Production-Ready: Full Pipeline**

To run this in production, wrap the cleaning logic into a post-processor or a simple API wrapper function. The code above is robust enough for a microservice: just expose the `query_engine.query` call via FastAPI or Flask.
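As a framework-agnostic sketch of that wrapper (the stub engine stands in for the real `query_engine` so the example is self-contained; `clean_response` is the helper from step 5):

```python
import re

def clean_response(text: str) -> str:
    """Strip internal <think>...</think> reasoning before returning to users."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

def answer_question(query_engine, question: str) -> dict:
    """What a FastAPI/Flask route body would do: query, clean, return a JSON-able dict."""
    response = query_engine.query(question)
    return {"question": question, "answer": clean_response(str(response))}

# Stub standing in for index.as_query_engine(), only for this sketch
class _StubEngine:
    def query(self, q):
        return "<think>retrieved 3 chunks</think>Airbags and ABS are standard."

result = answer_question(_StubEngine(), "What safety features are included?")
```

In a real service you would pass the `query_engine` built in step 4 and return `result` from your route handler.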

## **Benchmarks & Costs**

| **Feature** | **Regolo (Qwen3-8B)** | **Standard OpenAI (GPT-4o-mini)** |
|---|---|---|
| **Data Privacy** | **Zero retention.** | Standard retention policies. |
| **Reasoning** | **Transparent**: view `<think>` traces. | Opaque / hidden. |
| **Cost** | **Pay-per-token** (~$0.03/1M input). | ~$0.15/1M input. |
| **Embeddings** | **Qwen3-Embedding**: high multilingual performance. | `text-embedding-3-small`. |

👉[Try Qwen3 on Regolo for free](https://dashboard.regolo.ai)

## **Resources &amp; Community**

**Official Documentation:**

- [Regolo Platform](https://regolo.ai) - European LLM provider with zero data retention, running on 100% green energy

**Related Guides:**

- [Boost Your Workflows with Regolo AI on n8n](https://regolo.ai/boost-your-workflows-with-regolo-ai-on-n8n/)
- [Build Multi-Agent Workflows with crewAI Teams](https://regolo.ai/build-multi-agent-workflows-with-crewai-teams/)

**Join the Community:**

- [Regolo Discord](https://discord.gg/ZzZvuR2y) - Share your automation builds
- [CheshireCat GitHub](https://github.com/cheshire-cat-ai) - Contribute plugins
- Follow us on X: [@regolo_ai](https://x.com/regolo_ai) - Show your integrations!
- Open a discussion on our [Subreddit community](https://www.reddit.com/r/regolo_ai/)

---

## **🚀 Ready to Deploy?**

[**Get Free Regolo Credits →**](https://dashboard.regolo.ai)

---

> *Built with ❤️ by the Regolo team. Questions? support@regolo.ai*