
Privacy-First Email Search: Building a RAG System with LlamaIndex

šŸ‘‰ Try LlamaIndex on Regolo now

Searching through thousands of emails is tedious, slow, and often inaccurate with traditional keyword-based tools. Daniele Scasciafratte's demo at the "Build your AI" event in Rome showed a better approach: a privacy-preserving email RAG (Retrieval Augmented Generation) system that indexes emails locally and answers natural language queries without sending your data to external cloud providers.

This solution combines LlamaIndex for orchestration, local vector storage, and Regolo’s European LLM infrastructure to deliver fast, GDPR-compliant email intelligence.

The Problem with Traditional Email Search

Email clients typically offer only keyword matching, which is inflexible and cannot understand context or semantic meaning. Enterprise users face additional constraints:

  • Privacy concerns: cloud-based AI search sends email content to third-party servers (often extra-EU).
  • Compliance risks: the GDPR's data-minimization principle (Article 5(1)(c)) conflicts with uploading full inboxes to third parties.
  • Latency: round-trip API calls for each search slow results.

How the System Works

The architecture has three phases: indexing, storage, and querying.

Phase 1: Email Ingestion via IMAP

The system connects to your email account using the IMAP protocol, fetching all messages with LlamaIndex’s ImapReader:

from llama_index.readers.imap import ImapReader

reader = ImapReader(
    username=USER_EMAIL,
    password=USER_PASSWORD,
    host=IMAP_SERVER  # e.g., imap.gmail.com
)
emails = list(reader.load_data(search_criteria="ALL"))

This works with Gmail, Outlook, or any IMAP-enabled service. Credentials stay local in a .env file—never sent externally.
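Loading those .env values at startup can be done with python-dotenv or a few lines of standard library. Here is a minimal stdlib-only sketch (the demo repository may use a different loader; the parsing rules here are deliberately simplistic):

```python
import os

def load_env(path=".env"):
    """Parse simple KEY=VALUE lines into os.environ.

    Skips blank lines and comments; does not handle quoting or
    multi-line values (python-dotenv covers those cases).
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

# Usage:
# load_env()
# USER_EMAIL = os.environ["USER_EMAIL"]
```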

Phase 2: Chunking and Embedding Generation

Emails are split into 512-token chunks with 20-token overlap using SentenceSplitter, balancing context preservation and processing efficiency:

from llama_index.core.node_parser import SentenceSplitter

node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=20)
nodes = node_parser.get_nodes_from_documents(emails)

Each chunk is embedded via regolo.ai’s OpenAI-compatible embedding endpoint:

from llama_index.embeddings.openai_like import OpenAILikeEmbedding

embeddings = OpenAILikeEmbedding(
    model_name=OPENAI_EMBEDDING_MODEL,
    api_key=REGOLO_AI_API_KEY,
    api_base=OPENAI_HOST
)

# Store each chunk's embedding on its node for later persistence
for node in nodes:
    node.embedding = embeddings.get_text_embedding(node.get_text())

Each embedding is a dense numeric vector (1,024 dimensions for bge-m3) capturing semantic meaning: similar emails cluster together in vector space.
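That clustering is what cosine similarity measures: vectors pointing in nearly the same direction score close to 1, unrelated ones close to 0. A toy sketch with made-up 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy embeddings: two invoice-related chunks and one newsletter chunk
invoice_a = [0.9, 0.1, 0.0]
invoice_b = [0.8, 0.2, 0.1]
newsletter = [0.0, 0.2, 0.9]

print(cosine_similarity(invoice_a, invoice_b))   # ā‰ˆ 0.98 (similar)
print(cosine_similarity(invoice_a, newsletter))  # ā‰ˆ 0.02 (unrelated)
```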

Phase 3: Local Vector Store Persistence

Embeddings and text are saved to index_storage/vector_store.json as a lightweight JSON file, enabling offline queries:

import json

def save_vector_store(file_path, nodes):
    with open(file_path, "w") as f:
        json.dump(
            [{"text": node.get_text(),
              "embedding": node.embedding,
              "id": node.id_} for node in nodes],
            f
        )

This design avoids dependencies on external vector databases like Pinecone or Weaviate—critical for air-gapped or compliance-sensitive environments.
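Reading the file back is just as simple; a hypothetical load counterpart (the demo may name or structure it differently), returning the same list of dicts that was saved:

```python
import json

def load_vector_store(file_path):
    """Read the persisted chunks back as a list of dicts
    with "text", "embedding", and "id" keys."""
    with open(file_path) as f:
        return json.load(f)
```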

Querying: Natural Language to Answers

When you ask “What are the outstanding invoices from Q4?”, the system:

  1. Embeds your query using the same model.
  2. Searches the local vector store for the top-3 most similar email chunks via cosine similarity:
from llama_index.core.vector_stores import VectorStoreQuery

query_embedding = embeddings.get_text_embedding(query)
vs_query = VectorStoreQuery(
    query_embedding=query_embedding,
    similarity_top_k=3
)
response = index.query(vs_query)

  3. Constructs a prompt with the retrieved emails as context:

prompt = f"""
You are an assistant who responds using EXCLUSIVELY the provided emails.
EMAIL:
{context}
QUESTION:
{query}
INSTRUCTIONS:
Use only the information contained in the emails.
If the emails do not contain the answer, state it clearly.
"""
  4. Generates an answer via regolo.ai's LLM (Qwen, Llama, etc.):
from llama_index.llms.openai_like import OpenAILike

llm = OpenAILike(
    model=OPENAI_MODEL,
    api_key=REGOLO_AI_API_KEY,
    api_base=OPENAI_HOST,
    context_window=8192
)
response = llm.complete(prompt)
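The {context} placeholder in step 3 is filled by joining the retrieved chunks into one string; a plausible sketch (function and variable names are illustrative, not the demo's exact code):

```python
def build_context(chunks, separator="\n---\n"):
    """Join retrieved email chunks into a single context string,
    numbering each so the model can refer to them unambiguously."""
    return separator.join(
        f"[Email {i + 1}]\n{text}" for i, text in enumerate(chunks)
    )

# Example with two hypothetical retrieved chunks:
context = build_context([
    "Invoice #1042 from Acme is due on 15 December.",
    "Reminder: invoice #1042 is still unpaid.",
])
print(context)
```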

The LLM only sees the three relevant chunks—not your entire inbox—minimizing data exposure.

Why Regolo for Email RAG

Feature            | regolo.ai                | OpenAI/Anthropic
-------------------|--------------------------|----------------------
Data Residency     | EU-only servers          | US-based
GDPR Compliance    | Native, ACN-ready        | Requires DPA
Embedding Latency  | <500 ms (EU proximity)   | 1-2 s (transatlantic)
Cost per 1M tokens | Transparent pricing      | Variable, higher
Local JSON Storage | Supported out of the box | Requires workarounds

Regolo’s OpenAI-compatible API lets you swap api_base without rewriting code—portable and vendor-agnostic.
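In practice that swap can be a single configuration value; a small sketch of the idea (provider names and the LLM_PROVIDER variable are illustrative assumptions, not part of the demo):

```python
import os

# One setting decides which OpenAI-compatible backend is used;
# the application code that builds the client never changes.
PROVIDERS = {
    "regolo": "https://api.regolo.ai/v1",
    "openai": "https://api.openai.com/v1",
}

def resolve_api_base(provider=None):
    """Pick the api_base for the configured provider (default: regolo)."""
    name = provider or os.environ.get("LLM_PROVIDER", "regolo")
    return PROVIDERS[name]

print(resolve_api_base("regolo"))  # https://api.regolo.ai/v1
```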


Getting Started

Clone the demo repository and install dependencies:

git clone https://github.com/regolo-ai/llamaindex-email-demo
cd llamaindex-email-demo
pip install -r requirements.txt

Create .env with your credentials:

REGOLO_AI_API_KEY=your_regolo_key
OPENAI_HOST=https://api.regolo.ai/v1
OPENAI_MODEL=qwen-3-32b-it
OPENAI_EMBEDDING_MODEL=bge-m3-embedding
USER_EMAIL=you@example.com
USER_PASSWORD=your_app_password
IMAP_SERVER=imap.gmail.com

Run the Streamlit interface:

streamlit run app.py

Click “Index Emails”, wait for processing (~30s per 100 emails), then query: “Summarize threads from John last week”.


Privacy and Compliance Advantages

This architecture ensures:

  • No third-party data sharing: Emails stay on your machine; only embeddings go to regolo.ai’s inference API (EU-hosted).
  • Right to erasure: Delete vector_store.json to remove all indexed data instantly.
  • Data minimization: LLM sees 3 chunks max per query, not full mailboxes.
  • Auditability: JSON storage enables grep-based compliance checks.
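The right-to-erasure point amounts to deleting one file; a minimal sketch (path as used earlier in the article):

```python
from pathlib import Path

def erase_index(store_path="index_storage/vector_store.json"):
    """Delete the local vector store; all derived email data is
    removed in one call. Safe to call even if the file is gone."""
    Path(store_path).unlink(missing_ok=True)
```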

For healthcare (HIPAA) or finance (PCI-DSS), deploy Regolo on-premise or on private cloud instances.

Benchmarks:

On a 5,000-email inbox, you can expect:

  • Indexing time: 4 minutes (with bge-m3-embedding on regolo.ai).
  • Query latency: 2-5 seconds (including LLM generation).
  • Storage footprint: 120 MB JSON (vs. ~500 MB for raw emails).
  • Accuracy: 90%+ for factual queries verified against manual search.

Why This Matters for European Enterprises

EU regulations (GDPR, AI Act) penalize non-compliant data transfers. Traditional SaaS email AI tools (Superhuman, SaneBox) route data through US clouds, a transfer pattern that the Schrems II ruling placed under legal scrutiny.

This local-first + Regolo approach provides enterprise-grade intelligence while satisfying:

  • GDPR Article 32: Technical measures for data security.
  • AI Act Article 10: Transparency in AI decision-making (you control the index).
  • ISO 27001: Auditable data flows.

The slides presented at "Build your AI" in Rome [ITA]

šŸ‘‰ Build Your AI – Start for free


GitHub Code

You can download the code from our GitHub repo. If you need help, you can always reach out to our team on Discord šŸ¤™




šŸš€ Ready to scale?

Get Free Regolo Credits →

Built with ā¤ļø by the Regolo team. Questions? support@regolo.ai or chat with us on Discord
