If we cannot say where prompts go, how long they stay around, and whether they leave the EU, we are not doing GDPR-compliant AI inference. We are just hoping the provider’s marketing page will save us later.
That is the practical problem this guide tries to solve.
Teams usually start in the wrong place. They ask which model looks best, which demo feels fastest, or which SDK is easiest to wire. Those questions matter, but they come second. The first question is simpler: what happens to the data when the application calls the model?
That answer has to be boringly clear.
What this guide covers
We are going to use a simple framework for reviewing an AI inference setup under GDPR. Not legal theater. Not vague claims. Just the checks that tell us whether the architecture is sane.
By the end, we should be able to answer:
- where inference runs;
- whether prompts and outputs are retained;
- whether personal data is transferred outside the EU;
- who controls persistence;
- when self-hosting makes sense and when it does not.
Why teams get this wrong
The usual failure mode is easy to spot. A team ships a useful AI feature first, then tries to reverse-engineer the compliance story later. That is how avoidable problems turn into procurement delays, security objections, and internal panic.
GDPR issues in AI inference usually come from a short list of things:
- data going to the wrong geography;
- unclear retention behavior;
- vague processor responsibilities;
- logs or analytics nobody thought about;
- product teams assuming the provider handles more than it actually does.
None of this is abstract. If the application sends prompts that contain names, emails, customer records, contracts, support data, or internal code comments tied to people, then the inference path is part of the data protection story.
The five checks that matter
This is the framework. If a provider or architecture fails one of these badly, the rest of the discussion gets much less interesting.
1. Where does inference actually run?
This is the first check because it kills a lot of weak options immediately.
If inference runs outside the EU, or if requests are routed through non-EU infrastructure for processing, backup, or analytics, the compliance conversation gets harder fast. Article 44 GDPR, which sets the general principle for transfers of personal data to third countries, is where a lot of teams discover that “works fine” and “is easy to defend” are not the same thing.
For Regolo, the documented position is clear: inference runs on GPUs physically located inside the EU, mainly in Italy, and requests are not routed to non-EU cloud providers for processing.
That does not mean every upstream system in your stack is automatically compliant. It does mean the inference layer itself is easier to reason about.
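One way to keep this check from drifting is to pin the inference endpoint in code. The following is a minimal sketch, assuming a hypothetical allowlist named `APPROVED_INFERENCE_HOSTS` that the team maintains after reviewing each provider. It only verifies that the configured base URL uses HTTPS and points at a reviewed host; it says nothing on its own about where that host physically runs.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: hosts the team has reviewed and approved for
# EU-resident inference. The single entry is the Regolo API host used in
# this guide's integration example.
APPROVED_INFERENCE_HOSTS = {"api.regolo.ai"}


def assert_approved_endpoint(base_url: str) -> str:
    """Return base_url unchanged if it is HTTPS and on the allowlist."""
    parsed = urlparse(base_url)
    if parsed.scheme != "https":
        raise ValueError(f"inference endpoint must use HTTPS: {base_url}")
    if parsed.hostname not in APPROVED_INFERENCE_HOSTS:
        raise ValueError(f"inference host has not been reviewed: {parsed.hostname}")
    return base_url
```

Calling this once at startup, before the client is constructed, means a changed environment variable cannot silently reroute prompts to an unreviewed endpoint.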
2. Is any prompt or output retained at the inference layer?
This question is usually more important than teams want to admit.
If prompts or outputs are stored by default, then we need to understand retention time, storage location, deletion process, exposure surface, and whether any of that data feeds training or debugging systems.
Regolo’s privacy documentation makes a very specific claim here: prompts and outputs are processed in memory, discarded after the response, and not written to disk. According to the same documentation, no customer data is used for training or fine-tuning, and no data is shared with third parties.
That is the kind of answer we want. Specific. Auditable. Not “we take privacy seriously.”
3. Who controls persistence?
This is where a lot of internal confusion starts.
Even if the provider retains nothing, your application may still store everything. Prompts may land in app logs. Outputs may be copied into a database. Debugging tools may capture payloads. Support systems may expose data to the wrong people.
So the better question is not just “does the provider store anything?” It is “who, in the full system, is responsible for storage and retention?”
In a zero-retention inference setup, the provider’s role gets narrower. That is good. It means we control persistence in our own systems, where we can apply our own access rules, encryption, and deletion policies.
That is usually easier to defend than splitting responsibility across a messy stack of hidden defaults.
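To make "we control persistence" concrete, here is a minimal sketch of a store where every record carries an explicit expiry and can be erased on request. The class name `RetentionStore` and its methods are hypothetical, and the in-memory dictionary stands in for whatever database the application actually uses.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class RetentionStore:
    """In-memory store where every record has an explicit expiry time."""

    ttl_seconds: float
    _records: Dict[str, Tuple[str, float]] = field(default_factory=dict)

    def put(self, key: str, value: str, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._records[key] = (value, now + self.ttl_seconds)

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        now = time.time() if now is None else now
        item = self._records.get(key)
        if item is None:
            return None
        value, expires_at = item
        if now >= expires_at:
            # Expired records are dropped on access, never returned.
            del self._records[key]
            return None
        return value

    def purge_expired(self, now: Optional[float] = None) -> int:
        """Delete all expired records and return how many were removed."""
        now = time.time() if now is None else now
        expired = [k for k, (_, exp) in self._records.items() if now >= exp]
        for key in expired:
            del self._records[key]
        return len(expired)

    def erase(self, key: str) -> bool:
        """Delete one record on request; True if something was deleted."""
        return self._records.pop(key, None) is not None
```

The point of the design is that retention is a property of the write, not a cleanup job someone remembers later: nothing can be stored without a time-to-live, and deletion on request is a single call.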
4. What logs or side channels still exist?
This is the check teams skip when they are in a hurry.
A provider can retain no prompts and still leave you exposed if the surrounding workflow is sloppy. API gateways, reverse proxies, app logs, tracing tools, error aggregators, and analytics events can all leak more than people expect.
Ask these questions directly:
- Are raw prompts logged anywhere on our side?
- Are responses copied into monitoring tools?
- Do error traces include sensitive payloads?
- Are developers pasting production prompts into tickets or chat?
If the answer is “probably” to any of those, the provider is not your main problem.
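For the first and third questions, a logging filter is one cheap guardrail. The sketch below uses Python's standard `logging.Filter` to replace email addresses in log messages before a handler emits them. The class name `RedactEmailsFilter` and the regular expression are illustrative assumptions: the pattern only catches common email shapes, it does not cover names or other identifiers, and it does not touch exception tracebacks attached to a record.

```python
import logging
import re

# Deliberately simple pattern for common email shapes; not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class RedactEmailsFilter(logging.Filter):
    """Replace email addresses in log messages before they are emitted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so values passed as args are covered.
        message = record.getMessage()
        record.msg = EMAIL_RE.sub("[REDACTED_EMAIL]", message)
        record.args = ()
        return True  # keep the record, now redacted


# Usage: attach the filter to the handler that writes the logs out.
handler = logging.StreamHandler()
handler.addFilter(RedactEmailsFilter())
logging.getLogger("app.inference").addHandler(handler)
```

Attaching the filter to the handler rather than to a single logger means records propagated from child loggers are redacted too. It is a backstop, not a substitute for not logging raw prompts in the first place.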
5. Is this architecture defensible for the actual use case?
This is where practicality beats ideology.
Not every AI feature needs the same level of control. An internal brainstorming bot is not the same thing as a support assistant touching customer records. A marketing helper is not the same thing as contract review or healthcare triage.
So the right question is: given the data category and the business risk, is this inference path good enough to defend in front of security, legal, procurement, and the person who will own the incident if something goes wrong?
That is the standard. Not theoretical perfection.
A practical comparison: EU provider, self-hosting, or default cloud API?
This is where people start arguing from ideology. Better to stay concrete.
EU-hosted inference provider
This is usually the cleanest middle ground when we want a stronger privacy posture without taking on full infrastructure ownership.
It makes sense when:
- we want open models;
- we want EU data residency;
- we do not want to manage GPU infrastructure;
- we need a clearer processor story than the usual default APIs give us.
It makes less sense when:
- we need complete model-level control for unusual deployments;
- internal policy forces full self-hosting;
- the organization already has a mature GPU platform team.
Self-hosting
Self-hosting can be the right answer. It is not automatically the more responsible one.
It makes sense when we truly need maximum control, already know how to run the stack, and are willing to own scaling, updates, observability, and failure recovery.
It becomes a bad idea when a team chooses it mostly because it sounds safer, then quietly underinvests in operations. A badly run self-hosted stack is not automatically a compliance win.
Default US-hosted API path
Sometimes this is still the fastest route for low-risk use cases. Pretending otherwise is silly.
The problem is not that it never works. The problem is that many teams use it by default for workflows where they cannot clearly answer where the data goes, what is retained, and how cross-border transfer risk is handled.
That is where speed today turns into paperwork tomorrow.
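The link between the data category and the three options above can be written down as a small policy table that the application checks before it calls a model. This is a sketch under stated assumptions: the category names, the path names, and above all the assignments in `ALLOWED_PATHS` are hypothetical illustrations, not legal advice. A real table comes out of the team's own review.

```python
from enum import Enum


class DataCategory(Enum):
    INTERNAL_BRAINSTORMING = "internal_brainstorming"
    CUSTOMER_RECORDS = "customer_records"
    HEALTHCARE_TRIAGE = "healthcare_triage"


class InferencePath(Enum):
    DEFAULT_CLOUD_API = "default_cloud_api"
    EU_HOSTED_PROVIDER = "eu_hosted_provider"
    SELF_HOSTED = "self_hosted"


# Hypothetical policy table; the assignments are illustrative only.
# HEALTHCARE_TRIAGE is intentionally absent: it has not been reviewed yet.
ALLOWED_PATHS = {
    DataCategory.INTERNAL_BRAINSTORMING: set(InferencePath),
    DataCategory.CUSTOMER_RECORDS: {
        InferencePath.EU_HOSTED_PROVIDER,
        InferencePath.SELF_HOSTED,
    },
}


def is_path_allowed(category: DataCategory, path: InferencePath) -> bool:
    """Fail closed: a category missing from the table allows no path."""
    return path in ALLOWED_PATHS.get(category, set())
```

Failing closed matters more than the specific assignments. A new workflow that nobody has classified should be blocked until someone makes a decision, not routed to whatever path happens to be the default.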
How to review a provider in 15 minutes
If we need a fast internal screen before a longer review, this is enough:
- Confirm the inference geography.
- Confirm whether prompts and outputs are retained.
- Confirm whether customer data is used for training.
- Confirm whether persistence is your responsibility or shared.
- Confirm whether the API path fits the sensitivity of the actual workflow.
If any of those answers come back vague, treat that as a real signal, not a minor detail.
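The five confirmations can be recorded in a small structure so that a vague answer shows up as a visible gap instead of getting lost in a meeting. The following is a minimal sketch; `ProviderScreen` and its field names are hypothetical, and `None` is used to mean "nobody could answer this clearly".

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ProviderScreen:
    """Answers from the fast screen; None means 'no clear answer yet'."""

    inference_region: Optional[str] = None
    retains_prompts_or_outputs: Optional[bool] = None
    uses_customer_data_for_training: Optional[bool] = None
    persistence_owner: Optional[str] = None  # e.g. "us", "provider", "shared"
    fits_workflow_sensitivity: Optional[bool] = None


def unanswered(screen: ProviderScreen) -> List[str]:
    """Return the names of the checks that still lack a clear answer."""
    return [f.name for f in fields(screen) if getattr(screen, f.name) is None]


# Usage: two answers are known, three are still open.
screen = ProviderScreen(
    inference_region="EU (Italy)",
    retains_prompts_or_outputs=False,
)
print(unanswered(screen))
```

An empty list from `unanswered` does not mean the provider passes. It only means every question has a definite answer that a longer review can then evaluate.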
A minimal Python example
The code path itself is not the compliance story, but it helps to keep the integration simple. Regolo uses an OpenAI-compatible pattern, so the implementation is straightforward.
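from openai import OpenAI

# Point the OpenAI SDK at Regolo's OpenAI-compatible endpoint.
client = OpenAI(
    api_key="YOUR_REGOLO_KEY",  # placeholder; load the real key from a secret store
    base_url="https://api.regolo.ai/v1",
)

response = client.chat.completions.create(
    model="MODEL_ID",  # placeholder for the model you want to call
    messages=[
        {
            "role": "user",
            "content": "Summarize this support ticket in three bullet points.",
        }
    ],
)

# Print only the text of the first completion.
print(response.choices[0].message.content)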
The code is simple on purpose. The harder part is making sure the system around it handles the data in a way we can explain without bluffing.
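One piece of "the system around it" is the call site itself. The sketch below wraps a model call so that the application logs only metadata about the exchange (sizes and duration), never the prompt or the output. The function name `call_with_metadata_logging` and the `complete` parameter are hypothetical: `complete` stands for any callable that takes a prompt string and returns the model's text, for example a small function around `client.chat.completions.create`.

```python
import logging
import time
from typing import Callable

logger = logging.getLogger("inference.audit")


def call_with_metadata_logging(complete: Callable[[str], str], prompt: str) -> str:
    """Call the model and log only non-content metadata about the exchange."""
    started = time.monotonic()
    output = complete(prompt)
    duration_ms = int((time.monotonic() - started) * 1000)
    # Sizes and timing are enough for most operational dashboards;
    # the prompt and output text never reach the log line.
    logger.info(
        "inference ok prompt_chars=%d output_chars=%d duration_ms=%d",
        len(prompt),
        len(output),
        duration_ms,
    )
    return output
```

Passing the model call in as a parameter keeps the wrapper independent of any particular SDK and makes it easy to test with a stub instead of a live endpoint.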
Common mistakes
“The provider is GDPR-compliant, so we are covered.”
No. The provider can reduce risk at the inference layer. Your application, logs, retention rules, and access controls still matter.
“We are in Europe, so everything is fine.”
Also no. EU geography helps, but it does not excuse sloppy logging, unclear access, or unnecessary persistence.
“Self-hosting automatically solves the problem.”
Only if the team can actually run it well. Otherwise it just moves the risk into a different room.
“We will fix the compliance story later.”
That is how teams end up rebuilding an architecture they should have reviewed before launch.
FAQ
Does GDPR-compliant inference mean zero risk?
No. It means the inference layer is easier to defend. Risk still depends on what the application sends, stores, and exposes.
Is zero data retention enough on its own?
No. It helps a lot, but it does not cover your app logs, your storage, or your internal access model.
When should we choose self-hosting instead?
Choose it when you truly need maximum control and have the operational maturity to carry it properly.
What is the simplest practical path for many teams?
An EU-hosted, zero-retention inference provider is often the cleanest middle ground when we want a stronger privacy posture without running the whole stack ourselves.
If you want to review your own setup, start with the five checks above, then compare them against Regolo’s privacy documentation, API docs, and pricing.
- Privacy and compliance docs – Read the platform documentation
- API docs – Review endpoints and authentication
- Pricing – Evaluate the platform for your team
- GitHub Repo – Open source projects and integrations around Regolo