Self‑Hosting & DevOps
5 min read

Why TurboQuant matters for real-world LLM inference

TurboQuant is a KV-cache compression method from Google Research that was presented at ICLR 2026. In the reported results, it compresses KV cache values…

Alex Genovese
Benchmarks & Cost Optimization
9 min read

Sustainable inference is now an AI infrastructure decision

Artificial intelligence is simultaneously our most promising tool for fighting climate change and one of its fastest-growing contributors. As AI adoption accelerates globally, the…

Alex Genovese
Tutorial & How‑to
6 min read

Using Regolo models with OpenCode

OpenCode is an open-source, terminal-native AI coding agent that supports 75+ LLM providers through an extensible configuration system. Because Regolo exposes a fully OpenAI-compatible…

Alex Genovese
Benchmarks & Cost Optimization
6 min read

Gemma 4 31B vs Qwen3.6 35B-A3B: When to use which

A benchmark-grounded guide for teams choosing between two of the strongest open models of 2026. What these two models actually are: Gemma 4 31B is…

Alex Genovese