
Programmatic Tool Calling: How to Build Smarter LLM Agents on Regolo GPUs

Programmatic tool calling is quickly becoming one of the hottest topics among LLM builders on Reddit, X and dev forums. Developers are moving beyond classic “JSON tool calls” toward agents that can write small programs, orchestrate tools, and execute complex workflows safely and efficiently.

If stalling tool calls, hallucinated arguments, and slow, expensive chains sound familiar, you are not alone. Threads with hundreds of upvotes complain that tool calling is “hard business”: models call the wrong tool, over-fetch data, hit rate limits, or simply fabricate results when tools fail. At the same time, new approaches like Anthropic’s Programmatic Tool Calling (PTC) and “code-mode” tools are gaining attention because they cut token usage and improve reliability by letting the model generate code instead of brittle JSON blobs.

What is programmatic tool calling?

Programmatic tool calling is a pattern where the LLM writes and runs small programs to orchestrate tools, instead of just emitting one-off JSON function calls.

In traditional tool calling (OpenAI-style function calling, Claude tools, etc.), the model selects a tool and returns JSON arguments that your backend executes. Programmatic tool calling adds an extra layer: the model generates a mini-program (in Python, a DSL like PTC-Lisp, or another sandboxed language) that can:

  • Call multiple tools in sequence or conditionally
  • Loop, filter, aggregate and transform data
  • Handle partial failures and retries deterministically

Developers on Reddit and X are excited about this because it lets the model express richer logic in fewer tokens and reduces the need to stuff thousands of raw records into the context window.

Why is everyone talking about it now?

Across Reddit communities like r/LLMDevs, r/LangChain and r/ClaudeAI, several “hot themes” emerge around tool calling:

  • Reliability: models often choose the wrong tool or hallucinate parameters; people share long debugging threads on prompt tweaks and schema design.
  • Latency and cost: multiple back‑and‑forth tool calls lead to slow UX and high GPU bills.
  • Context bloat: developers push huge datasets into the context instead of using tools properly for filtering and aggregation.
  • Orchestration complexity: people chain tools in LangChain, custom frameworks, or bespoke orchestrators, and it quickly becomes spaghetti.

Programmatic tool calling is seen as a way to address these pain points: instead of asking the model to micromanage every tool call step, you let it produce a compact program that runs in a sandbox, hits your tools efficiently, and only returns summarized results to the LLM.

This is also why there is so much energy around new libraries implementing PTC patterns in various languages and frameworks.

Classic tool calling vs programmatic tool calling

Here is a high‑level comparison of the two approaches:

| Aspect | Classic tool calling | Programmatic tool calling |
| --- | --- | --- |
| Model output | JSON function call | Short program (Python, DSL, etc.) |
| Orchestration | Mostly handled by LLM + client loop | Mostly handled by generated program in sandbox |
| Workflow depth | One or few tools per turn | Arbitrary sequences, loops, branching |
| Token usage | Higher for many tool calls and large results | Lower, as tools compute and summarize server-side |
| Reliability | Prone to hallucinated arguments | More deterministic; typed signatures and retries possible |
| Complexity | Simpler to start, harder at scale | More setup initially, cleaner for complex agents |

On Regolo, you can run both patterns on the same GPU-backed LLMs; the difference is mostly at the application layer.

Core architecture on Regolo GPUs

To make these ideas concrete, imagine a backend running on Regolo’s LLM as a Service. You have:

  • A hosted model (e.g. a 70B+ instruction-tuned LLM) on NVIDIA H100/A100/L40S GPUs, exposed via HTTP API.
  • A set of tools in your app: database queries, internal APIs, search engines, etc.
  • A small “tool runtime” that executes either:
    • single tool calls (classic), or
    • generated mini-programs (programmatic).

Your goal is to:

  1. Keep GPU utilization high (batching, KV caching) to reduce cost.
  2. Minimize tokens by having tools do heavy data processing outside the LLM.
  3. Maintain strict data privacy and zero data retention, which Regolo supports by design with European, privacy-first infrastructure.

In practice, this means designing your app so that the LLM does “high-level reasoning”, while your tools and program runtime do “heavy lifting”. Regolo’s GPU clusters handle the LLM inference efficiently; your code handles tool execution in your VPC or backend.


Example 1: Classic JSON tool calling on Regolo

Let’s start with a simple, OpenAI-style pattern and then evolve it into programmatic tool calling.

Assume Regolo exposes a chat completion endpoint similar in spirit to other major APIs, with support for tool (function) descriptions. (Adapt the base URL and auth headers to your actual Regolo endpoint.)

import os
import requests
import json

REGOLO_API_KEY = os.environ["REGOLO_API_KEY"]
REGOLO_BASE_URL = "https://api.regolo.ai/v1/chat/completions"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_user_orders",
            "description": "Fetch last N orders for a given user id",
            "parameters": {
                "type": "object",
                "properties": {
                    "user_id": {"type": "string"},
                    "limit": {"type": "integer", "minimum": 1, "maximum": 50}
                },
                "required": ["user_id"]
            }
        }
    }
]

def call_regolo(messages, tools=None, tool_choice="auto"):
    payload = {
        "model": "regolo-llm-large",
        "messages": messages,
    }
    if tools is not None:
        payload["tools"] = tools
        payload["tool_choice"] = tool_choice

    resp = requests.post(
        REGOLO_BASE_URL,
        headers={
            "Authorization": f"Bearer {REGOLO_API_KEY}",
            "Content-Type": "application/json",
        },
        data=json.dumps(payload),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def get_user_orders(user_id: str, limit: int = 10):
    # Your actual business logic here (DB query, microservice call, etc.)
    return [
        {"id": "order_123", "total": 42.5},
        {"id": "order_456", "total": 19.9},
    ][:limit]

def chat_with_tools(user_message: str):
    messages = [
        {"role": "system", "content": "You are a helpful support assistant."},
        {"role": "user", "content": user_message},
    ]

    first = call_regolo(messages, tools=tools, tool_choice="auto")
    msg = first["choices"][0]["message"]

    if not msg.get("tool_calls"):
        return msg["content"]

    tool_results = []
    for tool_call in msg["tool_calls"]:
        name = tool_call["function"]["name"]
        args = json.loads(tool_call["function"]["arguments"])

        if name == "get_user_orders":
            result = get_user_orders(**args)
        else:
            result = {"error": "unknown tool"}

        tool_results.append({
            "tool_call_id": tool_call["id"],
            "role": "tool",
            "name": name,
            "content": json.dumps(result)
        })

    messages.append(msg)
    messages.extend(tool_results)

    second = call_regolo(messages)
    return second["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat_with_tools("Show me my last 2 orders. My id is user_789."))

This pattern mirrors what many developers already use and what X/Reddit threads often refer to as the “two‑step loop”: model proposes a tool call, client executes, then model finalizes. It works, but as workflows grow more complex, the orchestration logic in chat_with_tools becomes harder to maintain.


Example 2: Moving to programmatic tool calling

Now let’s let the model write a mini-program describing how to use tools, instead of a single JSON call. Inspired by recent PTC implementations, we can define a tiny DSL where the model returns something like:

(seq
  (define orders (get_user_orders "user_789" 20))
  (define filtered (filter_by_min_total orders 30))
  (summarize_orders filtered))

To keep things simple in Python, we will instead ask the model to return a restricted subset of Python using only whitelisted functions. Do not run arbitrary Python—always sandbox and validate—this example is intentionally minimal for clarity.

Step 1: Define a “tool runtime”

import textwrap

TOOLS_RUNTIME = {
    "get_user_orders": get_user_orders,
    "filter_by_min_total": lambda orders, min_total: [
        o for o in orders if o["total"] >= min_total
    ],
    "summarize_orders": lambda orders: {
        "count": len(orders),
        "total_amount": sum(o["total"] for o in orders)
    },
}

def execute_program(program_source: str, tools_runtime: dict):
    # Expose only the whitelisted tools plus a few harmless builtins the
    # generated program may need for basic control flow and aggregation.
    # A single namespace lets any helper functions the program defines
    # see the tools and each other.
    env = {"__builtins__": {"len": len, "sum": sum, "min": min,
                            "max": max, "sorted": sorted}}
    env.update(tools_runtime)

    exec(textwrap.dedent(program_source), env)

    if "result" not in env:
        raise ValueError("Program must set a `result` variable.")
    return env["result"]

The rule for the model will be: “write a Python script that ends with a variable named result containing the final output”. The script may call only the tools we expose in TOOLS_RUNTIME.
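To make this concrete, here is what a program the model might emit could look like for a request such as “summarize my orders over 30”, executed in a single restricted namespace. The stub tools and order data below are illustrative stand-ins for the runtime above:

```python
import textwrap

# Stub tools standing in for the real runtime (sample data is illustrative)
def get_user_orders(user_id, limit=10):
    return [{"id": "order_123", "total": 42.5},
            {"id": "order_456", "total": 19.9}][:limit]

TOOLS = {
    "get_user_orders": get_user_orders,
    "filter_by_min_total": lambda orders, m: [o for o in orders if o["total"] >= m],
    "summarize_orders": lambda orders: {"count": len(orders),
                                        "total_amount": sum(o["total"] for o in orders)},
}

# A program the model might emit, ending with the required `result` variable
program = """
orders = get_user_orders("user_789", 20)
filtered = filter_by_min_total(orders, 30)
result = summarize_orders(filtered)
"""

# Execute with no builtins and only whitelisted tools in scope
env = {"__builtins__": {}}
env.update(TOOLS)
exec(textwrap.dedent(program), env)
print(env["result"])  # {'count': 1, 'total_amount': 42.5}
```

Note that the three tool calls and the filtering logic cost zero extra LLM turns: the model wrote the plan once, and plain Python executed it.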

Step 2: Ask the LLM for a program, not a final answer

PROGRAMMING_PROMPT = """
You are an AI that writes short Python programs to solve tasks by calling tools.

Available functions:
- get_user_orders(user_id: str, limit: int) -> list[dict]
- filter_by_min_total(orders: list[dict], min_total: float) -> list[dict]
- summarize_orders(orders: list[dict]) -> dict

Rules:
- Do NOT print anything.
- Do NOT import modules.
- Use only the functions listed above and basic Python control flow.
- Always assign the final answer to a variable named `result`.
"""

def plan_with_program(user_message: str):
    messages = [
        {"role": "system", "content": PROGRAMMING_PROMPT},
        {"role": "user", "content": user_message},
    ]

    resp = call_regolo(messages)
    program_source = resp["choices"][0]["message"]["content"].strip()

    # Models often wrap code in markdown fences; strip them before execution
    if program_source.startswith("```"):
        program_source = program_source.split("\n", 1)[1].rsplit("```", 1)[0]
    return program_source

def run_agent_with_program(user_message: str):
    program_source = plan_with_program(
        f"Write a program that solves this request: {user_message}"
    )
    result = execute_program(program_source, TOOLS_RUNTIME)

    explanation_messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": f"User request: {user_message}"},
        {"role": "user", "content": f"Tool results: {json.dumps(result)}"},
    ]
    resp = call_regolo(explanation_messages)
    return resp["choices"][0]["message"]["content"]

Now the agent does three steps:

  1. Ask the LLM for a program.
  2. Execute that program locally against tools, without extra LLM calls.
  3. Ask the LLM to express the final result to the user.

Developers on Reddit, X and other communities highlight that this pattern dramatically reduces the number of LLM calls and token usage because all looping and filtering happens in the program, not in natural language. Running this on Regolo means your GPU budget is spent on reasoning and summarization, not on blindly streaming large datasets back and forth.


Example 3: Multi-step workflow with sub-agents

Some of the most interesting community demos combine programmatic tool calling with sub-agents: smaller specialists that the main agent can invoke.

Here is a simple pattern:

def search_knowledge_base(query: str):
    # Search your KB (e.g. Elasticsearch / vector DB) and return docs
    return [{"title": "Refund policy", "content": "..."}]

def escalate_to_human(ticket_id: str, summary: str):
    # Create escalation in your ticketing system
    return {"status": "created", "ticket_id": ticket_id}

AGENT_TOOLS = {
    "get_user_orders": get_user_orders,
    "filter_by_min_total": TOOLS_RUNTIME["filter_by_min_total"],
    "summarize_orders": TOOLS_RUNTIME["summarize_orders"],
    "search_knowledge_base": search_knowledge_base,
    "escalate_to_human": escalate_to_human,
}

PROGRAMMING_PROMPT_MULTI = """
You are an AI that writes short Python programs to solve customer support tasks.

Available functions:
- get_user_orders(user_id: str, limit: int)
- filter_by_min_total(orders, min_total: float)
- summarize_orders(orders)
- search_knowledge_base(query: str)
- escalate_to_human(ticket_id: str, summary: str)

Rules:
- Use the tools as needed.
- You may branch based on tool results.
- Always set a final `result` dict with keys:
  - 'answer': str (what to say to the user)
  - 'actions': list (descriptions of any actions you took)
"""

def execute_support_program(program_source: str):
    return execute_program(program_source, AGENT_TOOLS)

With the right prompt, the LLM can generate programs that first search your knowledge base, then only call escalate_to_human if the confidence is low or a policy requires escalation. All of this logic is captured in a small script, while Regolo provides the GPU horsepower to interpret the user request and generate the program and final explanation.
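As a sketch of what such a generated program could look like, the snippet below answers from the knowledge base when documents are found and escalates only otherwise. The stub tools and their data are hypothetical; in the real agent this program would run through execute_support_program:

```python
# Stub support tools (illustrative data)
def search_knowledge_base(query):
    return [{"title": "Refund policy", "content": "Refunds within 30 days."}]

def escalate_to_human(ticket_id, summary):
    return {"status": "created", "ticket_id": ticket_id}

TOOLS = {"search_knowledge_base": search_knowledge_base,
         "escalate_to_human": escalate_to_human}

# A program the model might emit: branch on the tool result,
# escalating only when the knowledge base comes up empty
program = """
docs = search_knowledge_base("refund policy")
if docs:
    result = {"answer": docs[0]["content"],
              "actions": ["searched knowledge base"]}
else:
    ticket = escalate_to_human("ticket_42", "No refund docs found")
    result = {"answer": "I've escalated this to a human agent.",
              "actions": ["escalated: " + ticket["ticket_id"]]}
"""

env = {"__builtins__": {}}
env.update(TOOLS)
exec(program, env)
print(env["result"]["answer"])  # Refunds within 30 days.
```

The branch lives in the program, not in an extra LLM round-trip, which is exactly the orchestration saving PTC advocates describe.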


Getting started with Regolo for programmatic tool calling

To implement these patterns in your own stack:

  1. Choose a Regolo model that fits your use case (generalist vs code-heavy, context length, latency).
  2. Implement a thin client in your preferred language to call the Regolo chat/completion API.
  3. Start with classic tool calling to validate your tools and schemas.
  4. Introduce a program-writing step (like plan_with_program) with a restricted runtime and sandboxed execution.
  5. Add error feedback and retries so the LLM can iteratively improve its programs.
  6. Monitor latency, token usage and GPU costs; adjust prompts to keep programs short and tool-heavy.
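Step 5 above can be sketched as a simple repair loop: execution errors become feedback for the model's next attempt. Here, ask_model_for_program is a hypothetical stand-in for a real Regolo call; it deliberately returns a buggy program first so the loop has something to repair:

```python
# Stand-in for an LLM call: returns a buggy program on the first attempt,
# then a "repaired" one once it receives the error as feedback.
def ask_model_for_program(task, error=None):
    if error is None:
        return "result = undefined_tool()"   # first draft: calls a missing tool
    return "result = {'answer': 'fixed'}"    # repaired after seeing the error

def run_with_retries(task, max_attempts=3):
    error = None
    for attempt in range(max_attempts):
        program = ask_model_for_program(task, error)
        try:
            env = {"__builtins__": {}}
            exec(program, env)
            return env["result"]
        except Exception as exc:
            # Treat the failure as feedback for the next attempt
            error = f"{type(exc).__name__}: {exc}"
    raise RuntimeError(f"Gave up after {max_attempts} attempts: {error}")

print(run_with_retries("answer the user"))  # {'answer': 'fixed'}
```

In a real agent you would append the error string to the conversation before re-asking the model, and cap attempts to bound latency and GPU cost.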

FAQ

What is the main advantage of programmatic tool calling over classic function calling?

The main advantage is that the LLM expresses complex workflows as short programs that can call multiple tools, loop and branch, instead of juggling many separate tool calls. This reduces token usage, increases reliability, and makes orchestration easier to reason about.

Is it safe to let an LLM write and run code?

It can be safe if you strictly sandbox execution: limit the language surface area, restrict imports, whitelist functions, enforce timeouts and memory limits, and treat errors as feedback to the LLM rather than system crashes. This is exactly the approach taken by emerging PTC runtimes.
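A minimal sketch of one such guard is a wall-clock timeout around program execution. This uses signal.alarm, which is POSIX-only; production sandboxes typically run generated programs in a separate process or container with CPU and memory limits instead:

```python
import signal

class ProgramTimeout(Exception):
    pass

def run_with_timeout(program_source, env, seconds=2):
    # Raise inside the exec'd program when the alarm fires
    def _on_alarm(signum, frame):
        raise ProgramTimeout(f"program exceeded {seconds}s")

    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        exec(program_source, env)
    finally:
        signal.alarm(0)                          # cancel the pending alarm
        signal.signal(signal.SIGALRM, old_handler)
    return env

env = run_with_timeout("result = 1 + 1", {"__builtins__": {}})
print(env["result"])  # 2
```

An infinite loop such as `while True: pass` would be interrupted with a ProgramTimeout, which you can feed back to the model as an error message rather than letting it hang your worker.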

Do I need a special model to use programmatic tool calling?

No, you can typically use any strong instruction-following LLM that can follow your “write code in this format” instructions. Some providers add explicit PTC support, but the pattern can be implemented on top of generic models as shown in this article.

How does this affect GPU cost?

By reducing the number of LLM turns and shrinking prompts, programmatic tool calling lowers total token throughput per task, which translates into fewer GPU-seconds per workflow. High-throughput inference techniques like batching and KV caching on Regolo further amplify these savings.

Can I combine PTC with RAG (Retrieval-Augmented Generation)?

Yes. A common pattern is to expose retrieval as a tool (e.g. search_knowledge_base) and let the program decide when and how to call it, possibly multiple times, before summarizing results for the user. This keeps raw documents outside the LLM context and lets your retrieval layer stay independent.
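For instance, a generated program might query retrieval with two phrasings and deduplicate by document id before anything reaches the LLM context. The stub retriever and its corpus below are illustrative:

```python
# Stub retriever standing in for a vector DB / search tool (illustrative data)
def search_knowledge_base(query):
    corpus = {
        "refund policy": [{"id": "doc1", "title": "Refund policy"}],
        "returns": [{"id": "doc1", "title": "Refund policy"},
                    {"id": "doc2", "title": "Return shipping"}],
    }
    return corpus.get(query, [])

# The program retrieves with two reformulations, deduplicates by id,
# and returns only a compact list of titles to the LLM
program = """
seen = {}
for query in ["refund policy", "returns"]:
    for doc in search_knowledge_base(query):
        seen[doc["id"]] = doc["title"]
result = sorted(seen.values())
"""

env = {"__builtins__": {"sorted": sorted}}
env["search_knowledge_base"] = search_knowledge_base
exec(program, env)
print(env["result"])  # ['Refund policy', 'Return shipping']
```

Only the deduplicated titles (or a summary built from them) go back into the prompt; the raw documents never consume context tokens.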

Does Regolo store my prompts or tool results?

Regolo is designed as a Zero Data Retention, Data Privacy First platform with compute and data residency in Europe, which means requests are processed for inference and not kept for training, aligning well with privacy-sensitive use cases like internal tools and enterprise agents.


GitHub code

You can download the code from our GitHub repo; just copy the .env.example files and fill them in with your credentials. If you need help, you can always reach out to our team on Discord 🤙


🚀 Ready? Start your free trial today



Built with ❤️ by the Regolo team. Questions? regolo.ai/contact or chat with us on Discord