Tactics·May 21, 2026

Route LLM Tasks by Cost and Complexity, Cut Token Bills by Two-Thirds

A founder's observation reveals how defaulting to expensive LLMs for all agent tasks inflates costs. Implementing task-specific model routing can significantly reduce token expenditures.

A Reddit user on r/openclaw reported cutting token costs by two-thirds after abandoning a common AI agent setup: a single, expensive large language model (LLM) as the default for every task. Founder Lars Winstead, who highlighted the post, describes the pattern as a widespread error. It is, he argues, less a strategy than a billing oversight, and it applies equally to OpenClaw, n8n, Make, and Zapier.

Routing models by task, not brand

Lars Winstead, writing on dev.to, observed a recurring setup mistake in OpenClaw: users configure one premium LLM as the default for every operation. This includes routine tasks such as heartbeat checks, cron pings, inbox triage, and low-stakes tagging. Winstead characterizes this as an "expensive default," not a clever agent architecture. The core issue is that the model is never matched to how complex or how critical the task is.

Reddit user reduces token costs by two-thirds

The cost of this "one model for all" approach shows up in community feedback. Winstead cited a Reddit user on r/openclaw who wrote, "Stop using opus for everything. seriously. i was running it on heartbeat checks and cron pings which is just lighting money on fire. glm-5.1 handles all that stuff fine. i only use sonnet 4.6 now when the task actually needs reasoning and my token costs are like a third of what they were." By moving routine checks to glm-5.1 and reserving sonnet 4.6 for tasks that need reasoning, the user brought token costs down to roughly a third of their former level, a two-thirds reduction.
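The arithmetic behind a saving of this size can be sketched in a few lines. Everything numeric here is a hypothetical placeholder: the two price tiers and the token volumes are not from the source or from any vendor's price list, and they were picked so the result lands near the two-thirds the user reported. The sketch also simplifies to two tiers, keeping reasoning work on the premium tier.

```python
# Hypothetical prices in USD per million tokens; NOT real vendor pricing.
PRICE_PER_MTOK = {"premium": 10.0, "cheap": 1.0}

# Hypothetical monthly volume in millions of tokens.
ROUTINE_MTOK = 74.0    # heartbeat checks, cron pings, tagging
REASONING_MTOK = 26.0  # tasks that actually need reasoning


def monthly_cost(routine_tier: str, reasoning_tier: str) -> float:
    """Monthly spend when each kind of work runs on the given price tier."""
    return (ROUTINE_MTOK * PRICE_PER_MTOK[routine_tier]
            + REASONING_MTOK * PRICE_PER_MTOK[reasoning_tier])


before = monthly_cost("premium", "premium")  # one premium model for everything
after = monthly_cost("cheap", "premium")     # routine work moved to the cheap tier
savings = 1 - after / before

print(f"before=${before:.2f} after=${after:.2f} savings={savings:.1%}")
```

The saving is driven almost entirely by the share of traffic that is routine: the larger that share, the more an expensive default costs.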

Matching model to task complexity

The fundamental takeaway is to route models based on the specific task requirements rather than brand loyalty or a global application setting. Winstead contends that real agent systems perform diverse functions. These include cheap classification, data extraction, tagging, summarization, retries, memory maintenance, occasional hard reasoning, and high-risk decisions. Each of these task types demands a different model profile. Boring, low-stakes jobs should use inexpensive models, while complex reasoning tasks warrant expensive models. Dangerous jobs, involving critical decisions, require models that have passed specific evaluations. This approach, termed "model routing," is presented as basic engineering for workflows running continuously.
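In a custom worker, the split described above might reduce to a small lookup table. This is a minimal sketch, not Winstead's implementation: the task labels, tier names, and model identifiers are all illustrative placeholders, and sending unknown task types to the premium tier is an assumption of this example.

```python
from enum import Enum


class Tier(Enum):
    CHEAP = "cheap"          # boring, low-stakes jobs
    PREMIUM = "premium"      # occasional hard reasoning
    EVALUATED = "evaluated"  # high-risk decisions; must have passed your evals


# Illustrative labels for the task types listed in the text.
TASK_TIER = {
    "classification": Tier.CHEAP,
    "extraction": Tier.CHEAP,
    "tagging": Tier.CHEAP,
    "summarization": Tier.CHEAP,
    "retry": Tier.CHEAP,
    "memory_maintenance": Tier.CHEAP,
    "hard_reasoning": Tier.PREMIUM,
    "high_risk_decision": Tier.EVALUATED,
}

# Placeholder identifiers, not real model names.
TIER_MODEL = {
    Tier.CHEAP: "vendor-a/small",
    Tier.PREMIUM: "vendor-b/large",
    Tier.EVALUATED: "vendor-b/large-evaluated",
}


def route(task_type: str) -> str:
    """Return the model id for a task type.

    Unknown task types fall back to the premium tier (an assumption of this
    sketch), so a missing table entry costs money rather than quality.
    """
    tier = TASK_TIER.get(task_type, Tier.PREMIUM)
    return TIER_MODEL[tier]


for task in ("tagging", "hard_reasoning", "high_risk_decision"):
    print(task, "->", route(task))
```

Keeping the table in one place means a model swap is a one-line change rather than a hunt through every workflow.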

OpenClaw configuration supports task-specific models

The OpenClaw framework itself provides structural cues for this model routing strategy. Its configuration schema allows for defining a primary model alongside ordered fallbacks. It also permits the specification of distinct models for image processing, PDF handling, and image generation. Winstead argues this is not accidental; it is the product's design guiding users toward task-specific model selection. The YAML snippet provided illustrates this:

agents:
  defaults:
    model:
      primary: openai/gpt-5.4-mini
      fallbacks:
        - anthropic/claude-sonnet-4.6
        - google/gemini-3.5-flash
    imageModel: google/gemini-3.5-flash
    pdfModel: openai/gpt-5.4
    imageGenerationModel: openai/gpt-image-1

In this configuration, one model handles default work with two ordered fallbacks behind it, while image processing, PDF handling, and image generation are each assigned their own model. The structure itself reinforces the idea of a differentiated model strategy.
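For readers outside OpenClaw, the behaviour a "primary plus ordered fallbacks" setting implies can be approximated in plain Python. This is a sketch of the general pattern, not OpenClaw's actual implementation: `complete_with_fallbacks` and `fake_call` are hypothetical names, and the simulated outage exists only to show the fallback order.

```python
from typing import Callable, Sequence


def complete_with_fallbacks(
    prompt: str,
    primary: str,
    fallbacks: Sequence[str],
    call_model: Callable[[str, str], str],
) -> tuple[str, str]:
    """Try the primary model, then each fallback in order.

    Returns (model_used, reply). Raises RuntimeError if every model fails.
    """
    errors = []
    for model in [primary, *fallbacks]:
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # a real worker would catch narrower errors
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_call(model: str, prompt: str) -> str:
    """Stand-in for a real API client; the primary is made to fail."""
    if model == "openai/gpt-5.4-mini":
        raise TimeoutError("simulated outage")
    return f"[{model}] ok"


used, reply = complete_with_fallbacks(
    "ping",
    primary="openai/gpt-5.4-mini",
    fallbacks=["anthropic/claude-sonnet-4.6", "google/gemini-3.5-flash"],
    call_model=fake_call,
)
print(used, reply)
```

Passing the client in as `call_model` keeps the fallback logic testable without network access.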

Avoiding expensive models for simple automation

The tendency to over-allocate frontier models to trivial tasks stems from a perception that "agent work" is inherently sophisticated. A heartbeat check, when performed by an AI agent, might feel important. An inbox review triggered by a cron job, leveraging AI, can seem advanced. However, much of recurring automation involves basic operations: classifying data, summarizing content, comparing notes, tagging tickets, or determining if a state has changed. These tasks are well-suited for smaller, cheaper models. Reserving premium models for tasks that genuinely require advanced reasoning is the core principle of cost-effective agent architecture.

WHAT WE'D CHANGE

The core principle of model routing by task remains valid, but its implementation requires additional considerations beyond the initial setup. The source focuses on OpenClaw's configuration, yet the strategy extends to n8n, Make, Zapier, and custom Python workers. On these platforms, the "routing" mechanism might be conditional logic within workflows rather than a declarative YAML structure, which can make workflows harder to design and to maintain.

The advice to use models that "pass your evals" for dangerous jobs is critical but underspecified. A robust model routing strategy necessitates a defined evaluation framework for each task type. This includes establishing performance benchmarks for cheaper models on simple tasks and rigorous testing for expensive models on complex or high-risk decisions. Without clear evaluation criteria and continuous monitoring, the cost savings from cheaper models could be offset by increased error rates or the need for human intervention. Furthermore, the specific models cited (e.g., opus, sonnet 4.6, glm-5.1, gpt-5.4-mini) are subject to rapid change. A durable playbook would emphasize the characteristics of models (e.g., cost per token, context window, reasoning capability) rather than specific vendor names, allowing founders to adapt as the LLM landscape evolves.
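One way to combine both points, an eval gate and selection by characteristics instead of vendor names, is to pick the cheapest model that clears a per-task bar. The sketch below assumes you already measure a pass rate per task type on your own eval set; `ModelProfile`, `pick_model`, the task labels, and every number are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Sequence


@dataclass
class ModelProfile:
    name: str
    usd_per_mtok: float   # cost per million tokens
    context_window: int   # maximum tokens per request
    # task type -> pass rate measured on your own eval set (0.0 to 1.0)
    eval_pass_rate: dict = field(default_factory=dict)


def pick_model(
    profiles: Sequence[ModelProfile],
    task_type: str,
    min_pass_rate: float,
    min_context: int = 0,
) -> Optional[ModelProfile]:
    """Cheapest profile that clears the eval bar for this task, else None."""
    eligible = [
        p for p in profiles
        if p.eval_pass_rate.get(task_type, 0.0) >= min_pass_rate
        and p.context_window >= min_context
    ]
    return min(eligible, key=lambda p: p.usd_per_mtok, default=None)


# Hypothetical profiles; the numbers are placeholders, not measurements.
profiles = [
    ModelProfile("small", 0.5, 32_000,
                 {"tagging": 0.97, "refund_approval": 0.81}),
    ModelProfile("large", 10.0, 200_000,
                 {"tagging": 0.99, "refund_approval": 0.96}),
]

cheap_pick = pick_model(profiles, "tagging", min_pass_rate=0.95)
risky_pick = pick_model(profiles, "refund_approval", min_pass_rate=0.95)
no_pick = pick_model(profiles, "refund_approval", min_pass_rate=0.99)

print(cheap_pick.name, risky_pick.name, no_pick)
```

A `None` result is the useful case: no model has earned the job yet, so the task should go to a human or stay unautomated until one does.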

LANDING

The practice of assigning LLMs based on task complexity is not an advanced optimization; it is a fundamental engineering requirement for any agent system operating at scale. As AI agents become integral to routine operations, the default choice of a single, premium model represents an unexamined cost center. Differentiating between simple classification and complex reasoning, and then matching the appropriate model, transforms a billing oversight into a deliberate, cost-efficient architecture. This approach moves beyond basic setup to establish a resilient and economically sound foundation for automated workflows.

Sources · how we verified
  1. Lars Winstead, "I kept seeing the same OpenClaw mistake: one expensive model for every job," dev.to

Reported by the Maya desk on Founderr Pulse’s Tactics beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.