Tactics·Jul 2, 2026

A founder reports an 80% LLM cost reduction using model routing

Dhruv Kapadia of Coworker shares a three-step playbook for routing LLM requests to cheaper models, claiming a 50x price spread between 'good enough' and frontier APIs.

Dhruv Kapadia of Coworker reports cutting his company's LLM bill by approximately 80%. The method was not a novel architecture or a new model, but a routing system that sends each API call to the cheapest model capable of handling the task. This approach treats LLM providers as a commodity market, arbitraging the significant price differences between models.

Exploit the 50x price spread

The core insight is the price gap between frontier models and cheaper, less capable ones. Kapadia claims the per-token spread can reach 50x for output of comparable quality on a given task. Many applications pay the frontier premium on every task, from complex legal analysis to simple sentiment classification. You are paying frontier prices for work a cheaper model finishes fine. The savings come from acknowledging that a large portion of production traffic does not require state-of-the-art reasoning.

Implement a three-step routing pattern

The logic Kapadia describes is a simple classification and fallback system.

  1. Classify: Analyze each incoming request to determine its intent and complexity. This initial step decides how much is at stake if the model produces a suboptimal response.
  2. Select: Route the request to the cheapest model that meets the quality bar for that specific task class.
  3. Fallback: If the chosen model fails a validation check or returns a low confidence score, the request is automatically escalated to a more powerful, more expensive model.
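
The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Kapadia's or Coworker's implementation: the tier names, the keyword classifier, the task-class table, and the `call_model` and `is_valid` callables are all hypothetical placeholders that a real system would replace with its own classifier, provider client, and validation check.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model tiers, ordered cheapest first. The names are placeholders.
TIERS = ["small-model", "mid-model", "frontier-model"]

# Hypothetical table: the cheapest tier assumed to meet the quality bar for
# each task class. In practice this would be derived from evaluation results.
MIN_TIER = {"classification": 0, "summarization": 1, "analysis": 2}


@dataclass
class Result:
    model: str       # tier that produced the returned text
    text: str
    attempts: int    # number of models tried, including escalations
    validated: bool  # False if even the last tier failed the check


def classify(prompt: str) -> str:
    """Step 1: toy keyword classifier standing in for a real one."""
    lowered = prompt.lower()
    if "sentiment" in lowered or "label" in lowered:
        return "classification"
    if "summarize" in lowered:
        return "summarization"
    return "analysis"  # unrecognized requests start at the most capable tier


def route(
    prompt: str,
    call_model: Callable[[str, str], str],
    is_valid: Callable[[str], bool],
) -> Result:
    """Steps 2 and 3: start at the cheapest eligible tier, escalate on failure."""
    start = MIN_TIER[classify(prompt)]
    attempts = 0
    text = ""
    model = TIERS[start]
    for model in TIERS[start:]:
        attempts += 1
        text = call_model(model, prompt)
        if is_valid(text):
            return Result(model, text, attempts, True)
    # Every tier failed validation: return the last answer, flagged as such.
    return Result(model, text, attempts, False)
```

Ordering the tiers in a single list keeps the fallback path trivial: escalation is just the next element. The `attempts` field is what a team would aggregate to see how often the cheap tier is being overruled.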

Kapadia provides a hypothetical example: for a workflow of one million monthly requests, routing 70% of them to a mid-tier model drops the blended cost by 80% compared to an all-frontier baseline. The numbers are illustrative, but they point to the leverage available when only 30% of traffic requires the most expensive option.
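
The arithmetic behind a blended-cost figure like this can be checked with a few lines of Python. The sketch below rests on two simplifying assumptions that are ours, not Kapadia's: every request consumes the same number of tokens, and the cheaper model sits at the far end of the cited spread (1/50 of the frontier price). The function names are hypothetical.

```python
def blended_cost_ratio(routed_share: float, price_ratio: float) -> float:
    """Blended cost as a fraction of an all-frontier baseline.

    routed_share: fraction of requests sent to the cheaper model (0 to 1).
    price_ratio:  cheaper model's cost per request divided by the frontier
                  model's cost per request (for example 1/50).
    Assumes every request consumes the same number of tokens.
    """
    if not 0.0 <= routed_share <= 1.0:
        raise ValueError("routed_share must be between 0 and 1")
    if price_ratio < 0.0:
        raise ValueError("price_ratio must be non-negative")
    return (1.0 - routed_share) + routed_share * price_ratio


def savings(routed_share: float, price_ratio: float) -> float:
    """Fractional saving versus sending everything to the frontier model."""
    return 1.0 - blended_cost_ratio(routed_share, price_ratio)


# 70% of traffic routed to a model 50x cheaper: roughly a 69% saving.
print(round(savings(0.70, 1 / 50), 3))
```

Under these assumptions the saving can never exceed the routed share, so a 70% routed share tops out below 70%. Reaching 80% would need either a routed share of roughly 82% at a 50x spread, or routed requests that account for more than their share of token spend (for example, long-context summarization). That is consistent with the example being illustrative rather than exact.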

Avoid the implementation gotchas

This is not a zero-cost abstraction. Kapadia flags four critical components for a production-ready routing system.

  • Evaluation Harness: A system to measure output quality for each task class is non-negotiable. Without it, a team cannot know if routing to a cheaper model degrades the user experience.
  • Reliable Fallbacks: The escalation path must be robust. The rate of escalations serves as a key metric for tuning routing thresholds.
  • Latency Monitoring: Cost is not the only variable. Cheaper models are sometimes faster, but not always, and an escalated request pays for two calls in sequence. Both cost and latency must be tracked.
  • High-Stakes Exclusion: The most critical tasks, such as those involving legal or medical information, should be excluded from routing. These always go to the best available model, accepting the cost as a requirement.
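
Two of the components above, the escalation-rate metric and the high-stakes exclusion, are simple enough to sketch. The example below is an assumption-laden illustration, not a description of any particular gateway: the class name `RoutingStats`, the helper `eligible_for_routing`, and the task-class labels are hypothetical, and a production system would more likely emit these numbers to an existing metrics backend.

```python
from collections import defaultdict
from statistics import mean

# Task classes that bypass routing and always go to the best available model.
# The two labels mirror the examples in the list above; the set is hypothetical.
HIGH_STAKES = {"legal", "medical"}


def eligible_for_routing(task_class: str) -> bool:
    """High-stakes classes are never sent to a cheaper tier."""
    return task_class not in HIGH_STAKES


class RoutingStats:
    """In-memory counters for escalation rate and latency per task class."""

    def __init__(self) -> None:
        self._total = defaultdict(int)
        self._escalated = defaultdict(int)
        self._latency_ms = defaultdict(list)

    def record(self, task_class: str, escalated: bool, latency_ms: float) -> None:
        self._total[task_class] += 1
        if escalated:
            self._escalated[task_class] += 1
        self._latency_ms[task_class].append(latency_ms)

    def escalation_rate(self, task_class: str) -> float:
        """Share of requests in this class that fell back to a pricier model."""
        total = self._total[task_class]
        return self._escalated[task_class] / total if total else 0.0

    def mean_latency_ms(self, task_class: str) -> float:
        samples = self._latency_ms[task_class]
        return mean(samples) if samples else 0.0
```

A persistently high escalation rate for one class suggests its starting tier is set too low: the team is paying for two calls where one would do.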

Kapadia notes that his own company, Coworker, sells an LLM gateway that productizes this logic.

What we'd change

The playbook is technically sound, but its practicality depends entirely on team size and traffic volume. The author's suggestion that this is a "reasonable weekend prototype" understates the engineering cost of maintaining it in production. An evaluation harness and robust fallback logic are not trivial systems to build and operate.

For a solo founder or a small team with moderate LLM usage, the cost of this engineering overhead could easily exceed the savings on API calls. The build-versus-buy calculation has shifted since early AI adoption. While building a router offers maximum control, using a managed LLM gateway (like the one the author's company provides, or competitors like Portkey or OpenRouter) is often the more capital-efficient choice. It outsources the maintenance of model integrations, latency tracking, and complex fallback rules.
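
The build-versus-buy trade-off can be framed as a break-even calculation. The sketch below is illustrative only: the function names are hypothetical, and the figures in the usage lines (a $2,000 monthly bill, 20 maintenance hours at $100 per hour) are invented for the example and do not come from the source.

```python
def net_monthly_benefit(
    monthly_llm_bill: float,
    savings_rate: float,
    maintenance_hours: float,
    hourly_cost: float,
) -> float:
    """API savings minus the engineering time spent keeping the router alive."""
    saved = monthly_llm_bill * savings_rate
    overhead = maintenance_hours * hourly_cost
    return saved - overhead


def break_even_bill(
    savings_rate: float, maintenance_hours: float, hourly_cost: float
) -> float:
    """Monthly LLM bill above which an in-house router pays for itself."""
    if savings_rate <= 0.0:
        raise ValueError("savings_rate must be positive")
    return maintenance_hours * hourly_cost / savings_rate


# Invented inputs: $2,000/month bill, 80% saving, 20 hours/month at $100/hour.
print(net_monthly_benefit(2000, 0.80, 20, 100))  # negative: overhead exceeds savings
print(break_even_bill(0.80, 20, 100))            # bill needed to break even
```

With these invented inputs the router loses money each month, and the bill would need to reach about $2,500 before the in-house build breaks even. Swapping in a gateway's subscription fee for the maintenance term gives the equivalent check for the buy option.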

The claimed 80% savings figure also requires a specific traffic profile. It is achievable only if a large majority of a product's LLM spend goes to low-complexity tasks. A product specializing in nuanced, high-stakes generation might find that 90% of its traffic still requires a frontier model, which would cap potential savings at roughly 10% of the bill if requests are similar in size. The playbook is most effective for applications with a wide distribution of task difficulty.

Landing

The durable lesson is not the specific routing implementation, but the strategic shift it represents. It moves LLM consumption from a monolithic technical expense to a managed portfolio of costs. The price-performance curve across models is a structural feature of the AI market, not a temporary anomaly. As new models are released, the spread between the cutting edge and the "good enough" commodity layer will likely persist. Founders who actively manage this spread will build more resilient businesses than those who simply pipe all traffic to the most expensive API and hope for the best.

The investor read

This tactic signals a maturation in the AI application layer. Early-stage focus was on capability; now it is on unit economics. Companies providing cost-optimization tools, like the author's, represent a 'picks and shovels' play on the AI market, betting on the continued expense of foundation models. An investable company in this space must demonstrate more than simple routing. Differentiators include sophisticated evaluation frameworks, dynamic model selection based on real-time performance, and deep integrations into observability platforms. For founders building AI-native products, a clear strategy for managing LLM costs is becoming table stakes for diligence. An 80% cost reduction directly impacts gross margins and valuation.

Sources · how we verified
  1. Cutting our LLM bill ~80% with model routing: the actual cost math

Reported by the Maya desk on Founderr Pulse’s Tactics beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.