Tactics·Jun 18, 2026

LLM Batching: When Fewer API Calls Cost More

An LLM-powered translation pipeline saw costs rise when batching segments. The case demonstrates that prompt structure and token overhead dictate expenses more than API call count.

The founder ahikmah, writing on dev.to, details an attempt to cut costs in an LLM-based document translation pipeline. A first attempt to batch 160 text segments into fewer API calls produced a system that was both more expensive and slower than the baseline. The original "no batching" approach processed 160 segments with 160 API calls, consuming 14,287 input tokens and 2,506 output tokens, for an estimated cost of $0.0024 and a run time of 30.4 seconds.
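The baseline figure is consistent with a plain per-token billing formula. This is a minimal sketch that assumes list prices of $0.10 per million input tokens and $0.40 per million output tokens for gpt-4.1-nano; those prices are our assumption, not figures from the source, so check current pricing. The author's exact estimation method isn't described, and this formula is not guaranteed to reproduce the other reported estimates. `estimate_cost` is a hypothetical helper.

```python
# ASSUMED per-million-token list prices for gpt-4.1-nano (USD), not from the source.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float = INPUT_PRICE_PER_M,
                  output_price_per_m: float = OUTPUT_PRICE_PER_M) -> float:
    """Hypothetical helper: estimate USD cost from token counts alone."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Baseline run as reported: 160 calls, 14,287 input and 2,506 output tokens.
baseline_cost = estimate_cost(14_287, 2_506)
print(f"${baseline_cost:.4f}")  # prints $0.0024
```

The call count never enters the formula: under per-token billing, 160 calls and 8 calls cost the same for the same number of tokens, which is why the batching experiments turn entirely on token counts.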

Initial Batching Increases Costs

The first optimization grouped up to 20 text segments into a single request using a keyed JSON format. This cut API calls but sharply increased token usage. For the same 160 segments, the approach needed 8 API calls, but input tokens surged to 33,529 and output tokens to 10,037. The author reports that the estimated cost rose to $0.0084 and duration increased to 46.8 seconds. Input tokens more than doubled and output tokens roughly quadrupled, so cutting calls from 160 to 8 came nowhere near offsetting the extra tokens: prompt structure, not call count, drove cost and processing time.
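The source doesn't reproduce the exact prompt, so what follows is a hypothetical reconstruction of the keyed-JSON approach: split the 160 segments into groups of at most 20 (which yields the 8 calls reported) and wrap each group in a JSON object with one key per segment. The helper names and the `segment_{i}` key scheme are illustrative assumptions.

```python
import json
from typing import Iterator


def chunk(segments: list[str], size: int = 20) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` segments."""
    for start in range(0, len(segments), size):
        yield segments[start:start + size]


def keyed_json_payload(batch: list[str]) -> str:
    """Hypothetical format: one JSON key per segment, keyed by position."""
    return json.dumps({f"segment_{i}": text for i, text in enumerate(batch)},
                      ensure_ascii=False)


segments = [f"Sentence number {n}." for n in range(160)]
batches = list(chunk(segments))
print(len(batches))  # prints 8
print(keyed_json_payload(["Hello", "World"]))
# prints {"segment_0": "Hello", "segment_1": "World"}
```

Every segment pays for its key, quotes, colon, and comma. If the model is also asked to answer in the same keyed shape, the reply repeats that scaffolding, which is one plausible reason output tokens grew as sharply as they did.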

Simplifying with a List of Strings

Recognizing the overhead of JSON keys, ahikmah next tried sending segments as a simple list of strings within the prompt. This reduced the structural overhead compared to keyed JSON. This method still grouped 20 segments per call, resulting in 8 API calls. Input tokens dropped to 21,398 and output tokens to 3,924. The author reports the estimated cost decreased to $0.0040, and duration improved to 39.8 seconds. While better than keyed JSON, it remained more expensive and slower than the original no-batching baseline.
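A sketch of the list-of-strings variant, assuming the batch is sent as a bare JSON array and the model is asked to reply with an array of translations in the same order. The source doesn't give the exact prompt; `list_payload` and `parse_list_response` are hypothetical helpers. Dropping the keys removes per-segment labels, but each item still carries quotes, a comma, and any escape characters, and the reply must parse back to the right count.

```python
import json


def list_payload(batch: list[str]) -> str:
    """Serialize a batch as a bare JSON array of strings (no per-segment keys)."""
    return json.dumps(batch, ensure_ascii=False)


def parse_list_response(raw: str, expected: int) -> list[str]:
    """Parse a JSON array reply and check that no segment was dropped or merged."""
    items = json.loads(raw)
    if not isinstance(items, list) or len(items) != expected:
        raise ValueError(f"expected {expected} items, got {items!r}")
    return [str(item) for item in items]


batch = ["Hello", "World"]
print(list_payload(batch))  # prints ["Hello", "World"]
print(parse_list_response('["Bonjour", "Monde"]', expected=len(batch)))
# prints ['Bonjour', 'Monde']
```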

Custom Separator Batching

The most effective batching strategy placed a custom separator (e.g., ---SEGMENT---) between text segments within a single prompt, minimizing non-content token overhead. This approach also grouped 20 segments per call, keeping the count at 8 API calls. Input tokens fell to 14,929 and output tokens to 2,666. The author reports that the estimated cost fell to $0.0028 and duration to 32.7 seconds. That is close to, but still slightly above, the baseline's $0.0024 and 30.4 seconds, achieved with 152 fewer API calls, indicating that LLM cost optimization hinges on token efficiency rather than call count.
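A minimal sketch of separator batching, using the `---SEGMENT---` delimiter the article gives as an example. It assumes the model is instructed to return its translations joined by the same delimiter; the instruction text is not in the source, and the helper names are hypothetical.

```python
SEPARATOR = "---SEGMENT---"  # example delimiter named in the article


def separator_payload(batch: list[str]) -> str:
    """Join segments with a single delimiter line between them."""
    return f"\n{SEPARATOR}\n".join(batch)


def split_response(raw: str, expected: int) -> list[str]:
    """Split the reply on the same delimiter and verify the segment count."""
    parts = [part.strip() for part in raw.split(SEPARATOR)]
    if len(parts) != expected:
        raise ValueError(f"expected {expected} segments, got {len(parts)}")
    return parts


print(separator_payload(["Hello", "World"]))
# Hello
# ---SEGMENT---
# World
print(split_response("Bonjour\n---SEGMENT---\nMonde", expected=2))
# prints ['Bonjour', 'Monde']
```

The count check matters because nothing in plain text forces the model to keep the delimiter. A segment that itself contains the delimiter would also split wrongly, so the marker should be a string that never appears in the source documents.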

Token Overhead Analysis

The core lesson from these experiments is that prompt engineering choices directly translate into token overhead, which in turn dictates cost and latency. Keyed JSON introduced significant non-content tokens for keys and structural elements. A list of strings reduced this, but still required formatting. The custom separator proved most efficient by adding minimal extra tokens while clearly delineating segments for the LLM. The author reports using gpt-4.1-nano for these tests.
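Dividing the reported totals by the 160 segments, and by the baseline, makes the overhead concrete. This uses only the figures quoted in the article; `RUNS` and `ratios` are illustrative names.

```python
# Reported totals for the same 160 segments: (API calls, input tokens, output tokens).
RUNS = {
    "no batching": (160, 14_287, 2_506),
    "keyed JSON": (8, 33_529, 10_037),
    "list of strings": (8, 21_398, 3_924),
    "custom separator": (8, 14_929, 2_666),
}
SEGMENTS = 160


def ratios(name: str) -> tuple[float, float]:
    """Return a run's (input, output) token totals relative to the no-batching baseline."""
    _, base_in, base_out = RUNS["no batching"]
    _, tokens_in, tokens_out = RUNS[name]
    return tokens_in / base_in, tokens_out / base_out


for name, (calls, tokens_in, tokens_out) in RUNS.items():
    r_in, r_out = ratios(name)
    # Per-segment averages show how much scaffolding each format adds around the text.
    print(f"{name:<17} calls={calls:>3} in/seg={tokens_in / SEGMENTS:6.1f} "
          f"out/seg={tokens_out / SEGMENTS:5.1f} in x{r_in:.2f} out x{r_out:.2f}")
```

By this arithmetic, keyed JSON used roughly 2.3 times the baseline's input tokens and 4 times its output tokens, while the separator run came in about 4 percent higher on input and 6 percent higher on output. Output tokens are typically priced above input tokens, so the output growth is likely the costlier half of the overhead.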

What We'd Change

The ahikmah experiments highlight that reducing API call count is not a universal proxy for cost or latency optimization in LLM applications. The initial assumption that "fewer API calls should mean lower cost and faster processing" proved incorrect due to the hidden costs of prompt engineering. Future iterations of such a pipeline should prioritize explicit measurement of token overhead for every prompt structure change, not just API call counts.
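One way to make that measurement routine is a gate that compares every prompt-structure variant against the baseline on tokens, cost, and latency, and deliberately ignores call count. A sketch with hypothetical names (`RunStats`, `regressions`), fed with the reported baseline and separator figures:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStats:
    """Measured totals for one prompt-structure variant over the same input set."""
    calls: int
    input_tokens: int
    output_tokens: int
    cost_usd: float
    seconds: float


def regressions(candidate: RunStats, baseline: RunStats) -> list[str]:
    """List every metric on which the candidate is worse than the baseline.

    `calls` is intentionally not checked: fewer calls is not a goal in itself.
    """
    problems = []
    for metric in ("input_tokens", "output_tokens", "cost_usd", "seconds"):
        if getattr(candidate, metric) > getattr(baseline, metric):
            problems.append(metric)
    return problems


baseline = RunStats(160, 14_287, 2_506, 0.0024, 30.4)
separator = RunStats(8, 14_929, 2_666, 0.0028, 32.7)
print(regressions(separator, baseline))
# prints ['input_tokens', 'output_tokens', 'cost_usd', 'seconds']
```

Even the best batching variant regresses on all four metrics despite cutting calls from 160 to 8, which is exactly what a call-count metric would have hidden.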

The choice of gpt-4.1-nano is specific; its tokenization and pricing may differ from those of other models. A general playbook would need to validate these findings across different LLMs, such as OpenAI's gpt-3.5-turbo or gpt-4o, Anthropic's Claude, or open-source models. Models differ in how they respond to prompt length, structure, and instructions, which could change the optimal batching strategy. For instance, a pricing scheme with a high fixed per-request fee but low per-token rates might favor larger batches even with some token overhead.
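The per-request trade-off can be written as a formula. Assume, hypothetically, that total cost = calls × per-request fee + token cost. To our knowledge OpenAI's standard API bills plain text generation per token with no per-request fee, so this models a different pricing scheme (or treats per-call overhead as a cost). Batching wins once the fee exceeds the token-cost gap divided by the calls saved. Plugging in the reported separator and baseline costs, with `breakeven_fee` as a hypothetical helper:

```python
def breakeven_fee(token_cost_batched: float, calls_batched: int,
                  token_cost_single: float, calls_single: int) -> float:
    """Per-request fee at which batched and per-segment calls cost the same.

    With total = calls * fee + token_cost, the two options tie when
    fee = (token_cost_batched - token_cost_single) / (calls_single - calls_batched).
    """
    return (token_cost_batched - token_cost_single) / (calls_single - calls_batched)


# Reported figures: separator batching $0.0028 over 8 calls, baseline $0.0024 over 160.
fee = breakeven_fee(0.0028, 8, 0.0024, 160)
print(f"{fee:.2e}")  # prints 2.63e-06
```

With these figures, a fee above roughly $0.0000026 per request would be enough to make the separator batches the cheaper option.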

Furthermore, while batching reduces API calls, it can introduce other complexities like error handling for individual segments within a batch and managing context windows. If one segment causes a failure, the entire batch might need reprocessing, or a more sophisticated error recovery mechanism is required. For scenarios demanding strict latency guarantees or high reliability per segment, the baseline "one segment, one API call" might still be preferable despite higher overall API call counts, especially if the total cost difference is marginal.
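A sketch of the simplest recovery policy: send the batch once, and if the reply doesn't split back into the expected number of segments, retry each segment individually. `call_model` is a hypothetical stand-in for whatever API client the pipeline uses, and `flaky_model` fakes a reply that loses its delimiters (upper-casing stands in for translation); neither comes from the source.

```python
from typing import Callable

SEPARATOR = "---SEGMENT---"  # example delimiter named in the article


def translate_with_fallback(batch: list[str],
                            call_model: Callable[[str], str]) -> list[str]:
    """Translate a batch in one call; if the reply can't be split back into
    the right number of segments, retry each segment on its own."""
    reply = call_model(f"\n{SEPARATOR}\n".join(batch))
    parts = [p.strip() for p in reply.split(SEPARATOR)]
    if len(parts) == len(batch):
        return parts
    # Batch-level failure: fall back to one call per segment.
    return [call_model(segment).strip() for segment in batch]


def flaky_model(prompt: str) -> str:
    """Hypothetical stand-in for an API call that drops delimiters in batched prompts."""
    if SEPARATOR in prompt:
        return prompt.replace(SEPARATOR, "").upper()
    return prompt.upper()


print(translate_with_fallback(["hello", "world"], flaky_model))
# prints ['HELLO', 'WORLD']
```

The fallback keeps results correct, but a failed batch then costs one extra call per segment on top of the wasted batch call, so frequent batch failures can erase the savings. Splitting a failed batch in half and retrying is one middle ground.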

Landing

The ahikmah case demonstrates that LLM cost optimization is not about simple heuristics like minimizing API calls. It is an empirical process demanding precise measurement of token consumption and latency across different prompt engineering strategies. The optimal approach balances the overhead of instruction and structure with the efficiency gained from batching, often favoring minimalist delimiters over complex data structures. This requires founders to benchmark every change, as intuitive optimizations can paradoxically increase costs.

The investor read

This signal underscores the critical role of prompt engineering and token efficiency in the unit economics of LLM-powered applications. For investors, it highlights that gross margin projections for AI products must account for highly variable API costs, which are sensitive to seemingly minor implementation details. Companies demonstrating sophisticated prompt optimization techniques, especially proprietary methods for minimizing token overhead, may hold a meaningful competitive advantage. It also suggests that demand for small, cost-optimized LLMs (such as gpt-4.1-nano) and fine-tuned models could grow, as they offer a direct path to better margins than generic API calls.

Sources
  1. ahikmah, “When Prompt Batching Made My LLM App More Expensive,” dev.to.

Reported by the Maya desk on Founderr Pulse’s Tactics beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.