HomeReadTactics deskBackend Optimization: Composite Indexes, Consistent Caching, and Batch Processing
Tactics·Jun 19, 2026

Backend Optimization: Composite Indexes, Consistent Caching, and Batch Processing

A founder details three backend optimization challenges—slow Postgres queries, inconsistent Redis cache keys, and large CSV ingestion—and the technical playbooks used to address them. A developer,…

A founder details three backend optimization challenges—slow Postgres queries, inconsistent Redis cache keys, and large CSV ingestion—and the technical playbooks used to address them.

A developer, writing under the handle "devto," detailed three backend optimization challenges encountered during an internship: a Postgres query taking 1.6 seconds against a million-row table, inconsistent Redis cache keys, and slow CSV ingestion. The founder reports these issues arose when an API transitioned from functional to production-scale usage.

Optimizing Postgres Queries with Composite Indexes

The initial API featured a million-row Postgres database with single-column indexes on gender, country_id, and age_group. Queries combining these filters, such as gender=male&country_id=NG&age_group=adult, reportedly took 1.6 seconds. The founder claims Postgres's default behavior for multi-column filters is to perform bitmap scans on individual indexes and then intersect results in memory. This process becomes a bottleneck at scale.

The solution involved adding composite indexes directly to the database schema. The founder implemented two such indexes: @@index([gender, country_id]) and @@index([gender, age_group, country_id]). The specific order of columns within the composite index was critical, with gender leading due to its higher selectivity. This approach allowed Postgres to traverse a single, pre-grouped data structure, bypassing the need for multiple scans and in-memory intersections. The post does not provide specific performance improvements after this change.

Normalizing Cache Keys for Consistency

A second problem involved Redis caching. The founder reports that functionally identical queries, like gender=male&country_id=NG and country_id=NG&gender=male, generated distinct cache keys. This resulted in redundant database calls, as the cache failed to recognize semantically equivalent requests.

The fix was to normalize all filter objects before generating a cache key. This multi-step process included lowercasing strings, alphabetically sorting object keys, coercing numeric values to consistent precision, and stripping undefined fields. Only after this normalization was the filter object serialized into a cache key. The founder provided a TypeScript snippet for a buildCacheKey function that calls a normalizeFilters utility. This ensures that any permutation or stylistic difference in a query's parameters maps to a single, consistent cache key, reducing cache misses.

Large CSV Ingestion: Identifying the Bottleneck

The third challenge involved processing large CSV files, specifically those up to 500,000 rows. The founder identified two primary issues. First, inserting 500,000 rows individually would result in 500,000 network round trips. At an estimated 5ms per trip, this would translate to approximately 40 minutes for a single file. Second, buffering an entire 50MB file in memory for a single bulk insert would create a 50MB heap spike, a problem compounded by concurrent uploads.

The post mentions that the solution for this problem involved "queue infrastructure behind a team product called Flowbrand" but does not detail the implementation or specific code for handling large CSV ingestion within the "What I built" section. The focus here was on identifying the core arithmetic and memory constraints of direct, synchronous processing.

What We'd Change

The tactics for database indexing and cache key normalization remain foundational, yet their application requires careful consideration. While composite indexes are effective, their optimal column order is highly dependent on specific query patterns and data distribution. An index that accelerates one query might slow another, or add overhead to writes. Modern database performance monitoring tools offer more granular insights into index usage and query plans, which can guide more precise index creation than a purely heuristic approach.

The cache key normalization strategy is robust but introduces a processing overhead on every request. For applications with extremely high query volumes or strict latency requirements, this preprocessing step could become a bottleneck. Alternative strategies, such as canonicalizing query parameters at the API gateway level or using a hashing algorithm on a sorted JSON representation, could offer similar benefits with different performance profiles. The founder's approach is sound for ensuring consistency but should be benchmarked against simpler or more performant alternatives depending on the application's specific needs.

The CSV ingestion problem, while clearly articulated, lacks a detailed solution in the provided account. Relying on a "queue infrastructure" is a common and effective pattern for asynchronous processing of large datasets. However, the specific implementation details—such as batch size, error handling, retry mechanisms, and worker scaling—are crucial. Without these specifics, the general advice to "use a queue" remains conceptual. Founders adopting this approach would need to design these components rigorously, considering factors like message broker choice (e.g., RabbitMQ, Kafka, SQS), worker concurrency limits, and data validation pipelines.

The challenges of scaling an API from functional proof-of-concept to production readiness are often less about new features and more about optimizing existing ones. The founder's experience underscores that performance bottlenecks frequently hide in assumptions about database behavior, cache consistency, or I/O operations. Addressing these issues requires a detailed understanding of underlying systems and a willingness to revise initial architectural choices, even when the initial implementation "worked."

The investor read

The technical challenges highlighted—database query optimization, caching efficiency, and large-scale data ingestion—are universal pain points in backend development. Solutions like composite indexing and cache key normalization represent mature, well-understood patterns. The shift towards asynchronous processing for heavy tasks, as implied by the "queue infrastructure" for CSV ingestion, signals a common architectural evolution for applications needing to handle high throughput without blocking user experience. For investors, the ability of a technical team to identify and systematically address these fundamental performance issues indicates operational maturity and a focus on long-term scalability, especially in markets where data volume and user concurrency are growth drivers. Products that streamline these optimizations, or abstract them away, continue to attract capital.

Pull quote: “The founder reports that functionally identical queries, like gender=male&country_id=NG and country_id=NG&gender=male, generated distinct cache keys.”

Sources · how we verified
  1. From Cache Keys to Concurrency

Every claim ties to a primary source. See our methodology.

Reported by the Maya desk on Founderr Pulse’s Tactics beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place. How we work · About · File a correction.
M
Maya

The Maya desk covers tactics: concrete playbooks, growth experiments, and operating decisions indie founders are running now. Every claim is sourced and linked. Operated by Founderr (RIKHATH LLC) See the desk →

Founderr Pulse — free & independent. The desk for people who build & back.