Tactics·Jul 2, 2026

A Founder Cut pgvector Query Time 98% with Three Indexing Changes

By Maya · Tactics desk·Human-reviewed·✓ Verified Jul 2, 2026·4 min read·1 source

The creator of Vibe-Memory reports cutting PostgreSQL vector search times from 800ms to 15ms. The playbook involves switching index types and tuning both build-time and query-time parameters.

The founder of a personal AI memory service called Vibe-Memory hit a wall at 5,000 records. Vector search queries in their PostgreSQL database, using the pgvector extension, were taking over 800 milliseconds. After three days of optimization, the founder reports cutting that query time to 15ms, a 98% reduction. The process reveals that production-grade vector search on Postgres is not a default setting, but a series of deliberate tuning decisions.

From IVFFlat to HNSW

The initial setup used the standard IVFFlat index (ivfflat in pgvector), which many pgvector tutorials recommend. The founder reports this worked for a few hundred vectors but slowed dramatically as the dataset grew, reaching 800ms for 5,000 vectors and over 1.5 seconds for 10,000. The first change was switching the index type from IVFFlat to HNSW (Hierarchical Navigable Small World). HNSW uses more memory and takes longer to build, but offers significantly faster and more scalable query performance. The founder reports an immediate drop from 800ms to 150ms, a 5x improvement from changing a single line of code.
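As a rough illustration of what that index swap can look like, the sketch below builds the two SQL statements involved. The table name memories, the column embedding, the index naming convention, and the cosine operator class are all assumptions for illustration; the founder's actual schema is not described here.

```python
def hnsw_swap_statements(table: str, column: str,
                         opclass: str = "vector_cosine_ops") -> list[str]:
    """Return SQL that adds an HNSW index and then drops an IVFFlat one.

    Index names follow a hypothetical <table>_<column>_<type>_idx convention.
    """
    old_index = f"{table}_{column}_ivfflat_idx"
    new_index = f"{table}_{column}_hnsw_idx"
    return [
        # Build the replacement before dropping the old index.
        f"CREATE INDEX {new_index} ON {table} USING hnsw ({column} {opclass});",
        f"DROP INDEX IF EXISTS {old_index};",
    ]


if __name__ == "__main__":
    for statement in hnsw_swap_statements("memories", "embedding"):
        print(statement)
```

The operator class should match the distance operator the queries use; vector_cosine_ops is only a placeholder choice here.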

Tuning the index build

An HNSW index has its own parameters that control the trade-off between build time, memory usage, and query performance. The founder focused on two: m, the maximum number of connections per layer (pgvector's default is 16), and ef_construction, the size of the dynamic candidate list used while building the graph (default 64). Increasing these values generally produces a higher-quality graph with better recall, at the cost of longer build times and more memory. After experimentation, the founder claims that tuning these build-time parameters further reduced query latency from 150ms down to 50ms.
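The experimentation described above can be pictured as a small parameter sweep. The sketch below generates one CREATE INDEX statement per candidate pair; the table and column names and the candidate values are hypothetical, not the founder's settings, and the check that ef_construction is at least 2 * m reflects our understanding of pgvector's validation and should be confirmed against its documentation.

```python
from itertools import product


def hnsw_build_sql(table: str, column: str, m: int, ef_construction: int,
                   opclass: str = "vector_cosine_ops") -> str:
    """Build a CREATE INDEX statement with explicit HNSW build parameters."""
    # Assumption: pgvector rejects ef_construction values below 2 * m.
    if ef_construction < 2 * m:
        raise ValueError("ef_construction should be at least 2 * m")
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


def candidate_builds(table, column, m_values, ef_values):
    """Yield one statement per valid (m, ef_construction) combination."""
    for m, ef in product(m_values, ef_values):
        if ef >= 2 * m:
            yield hnsw_build_sql(table, column, m, ef)


if __name__ == "__main__":
    # Hypothetical grid; each candidate would be built, benchmarked, and dropped in turn.
    for sql in candidate_builds("memories", "embedding", (16, 32), (64, 128)):
        print(sql)
```

Each candidate only means something once it is timed against a fixed set of queries, so the sweep is best paired with a repeatable benchmark.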

Optimizing the query itself

The final optimization happened at query time. Just as the index can be tuned during creation, its behavior can be modified when you search it. For HNSW, the key parameter is ef_search (hnsw.ef_search in pgvector, default 40), which sets the size of the dynamic candidate list used during the search. A higher ef_search value increases accuracy at the cost of speed; a lower one does the reverse. By setting this parameter at the session level before running the query, the founder could fine-tune the search process itself. This last step reportedly brought query time from 50ms to the final 15ms.
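A minimal sketch of the session-level approach, assuming a table named memories with an id column and an embedding column (all hypothetical), and a database driver that uses %s as its parameter placeholder. The ef_search value shown is illustrative only; the founder's chosen value is not reported here.

```python
def search_statements(table: str, column: str, ef_search: int,
                      k: int = 10) -> list[str]:
    """Return the session setting plus the nearest-neighbour query."""
    if ef_search < 1:
        raise ValueError("ef_search must be positive")
    return [
        # Applies to the current session until changed or reset.
        f"SET hnsw.ef_search = {ef_search};",
        # <=> is pgvector's cosine-distance operator; %s stands in for
        # the query vector, which the driver binds at execution time.
        f"SELECT id FROM {table} ORDER BY {column} <=> %s LIMIT {k};",
    ]


if __name__ == "__main__":
    for statement in search_statements("memories", "embedding", ef_search=40):
        print(statement)
```

Inside a transaction, SET LOCAL can be used instead of SET so the value applies to that transaction only rather than to the whole session.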

What We'd Change

The founder's account is a direct playbook for reducing latency. It omits a critical variable, however: recall. Vector search optimizations are a three-way trade-off between speed, memory usage, and the accuracy of the results. The post focuses exclusively on speed and does not state whether the 15ms query returned the same quality of results as the 800ms query. A faster, less accurate search may be worse than a slower, more accurate one. Any team replicating this playbook must benchmark recall to ensure they are not sacrificing relevance for speed.
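One simple way to run that benchmark is recall@k: for each test query, compare the IDs returned by the indexed search against the IDs from an exact search (for example, the same query run with the index bypassed) and measure the overlap. The sketch below is our illustration, not something from the founder's post, and the function names are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k results that the approximate search found."""
    truth = set(exact_ids[:k])
    if not truth:
        raise ValueError("exact_ids must contain at least one result")
    found = set(approx_ids[:k]) & truth
    return len(found) / len(truth)


def mean_recall(pairs, k):
    """Average recall@k over (approx_ids, exact_ids) pairs, one per query."""
    scores = [recall_at_k(approx, exact, k) for approx, exact in pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Made-up IDs purely to show the calculation.
    pairs = [([1, 2, 3, 4], [1, 2, 5, 6]), ([7, 8, 9, 10], [7, 8, 9, 10])]
    print(mean_recall(pairs, k=4))
```

Tracking this number alongside latency for each index configuration shows whether a speedup came at the cost of relevance.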

Furthermore, this playbook applies to a single-node Postgres instance with a relatively small dataset of 10,000 vectors. The founder does not specify the hardware, which is a key factor in performance. As a dataset grows into the millions or billions of vectors, the trade-offs change. At that scale, a dedicated vector database may provide better performance, scalability, and management features than a self-tuned Postgres extension. This solution is for the zero-to-one stage, not the one-to-N stage.

Landing

The default settings for pgvector are sufficient for development and small-scale projects. The Vibe-Memory founder's experience demonstrates they are not adequate for a production application where latency matters. For founders building AI features directly on their primary database, this type of performance tuning is a required engineering discipline. It represents a path to offering fast, capital-efficient AI products without immediately defaulting to more complex, expensive, and specialized third-party services.

The investor read

This playbook signals the maturation of the 'AI on Postgres' stack. It is a viable production choice for early-stage products, but requires engineering discipline that can be a moat. A founder who can tune their own stack can run more capital-efficiently than one relying on expensive, managed vector databases. This technical choice is typical of a bootstrapped or micro-SaaS business aiming to control costs and infrastructure. While not a venture-scale signal in itself, it points to a founder profile focused on operational excellence and sustainable unit economics, which can be attractive for early, fundamentals-focused investors.

Sources · how we verified

Vibe-Memory Part 3: pgvector Performance Optimization - How I Cut Query Time From 800ms to 15ms

Reported by the Maya desk on Founderr Pulse’s Tactics beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.
