Tactics·Jul 5, 2026

How a developer built a zero-cost RAG assistant in one weekend

A French developer detailed a complete, open-source stack for building an internal document chatbot. The playbook relies on free tiers, but the real cost is in production scaling and security.

An employee asks a simple question about company policy. They spend five minutes searching emails and shared drives before giving up and interrupting an HR manager. A developer writing under the pseudonym 'sharklandy' claims to have solved this problem for small and medium-sized enterprises (SMEs) by building a Retrieval-Augmented Generation (RAG) assistant in a single weekend, at a reported cost of zero euros.

The result is a publicly accessible demo, rag-pme.vercel.app, that allows users to upload documents and ask questions. The developer provided a full technical breakdown, offering a playbook for building a functional prototype with freely available tools. The core claims of timeline and cost are self-reported and not independently verifiable.

The zero-cost stack

The project avoids paid services by combining local models with the generous free tiers of modern infrastructure providers. The stack is fully detailed in the developer's post.

  • Local AI Models: Ollama runs the embedding model and the large language model locally during development. The embedding model cited is nomic-embed-text, which converts each text chunk into a 768-dimensional vector.
  • Vector Database: A free-tier Supabase instance provides a PostgreSQL database with the pgvector extension for storing and querying document vectors.
  • Inference API: Groq's API is used for fast language model inference in production, operating within its free tier.
  • Frontend: The interface is a standard React and Vite combination, styled with Tailwind CSS. For the live demo, Transformers.js handles embeddings directly in the browser.
  • Deployment: The application is hosted on Vercel's free hobby plan.

A two-pipeline architecture

The developer correctly separates the RAG process into two distinct pipelines: ingestion and querying. This separation is fundamental to how these systems operate.

The ingestion pipeline runs once per document. It extracts raw text, splits it into 900-character chunks with a 150-character overlap to preserve context, and generates a vector for each chunk. Crucially, each chunk is stored with its metadata: the source filename and page number. These vectors and their metadata are then stored in Supabase.
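The chunking step can be sketched in a few lines. This is a minimal illustration in Python, not the developer's code: the 900 and 150 values come from the post, while the `Chunk` record, the `chunk_page` helper, and the choice to chunk one page at a time (so that every chunk has a single page number) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str  # source filename
    page: int    # page number the text came from
    index: int   # position of the chunk within its page


def chunk_page(text: str, source: str, page: int,
               size: int = 900, overlap: int = 150) -> list[Chunk]:
    """Split one page of text into overlapping fixed-size character windows."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # with the defaults, a new window starts every 750 characters
    chunks: list[Chunk] = []
    start = 0
    while start < len(text):
        # Metadata is attached at creation time so it cannot be lost later.
        chunks.append(Chunk(text[start:start + size], source, page, len(chunks)))
        if start + size >= len(text):
            break  # this window already reaches the end of the page
        start += step
    return chunks
```

A 2,000-character page yields three chunks of 900, 900, and 500 characters, and the last 150 characters of each chunk reappear at the start of the next one.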

The query pipeline executes for each user question. The question is converted into a vector with the same embedding model used at ingestion, so that the question and the chunks share one vector space. pgvector then performs a similarity search to return the three most relevant text chunks from the document base. These chunks, along with their source metadata, are combined with the original question into a prompt for the Groq API, which generates the final answer.
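The retrieval step amounts to ranking stored vectors by similarity to the question vector and keeping the best three. The sketch below does this in plain Python as a stand-in for the database query. It assumes cosine similarity, which the post does not specify (pgvector exposes cosine distance through its `<=>` operator), and the row layout with `embedding`, `text`, `source`, and `page` keys is hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], rows: list[dict], k: int = 3) -> list[dict]:
    """Return the k rows most similar to query_vec, best first.

    Each returned row is a copy of the stored row (text plus source
    metadata) with a 'similarity' score added.
    """
    scored = [
        {**row, "similarity": cosine_similarity(query_vec, row["embedding"])}
        for row in rows
    ]
    scored.sort(key=lambda r: r["similarity"], reverse=True)
    return scored[:k]
```

In production the ranking runs inside PostgreSQL rather than in application code, but the contract is the same: a question vector goes in, and the closest chunks come out with their metadata still attached.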

Metadata is the product

The most critical tactical insight is the disciplined handling of metadata. The developer states that the key to the entire system is ensuring metadata travels with the text from initial processing to final display. This allows the system to cite its sources, showing the user the exact page of the document used to generate an answer.

Without citations, there is no trust. This approach directly addresses the primary enterprise objection to LLMs by making the sources behind each answer visible and checkable. The final interface displays the source document and similarity score as clickable elements, so a user can check any answer against the original text.
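To make the "metadata travels with the text" idea concrete, here is a hedged sketch of the last two steps: assembling the prompt from retrieved chunks and rendering the citations shown to the user. The function names, the prompt wording, and the `hits` layout (dictionaries with `text`, `source`, `page`, and `similarity`) are illustrative assumptions, not taken from the developer's code.

```python
def build_prompt(question: str, hits: list[dict]) -> str:
    """Combine retrieved excerpts and the user's question into one prompt.

    Each excerpt is numbered and labeled with its source file and page,
    so the model can refer to excerpts by number.
    """
    context = "\n\n".join(
        f"[{i}] ({hit['source']}, p. {hit['page']})\n{hit['text']}"
        for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer the question using only the numbered excerpts below. "
        "Cite the excerpt numbers you relied on.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


def format_citations(hits: list[dict]) -> list[str]:
    """Render one display line per retrieved excerpt, in the same order."""
    return [
        f"[{i}] {hit['source']}, p. {hit['page']} "
        f"(similarity {hit['similarity']:.2f})"
        for i, hit in enumerate(hits, start=1)
    ]
```

Because both functions read from the same list in the same order, the excerpt numbers in the model's answer line up with the citation lines in the interface.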

WHAT WE'D CHANGE

A weekend project that replicates previously complex enterprise software is a powerful demonstration of commoditized AI infrastructure. However, the 'zero-cost' framework has sharp limitations that a founder must address before building a business on it.

First, free tiers are for prototypes, not production. Supabase's free plan has storage and compute limits. Groq's free API has rate limits that a multi-user business application would quickly exceed. Vercel's hobby plan is not intended for commercial use and has execution limits. The first customer would likely push this architecture past its breaking point. The real cost is not zero; it is merely deferred.

Second, the security model is insufficient for enterprise use. Sending internal company documents, which can include sensitive HR, legal, or financial information, to a third-party API like Groq is a non-starter for most businesses. A production version would require either a fully self-hosted model or a vendor with robust data privacy agreements and a current SOC 2 report. The current architecture prioritizes speed and cost over security.

Finally, the playbook omits the significant, ongoing cost of maintenance. Document parsers fail on malformed PDFs. Embedding models get updated or replaced, and because vectors produced by different models are not comparable, switching models means re-embedding every document. These operational realities turn a 'zero-cost' project into a recurring time and engineering investment.
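One way to keep the re-embedding cost visible is to record which embedding model produced each stored vector and count what is out of date after a model change. The sketch below assumes a hypothetical `embedding_model` field on each stored row. The post does not describe such a field.

```python
from collections import Counter


def reembedding_backlog(rows: list[dict], current_model: str) -> Counter:
    """Count, per source file, the chunks whose vectors need regenerating.

    A row is stale if it was embedded with a different model or if it
    carries no model name at all.
    """
    return Counter(
        row["source"]
        for row in rows
        if row.get("embedding_model") != current_model
    )
```

An empty result means every stored vector matches the model currently used for queries. Anything else is a to-do list, grouped by file.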

LANDING

The 'zero-cost RAG' playbook is an effective map for building a proof-of-concept. It allows a solo founder to validate an idea with real users in days, not months. But it is not a map for building a durable business. The path from a functional demo to a secure, scalable, and reliable product is where the actual costs are incurred and the real engineering challenges begin. Founders who mistake the starting line for the finish line will build impressive demos that cannot become viable companies.

The investor read

This playbook signals the near-complete commoditization of basic RAG functionality. What required significant engineering effort 18 months ago is now a weekend project using open-source tools and free service tiers. This dramatically lowers the barrier to entry, but also collapses the moat for any company whose sole value proposition is 'a chatbot for your documents.' An investable company in this space will not compete on the core tech. Instead, it will build a defensible business around a specific vertical (e.g., legal, compliance, biotech) where it can solve high-value, domain-specific problems of document ingestion, security, and workflow integration. The moat is no longer the algorithm, but the trust, compliance, and data handling required by a specific industry.

Sources · how we verified
  1. J'ai construit un assistant documentaire pour PME en un week-end — à coût zéro

Reported by the Maya desk on Founderr Pulse’s Tactics beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.