Tactics·May 24, 2026

Building Production Enterprise RAG: A Detailed Architecture Playbook

By Maya · Tactics desk · Human-reviewed · Verified May 24, 2026 · 4 min read · 1 source

This playbook details a production-ready Enterprise RAG system, emphasizing access control, traceable citations, and a clear path from local development to Azure deployment with specific infrastructure.

A practitioner's build log documents an enterprise RAG system built to three requirements: access control is enforced before retrieval scoring, every answer carries traceable citations, and evaluation measures leakage of restricted documents. The log traces a path from local development to Azure production, with an emphasis on operational readiness and security for internal documents.

Document Ingestion and Retrieval

The system ingests Markdown documents and reads role metadata from their front matter. SQLite stores both document metadata and chunks, and the pipeline tracks document and chunk counts. Lexical retrieval scores candidates by cosine similarity over tokens. A configuration-selectable Azure AI Search retrieval adapter lets the same pipeline switch to cloud-based indexing.
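As a rough illustration of the ingestion and scoring steps described above, the sketch below splits a front-matter header from a Markdown body and scores text with cosine similarity over raw token counts. The function names, the `allowed_roles` field, and the tokenization rule are assumptions made for this example; the build log does not publish its implementation, and a real pipeline would likely use a proper YAML parser.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_cosine(query: str, chunk: str) -> float:
    """Cosine similarity between the term-count vectors of two strings."""
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    dot = sum(q[t] * c[t] for t in q.keys() & c.keys())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in c.values())
    )
    return dot / norm if norm else 0.0


def split_front_matter(markdown: str) -> tuple[dict[str, str], str]:
    """Split a '---' delimited header from the body.

    Handles only flat 'key: value' lines (an assumption for this sketch).
    """
    meta: dict[str, str] = {}
    if not markdown.startswith("---\n"):
        return meta, markdown
    header, sep, body = markdown[4:].partition("\n---\n")
    if not sep:  # no closing delimiter, so treat the whole text as body
        return meta, markdown
    for line in header.splitlines():
        key, colon, value = line.partition(":")
        if colon:
            meta[key.strip()] = value.strip()
    return meta, body


doc = "---\nallowed_roles: hr, admin\n---\nVacation policy for staff"
meta, body = split_front_matter(doc)
score = token_cosine("vacation policy", body)
```

Here `meta` carries the role metadata that later drives access filtering, and `score` is higher for chunks that share more tokens with the query.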

Query, Access Control, and API

Role-based candidate filtering is applied before retrieval scoring, a critical security measure: chunks a role cannot read are never scored or ranked. Answer generation is citation-backed and runs in a deterministic mock mode during development. The system logs RBAC_blocked_count for each query, showing how many chunks access control filtered out. The role is derived from the X-API-Key header rather than the request body, which prevents callers from elevating their role in the payload. The core is a FastAPI query API (POST /query) with a health probe at GET /health.
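The ordering matters, and a small sketch makes it concrete: candidates are filtered by role first, and only the survivors reach the scorer. Everything here is illustrative. The `Chunk` type, the `retrieve` function, and the simple overlap scorer are assumptions standing in for the real pipeline, which the build log does not show.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_roles: frozenset[str]


def overlap_score(query: str, text: str) -> int:
    """Stand-in scorer: number of distinct query tokens found in the chunk."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(
    query: str, role: str, chunks: list[Chunk], top_k: int = 3
) -> tuple[list[Chunk], int]:
    """Filter by role first, then score only the chunks the role may read."""
    visible = [c for c in chunks if role in c.allowed_roles]
    rbac_blocked_count = len(chunks) - len(visible)
    ranked = sorted(
        visible, key=lambda c: overlap_score(query, c.text), reverse=True
    )
    return ranked[:top_k], rbac_blocked_count


corpus = [
    Chunk("hr-1", "salary bands for engineers", frozenset({"hr"})),
    Chunk("pub-1", "engineers onboarding guide", frozenset({"hr", "employee"})),
]
employee_hits, employee_blocked = retrieve("salary engineers", "employee", corpus)
hr_hits, hr_blocked = retrieve("salary engineers", "hr", corpus)
```

Because the restricted chunk is removed before scoring, it cannot influence ranking or appear in a citation, and the blocked count is available for the query log.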

Local user registration (POST /auth/register) with role assignment is available, alongside API key creation, listing, and revocation (POST /api-keys, POST /api-keys/{id}/revoke). Raw API keys are shown once at creation and never persisted; only their SHA-256 hashes are stored. Management endpoints are protected by an ADMIN_TOKEN.
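One way to picture the hash-only storage described above is the in-memory sketch below. The `KeyStore` class and its method names are hypothetical, and the build log's persistence layer is SQLite rather than a dictionary; only the use of SHA-256 hashes comes from the source.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def hash_key(raw_key: str) -> str:
    """Return the hex SHA-256 digest that is stored instead of the raw key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


class KeyStore:
    """Hypothetical in-memory stand-in for the API key table."""

    def __init__(self) -> None:
        self._records: dict[int, dict] = {}
        self._next_id = 1

    def create(self, role: str) -> tuple[int, str]:
        """Create a key, store only its hash, and return the raw key once."""
        raw = secrets.token_urlsafe(32)
        key_id = self._next_id
        self._next_id += 1
        self._records[key_id] = {
            "hash": hash_key(raw),
            "role": role,
            "revoked": False,
        }
        return key_id, raw

    def revoke(self, key_id: int) -> None:
        self._records[key_id]["revoked"] = True

    def role_for(self, raw_key: str) -> Optional[str]:
        """Resolve a presented key to its role, or None if unknown or revoked."""
        candidate = hash_key(raw_key)
        for record in self._records.values():
            if hmac.compare_digest(record["hash"], candidate):
                return None if record["revoked"] else record["role"]
        return None


store = KeyStore()
key_id, raw_key = store.create("hr")
role_before = store.role_for(raw_key)
store.revoke(key_id)
role_after = store.role_for(raw_key)
```

The role returned by a lookup like `role_for` is what a request handler would trust, which is how a role field in the request body can be ignored entirely.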

Evaluation and Operational Metrics

A dedicated evaluation runner (POST /eval/run) directly calls the live query pipeline, avoiding mocked paths for realistic assessment. It tracks four key metrics per run: pass rate, restricted leakage count, citation coverage, and average latency. Per-case results provide granular detail, including expected versus retrieved document IDs and pass/fail indicators.
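The four run-level metrics can be aggregated from per-case results along the lines of the sketch below. The metric definitions are assumptions, since the build log names the metrics but does not define them: here a case passes when every expected document was retrieved and no restricted document was, and citation coverage is the share of cases with at least one citation.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    expected_doc_ids: set[str]
    retrieved_doc_ids: set[str]
    restricted_doc_ids: set[str]  # documents the case's role must never see
    cited_doc_ids: set[str]
    latency_ms: float

    @property
    def leaked(self) -> set[str]:
        return self.retrieved_doc_ids & self.restricted_doc_ids

    @property
    def passed(self) -> bool:
        return self.expected_doc_ids <= self.retrieved_doc_ids and not self.leaked


def summarize(results: list[CaseResult]) -> dict[str, float]:
    """Aggregate per-case results into the four run-level metrics."""
    n = len(results)
    if n == 0:
        return {
            "pass_rate": 0.0,
            "restricted_leakage_count": 0,
            "citation_coverage": 0.0,
            "avg_latency_ms": 0.0,
        }
    return {
        "pass_rate": sum(r.passed for r in results) / n,
        "restricted_leakage_count": sum(len(r.leaked) for r in results),
        "citation_coverage": sum(bool(r.cited_doc_ids) for r in results) / n,
        "avg_latency_ms": sum(r.latency_ms for r in results) / n,
    }


run = summarize(
    [
        CaseResult({"a"}, {"a"}, set(), {"a"}, 100.0),
        CaseResult({"b"}, {"b", "secret"}, {"secret"}, set(), 300.0),
    ]
)
```

In this example the second case retrieves its expected document but still fails, because a restricted document came back with it.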

Operational controls encompass an audit log for all administrative actions (GET /audit-logs) and a query log detailing citations, role, latency, and RBAC metrics. A Prometheus-style operational metrics endpoint is exposed. Security headers are enabled by default, CORS is explicitly configured, and structured JSON request logging (JSON_LOGS=true) is supported. In-memory rate limiting per client (RATE_LIMIT_PER_MINUTE) is also implemented.
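A per-client in-memory limiter can be sketched in a few lines. The sliding-window approach and the `RateLimiter` class below are assumptions; the build log names the RATE_LIMIT_PER_MINUTE setting but not the algorithm behind it.

```python
import time
from collections import defaultdict, deque
from typing import Callable


class RateLimiter:
    """Sliding-window limiter that keeps recent request times per client."""

    def __init__(
        self, limit_per_minute: int, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self.limit = limit_per_minute
        self.clock = clock  # injectable so the window can be tested without sleeping
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, client_id: str) -> bool:
        """Record the request and return True if the client is under its limit."""
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have aged out of the 60-second window.
        while hits and now - hits[0] >= 60.0:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


fake_now = [0.0]
limiter = RateLimiter(limit_per_minute=2, clock=lambda: fake_now[0])
first_three = [limiter.allow("client-a") for _ in range(3)]
other_client = limiter.allow("client-b")
fake_now[0] = 61.0
after_window = limiter.allow("client-a")
```

Because the state lives in process memory, a limiter like this resets on restart and is not shared across replicas, which is worth keeping in mind once the API runs on more than one container.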

Azure Production Infrastructure

The local runtime maps directly onto a specified Azure deployment topology: Microsoft Entra ID for employee authentication, Azure Container Apps hosting both the API and the dashboard, Azure AI Search for retrieval, and Azure OpenAI for answer generation. Metadata and audit logs persist in Azure Database for PostgreSQL or Azure Cosmos DB, and source documents live in Azure Blob Storage. Azure Key Vault manages secrets, and Application Insights collects logs and metrics.

Critically, switching from local to Azure requires only environment variable changes, with no code modifications. The SQLAlchemy layer abstracts database differences, eliminating schema migrations between SQLite and PostgreSQL. The system is designed to "fail fast" if required AZURE_* settings are missing, preventing silent degradation.
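The fail-fast behavior could look something like the startup check below. The `RETRIEVAL_BACKEND` switch and the three `AZURE_*` names are hypothetical placeholders; the build log says only that required AZURE_* settings are validated, not which ones.

```python
import os
from typing import Mapping, Optional

# Hypothetical setting names, chosen for illustration only.
REQUIRED_AZURE_SETTINGS = (
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_INDEX",
    "AZURE_OPENAI_ENDPOINT",
)


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Pick the backend from the environment and refuse to start half-configured."""
    env = os.environ if env is None else env
    backend = env.get("RETRIEVAL_BACKEND", "local")
    if backend != "azure":
        return {"backend": "local"}
    missing = [name for name in REQUIRED_AZURE_SETTINGS if not env.get(name)]
    if missing:
        # Raising at startup avoids silently falling back to local retrieval.
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    settings = {"backend": "azure"}
    settings.update({name: env[name] for name in REQUIRED_AZURE_SETTINGS})
    return settings


local_settings = load_settings({})
try:
    load_settings({"RETRIEVAL_BACKEND": "azure"})
    startup_error = ""
except RuntimeError as exc:
    startup_error = str(exc)
```

Running a check like this once at process start means a misconfigured deployment fails its first health probe instead of serving answers from the wrong backend.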

WHAT WE'D CHANGE

The architecture is a solid foundation, but some aspects deserve attention before broader enterprise adoption or scaling beyond an initial proof of concept. SQLite for metadata and chunk storage suits local development and early stages, yet it becomes a bottleneck under high-volume production load. The build log names Azure Database for PostgreSQL or Azure Cosmos DB for production, but the simplicity of the local setup can lead teams to underestimate the migration if it is not planned and tested early. The SQLAlchemy layer limits schema changes, while performance tuning and day-to-day operation of a production database bring challenges of their own.

The API key management system, with its local user registration and revocation, duplicates functionality that enterprise identity providers (IdPs) such as Okta, Auth0, or Microsoft Entra ID (formerly Azure AD) already handle. Integrating directly with the organization's established IdP would reduce operational overhead, centralize identity management, and streamline user provisioning. It would also move user lifecycle management and authentication out of the RAG system and into the organization's existing security infrastructure.

The evaluation metrics are well chosen, particularly the "restricted leakage count," but the build log does not explain how the evaluation set itself is generated or maintained. With a changing document corpus, creating and updating that set is critical and often resource-intensive. Without a defined strategy, the risk of evaluation drift increases: the test set stops reflecting real-world usage or newly added data. Keeping the metrics meaningful requires a dedicated pipeline and clear ownership of the evaluation set.

This RAG system blueprint offers a detailed, production-oriented approach, emphasizing access control and measurable performance. Its modular design, with clear separation between local development and cloud deployment, provides a structured path for teams building internal AI applications. The focus on traceable citations and leakage prevention addresses critical enterprise requirements for trust and data security.

Sources

What Enterprise RAG Is Ready For Today and What Production Deployment Actually Requires

Reported by the Maya desk on Founderr Pulse’s Tactics beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.