Building Production Enterprise RAG: A Detailed Architecture Playbook
This playbook details a production-ready Enterprise RAG system, emphasizing access control, traceable citations, and a clear path from local development to Azure deployment with specific infrastructure.
An enterprise RAG system, documented in a practitioner's build log, was built to a specific standard: access control is enforced before retrieval scoring, every answer includes traceable citations, and evaluation measures restricted-document leakage. The build log traces a path from local development to Azure production, with emphasis on operational readiness and security for internal documents.
Document Ingestion and Retrieval
The system supports Markdown document ingestion, incorporating front-matter for role metadata. This pipeline uses SQLite for both metadata and chunk storage, tracking document and chunk counts. Lexical retrieval is implemented with token cosine similarity scoring. The architecture also includes a configuration-selectable Azure AI Search retrieval adapter, allowing for a seamless transition to cloud-based indexing.
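Token cosine similarity can be illustrated with a small, self-contained sketch. The tokenizer (lowercased alphanumeric runs) and the function names below are assumptions made for illustration; the build log does not specify how tokens are produced or weighted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase alphanumeric runs. The source does not
    # say which tokenization the real pipeline uses.
    return re.findall(r"[a-z0-9]+", text.lower())


def token_cosine_similarity(query: str, chunk: str) -> float:
    """Cosine similarity between raw term-frequency vectors of two texts."""
    q_counts = Counter(tokenize(query))
    c_counts = Counter(tokenize(chunk))
    # Only tokens present in both texts contribute to the dot product.
    dot = sum(q_counts[t] * c_counts[t] for t in q_counts.keys() & c_counts.keys())
    q_norm = math.sqrt(sum(v * v for v in q_counts.values()))
    c_norm = math.sqrt(sum(v * v for v in c_counts.values()))
    if q_norm == 0.0 or c_norm == 0.0:
        return 0.0  # An empty text has no direction, so treat it as no match.
    return dot / (q_norm * c_norm)
```

A score of 0 means the two texts share no tokens, and a score near 1 means their token distributions are almost identical. Purely lexical scoring like this misses synonyms, which is one reason a swappable retrieval adapter is useful.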
Query, Access Control, and API
Role-based candidate filtering is applied before retrieval scoring, a critical security measure. Answer generation is citation-backed, operating in a deterministic mock mode for development. The system logs RBAC_blocked_count per query, providing visibility into how many chunks were filtered by access control. Role derivation relies on the X-API-Key header, explicitly preventing request-body role elevation. The core is a FastAPI query API (POST /query) with health probes at GET /health.
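The ordering of filtering and scoring can be sketched as follows. This is a minimal illustration rather than the project's code: the `Chunk` type, the `allowed_roles` field, and the simple overlap scorer are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_roles: frozenset  # hypothetical field: roles permitted to read this chunk


def overlap_score(query: str, text: str) -> float:
    # Stand-in scorer (shared-token count); the real system uses its own retrieval scoring.
    return float(len(set(query.lower().split()) & set(text.lower().split())))


def retrieve(query: str, role: str, chunks: list, top_k: int = 3) -> tuple:
    # Step 1: drop every chunk the role may not read. Restricted text is
    # never passed to the scorer, so it cannot influence ranking.
    permitted = [c for c in chunks if role in c.allowed_roles]
    rbac_blocked_count = len(chunks) - len(permitted)
    # Step 2: score and rank only the permitted candidates.
    ranked = sorted(permitted, key=lambda c: overlap_score(query, c.text), reverse=True)
    return ranked[:top_k], rbac_blocked_count
```

Filtering first means a restricted chunk cannot appear in results even if it would have been the top match, and the blocked count falls out of the same step at no extra cost.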
Local user registration (POST /auth/register) with role assignment is available, alongside API key creation, listing, and revocation (POST /api-keys, POST /api-keys/{id}/revoke). Raw API keys are never persisted after creation; instead, SHA-256 key hashes are stored. Management endpoints are protected by an ADMIN_TOKEN.
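Hash-only key storage might look roughly like the sketch below. The `ApiKeyStore` class, its in-memory dictionary, and the method names are hypothetical; the described system persists hashes through its database layer, and its actual schema is not given in the source.

```python
import hashlib
import hmac
import secrets
from typing import Optional


def hash_key(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


class ApiKeyStore:
    """In-memory stand-in for the key table; only SHA-256 hashes are kept."""

    def __init__(self) -> None:
        self._records = {}  # key_id -> {"hash", "role", "revoked"}
        self._next_id = 1

    def create(self, role: str) -> tuple:
        raw_key = secrets.token_urlsafe(32)
        key_id = self._next_id
        self._next_id += 1
        self._records[key_id] = {"hash": hash_key(raw_key), "role": role, "revoked": False}
        # The raw key is returned once to the caller and never stored.
        return key_id, raw_key

    def revoke(self, key_id: int) -> None:
        self._records[key_id]["revoked"] = True

    def resolve_role(self, presented_key: str) -> Optional[str]:
        # The role comes from the stored record, never from the request body.
        presented_hash = hash_key(presented_key)
        for record in self._records.values():
            if hmac.compare_digest(record["hash"], presented_hash) and not record["revoked"]:
                return record["role"]
        return None
```

Because the keys are long random tokens rather than user-chosen passwords, a plain SHA-256 digest is a reasonable choice here; a slow password hash would matter more if the secrets were low-entropy.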
Evaluation and Operational Metrics
A dedicated evaluation runner (POST /eval/run) directly calls the live query pipeline, avoiding mocked paths for realistic assessment. It tracks four key metrics per run: pass rate, restricted leakage count, citation coverage, and average latency. Per-case results provide granular detail, including expected versus retrieved document IDs and pass/fail indicators.
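The four run-level metrics could be aggregated from per-case results along these lines. The field names and the exact definitions (for example, treating citation coverage as the share of cases with at least one citation) are assumptions, since the source names the metrics without defining them.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    expected_doc_ids: set
    retrieved_doc_ids: set
    restricted_doc_ids: set  # documents the querying role must never see
    citation_count: int
    latency_ms: float


def case_passed(case: CaseResult) -> bool:
    # Assumed pass rule: all expected documents retrieved and nothing restricted leaked.
    found_expected = case.expected_doc_ids <= case.retrieved_doc_ids
    leaked = case.retrieved_doc_ids & case.restricted_doc_ids
    return found_expected and not leaked


def summarize(cases: list) -> dict:
    if not cases:
        raise ValueError("an evaluation run needs at least one case")
    total = len(cases)
    return {
        "pass_rate": sum(case_passed(c) for c in cases) / total,
        "restricted_leakage_count": sum(
            len(c.retrieved_doc_ids & c.restricted_doc_ids) for c in cases
        ),
        "citation_coverage": sum(c.citation_count > 0 for c in cases) / total,
        "avg_latency_ms": sum(c.latency_ms for c in cases) / total,
    }
```

Keeping leakage as a raw count rather than a rate makes it easy to gate releases on a hard threshold of zero.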
Operational controls encompass an audit log for all administrative actions (GET /audit-logs) and a query log detailing citations, role, latency, and RBAC metrics. A Prometheus-style operational metrics endpoint is exposed. Security headers are enabled by default, CORS is explicitly configured, and structured JSON request logging (JSON_LOGS=true) is supported. In-memory rate limiting per client (RATE_LIMIT_PER_MINUTE) is also implemented.
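Per-client in-memory rate limiting is commonly implemented as a sliding window over recent request timestamps. The sketch below assumes that approach; the source only states that a RATE_LIMIT_PER_MINUTE setting exists, so the class name, the algorithm, and the injectable clock are illustrative choices.

```python
import time
from collections import defaultdict, deque


class InMemoryRateLimiter:
    """Sliding-window limiter: at most `limit_per_minute` requests per client per 60 seconds."""

    def __init__(self, limit_per_minute: int, clock=time.monotonic) -> None:
        self.limit = limit_per_minute
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Discard timestamps that have aged out of the 60-second window.
        while hits and now - hits[0] >= 60.0:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

State held in process memory is lost on restart and is not shared between replicas, so a limiter like this only bounds traffic per instance. A multi-replica deployment would need a shared store to enforce a global limit.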
Azure Production Infrastructure
The local runtime directly maps to a specified Azure deployment topology. This includes Microsoft Entra ID for employee authentication, Azure Container Apps hosting both the API and dashboard, Azure AI Search for retrieval, and Azure OpenAI for answer generation. Data persistence relies on Azure PostgreSQL or Cosmos DB for metadata and audit logs, with Azure Blob Storage handling source documents. Azure Key Vault manages secrets, and Application Insights provides comprehensive logs and metrics.
Critically, switching from local to Azure requires only environment variable changes, with no code modifications. The SQLAlchemy layer abstracts database differences, eliminating schema migrations between SQLite and PostgreSQL. The system is designed to "fail fast" if required AZURE_* settings are missing, preventing silent degradation.
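Fail-fast configuration loading can be sketched as below. The variable names (`RETRIEVAL_BACKEND`, `AZURE_SEARCH_ENDPOINT`, `AZURE_SEARCH_INDEX`, `AZURE_OPENAI_ENDPOINT`) are hypothetical placeholders; the source says only that required AZURE_* settings are validated, not which ones.

```python
import os
from typing import Mapping

# Hypothetical setting names, used here only to illustrate the pattern.
REQUIRED_AZURE_SETTINGS = (
    "AZURE_SEARCH_ENDPOINT",
    "AZURE_SEARCH_INDEX",
    "AZURE_OPENAI_ENDPOINT",
)


class ConfigError(RuntimeError):
    """Raised at startup when the selected backend is not fully configured."""


def load_settings(env: Mapping = os.environ) -> dict:
    backend = env.get("RETRIEVAL_BACKEND", "local")
    settings = {"RETRIEVAL_BACKEND": backend}
    if backend == "azure":
        # Empty strings count as missing, so a blank variable cannot slip through.
        missing = [name for name in REQUIRED_AZURE_SETTINGS if not env.get(name)]
        if missing:
            raise ConfigError("missing required settings: " + ", ".join(missing))
        settings.update({name: env[name] for name in REQUIRED_AZURE_SETTINGS})
    return settings
```

Calling a loader like this once at process start means a misconfigured deployment refuses to boot, rather than quietly falling back to the local backend.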
What We'd Change
The described architecture is a solid foundation, but several aspects deserve attention before broader enterprise adoption or scaling beyond an initial proof of concept. SQLite is well suited to local development and early stages, yet as the store for both metadata and chunks it becomes a bottleneck under high-volume production load. The post names Azure PostgreSQL or Cosmos DB for production, but the simplicity of the local setup may lead teams to underestimate the migration effort unless it is planned and tested early. The SQLAlchemy layer reduces the impact of schema differences, but performance tuning and operational management of a managed or distributed database bring challenges of their own.
The API key management system, including local user registration and revocation, duplicates functionality usually handled by an enterprise identity provider (IdP) such as Okta, Auth0, or Microsoft Entra ID (formerly Azure AD). The Azure topology already names Entra ID for employee authentication, so integrating directly with the organization's established IdP would reduce operational overhead, centralize identity management, and streamline user provisioning. User lifecycle management and authentication would then sit with the organization's existing security infrastructure rather than with the RAG system.
Furthermore, while the evaluation metrics are strong, particularly the focus on restricted leakage count, the post does not describe how the evaluation set itself is generated or maintained. With a dynamic document corpus, creating and updating that set is critical and often resource-intensive. Without a defined strategy for managing it, the risk of evaluation drift increases: the test set gradually stops reflecting real-world usage or newly ingested data. Keeping the metrics relevant requires a dedicated pipeline and clear governance for the evaluation set.
This RAG system blueprint offers a detailed, production-oriented approach, emphasizing access control and measurable performance. Its modular design, with clear separation between local development and cloud deployment, provides a structured path for teams building internal AI applications. The focus on traceable citations and leakage prevention addresses critical enterprise requirements for trust and data security.