Sentinel Project Explores Safe LLM-Driven SRE Automation with Policy Engine
Blazi2002's Sentinel prototype demonstrates an autonomous SRE system, using local LLMs for incident diagnosis and a deterministic policy engine for secure command execution, all within a private network.
The core problem Sentinel addresses is the gap between passive observability and active remediation in production environments. Tools like Prometheus and Grafana alert on issues, but diagnosis and resolution still require human intervention. LLM assistants could bridge this gap, but those that depend on public cloud APIs require sending sensitive operational data off-network, which makes them unsuitable for regulated sectors.
The Answer Up Front
Sentinel is a compelling architectural blueprint for founders building AI-driven automation in highly regulated or security-conscious environments. Its deliberate separation of probabilistic LLM reasoning from deterministic policy enforcement is a critical design choice for safety and compliance. Founders in finance, healthcare, defense, or utilities should pay close attention to this pattern. Those seeking an off-the-shelf, fully autonomous solution, or operating exclusively in public cloud environments without strict data egress concerns, should note that Sentinel is not yet a product but a foundational concept.
Methodology
This v0 review draws on the founder's published claims at dev.to, specifically Blazi2002's blog post titled "I built an autonomous SRE that lets an LLM diagnose incidents — but never touch a shell unsupervised," published on 2026-06-10. The review also references the associated GitHub repository, https://github.com/Blazi2002/sentinel, as a public artifact. We cover the described technical architecture, the rationale behind design decisions, and the proposed workflow. What is not covered includes independent performance benchmarks, long-term operational stability, real-world incident response effectiveness beyond the described prototype, or edge case handling. Update cadence: re-tested when claims diverge from observed behavior or when a more mature product emerges.
What It Does
Sentinel is a prototype autonomous SRE system designed to diagnose and propose fixes for production incidents using local LLMs, with a strong emphasis on security and data privacy. The entire system operates within a customer's network, ensuring zero data egress.
Separating Probabilistic from Deterministic
The central tenet of Sentinel's design is the hard separation between the probabilistic nature of an LLM and the deterministic requirements of command execution. The LLM's role is strictly to diagnose anomalies and draft remediation plans. A separate, deterministic policy engine then evaluates every proposed command against a fixed, verifiable rule set before any action can be taken. This prevents the LLM from executing potentially dangerous or unintended commands unsupervised.
Eight-Stage Incident Pipeline
The system follows an eight-stage pipeline: detect → capture → transport → reason → validate → persist → approve → execute. A Go agent on each host detects metric threshold breaches (e.g., memory at 85%) and captures telemetry. The resulting event is transported via gRPC to a central hub. The hub prompts a locally hosted LLM to reason about the incident and return a structured JSON plan that includes root cause, risk level, confidence, and an ordered list of commands. The policy engine then validates each command, categorizing it as allow, review, or block. The incident is persisted to PostgreSQL, an operator reviews the plan on a dashboard and approves or rejects it, and finally the node executes only the allow commands (dry-run by default) and reports the results back.
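To make the validate, approve, and execute stages concrete, here is a minimal Python sketch (Sentinel's own agent is written in Go). The JSON key names, the `TOY_POLICY` lookup table, and the `validate` and `execute` helpers are all assumptions made for illustration; the post describes what the plan contains, not the exact schema, and the real policy engine parses commands rather than looking them up.

```python
import json
from dataclasses import dataclass

# Hypothetical plan shape: the post lists root cause, risk level, confidence,
# and ordered commands, but these key names are assumed.
RAW_PLAN = """
{
  "root_cause": "log directory filling the disk",
  "risk_level": "low",
  "confidence": 0.82,
  "commands": ["df -h", "systemctl restart app", "rm -rf /var/log"]
}
"""

# Toy stand-in for the policy engine: a fixed lookup table, not AST parsing.
TOY_POLICY = {
    "df -h": "allow",
    "systemctl restart app": "review",
}


@dataclass
class Verdict:
    command: str
    decision: str  # "allow", "review", or "block"


def validate(plan: dict) -> list[Verdict]:
    """Classify every proposed command; anything unknown is blocked."""
    return [Verdict(c, TOY_POLICY.get(c, "block")) for c in plan["commands"]]


def execute(verdicts: list[Verdict], approved: bool, dry_run: bool = True) -> list[str]:
    """Report only 'allow' commands, and only after operator approval."""
    if not approved:
        return []
    prefix = "DRY-RUN" if dry_run else "RUN"
    # A real executor would invoke the command when dry_run is False;
    # this sketch only reports what it would do.
    return [f"{prefix}: {v.command}" for v in verdicts if v.decision == "allow"]


plan = json.loads(RAW_PLAN)
verdicts = validate(plan)
report = execute(verdicts, approved=True)
print(report)
```

The point of the sketch is the ordering: the LLM's output is only data until a deterministic step has classified every command, and nothing runs without both an allow verdict and an approval.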
Advanced Policy Engine with AST Parsing
The policy engine is a key differentiator. Instead of relying on naive string matching for command validation, it parses each command into an Abstract Syntax Tree (AST) using a real shell grammar parser (mvdan/sh). This allows it to identify dangerous commands even when they are hidden inside eval or sh -c constructs, && / || chains, command substitutions ($(...)), or redirection targets (e.g., writing to system files). Because the engine inspects the parsed structure rather than the raw text, the validation layer should be considerably harder to bypass than pattern matching, though this review has not tested that independently.
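The difference between string matching and structural inspection can be illustrated with a rough Python sketch. This is not Sentinel's implementation: it swaps mvdan/sh for the standard library's `shlex` tokenizer, which is far weaker than a real shell grammar parser, and the deny list, protected paths, and function names are hypothetical.

```python
import shlex

BLOCKED_PROGRAMS = {"rm", "mkfs", "dd"}    # hypothetical deny list
WRAPPERS = {"sh", "bash"}                   # shells that accept -c SCRIPT
PROTECTED_PREFIXES = ("/etc/", "/boot/")    # hypothetical protected paths


def naive_check(command: str) -> str:
    """String matching: only looks at the first word of the command line."""
    words = command.split()
    return "block" if words and words[0] in BLOCKED_PROGRAMS else "allow"


def split_commands(command: str) -> list[list[str]]:
    """Split a command line into simple commands at ; | & ( ) operators."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    segments, current = [], []
    for tok in lexer:
        if tok and all(ch in "();|&" for ch in tok):
            if current:
                segments.append(current)
            current = []
        else:
            current.append(tok)
    if current:
        segments.append(current)
    return segments


def structural_check(command: str) -> str:
    """Inspect every simple command, recursing into sh -c and eval bodies."""
    for words in split_commands(command):
        program = words[0]
        if program in BLOCKED_PROGRAMS:
            return "block"
        if program in WRAPPERS and "-c" in words[:-1]:
            inner = words[words.index("-c") + 1]
            if structural_check(inner) == "block":
                return "block"
        if program == "eval" and len(words) > 1:
            if structural_check(" ".join(words[1:])) == "block":
                return "block"
        # A redirect operator followed by a protected path is also blocked.
        for op, target in zip(words, words[1:]):
            if op in (">", ">>") and target.startswith(PROTECTED_PREFIXES):
                return "block"
    return "allow"


for cmd in ["df -h", 'sh -c "rm -rf /var/lib/app"', "echo ok && rm -rf /var/lib/app"]:
    print(f"{cmd!r}: naive={naive_check(cmd)} structural={structural_check(cmd)}")
```

Even this crude version catches wrapped and chained commands that the first-word check lets through. A tokenizer still misses constructs such as backticks, here-documents, and variable expansion, which is the argument for using a full grammar parser as Sentinel does.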
What's Interesting / What's Not
The most interesting aspect of Sentinel is its explicit and robust architectural pattern for safe AI automation. The "probabilistic vs. deterministic" separation is not merely a philosophical stance but a concrete engineering decision, implemented with an AST-parsing policy engine. This approach directly addresses the primary concern with LLM-driven automation: unpredictable and potentially destructive outputs. For regulated industries, zero data egress, achieved here through local LLM inference and an on-premise architecture, is often a non-negotiable requirement, and one that tools built on public cloud APIs cannot meet by design.
What's less interesting, or rather, what highlights its prototype status, is the manual approve stage. While necessary for safety in a prototype, it limits the system's autonomy. The promise of an autonomous SRE is to reduce human toil, especially for routine or well-understood incidents. For Sentinel to evolve into a product, this step would need to be selectively automated, for example through policy-as-code rules that auto-approve plans below a risk threshold and above a confidence threshold while routing everything else to an operator. The reliance on a locally hosted LLM, while critical for data privacy, also implies potential performance and model update challenges compared to cloud-based alternatives.
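One way such conditional auto-approval could look is sketched below in Python. Nothing here exists in Sentinel as described: the `ApprovalRule` thresholds, the risk ordering, and the `approval_route` function are hypothetical, and the sketch assumes the plan's risk level and confidence are trustworthy enough to gate on, which would itself need testing.

```python
from dataclasses import dataclass

RISK_ORDER = {"low": 0, "medium": 1, "high": 2}  # assumed risk vocabulary


@dataclass(frozen=True)
class ApprovalRule:
    max_risk: str = "low"          # highest risk level eligible for auto-approval
    min_confidence: float = 0.9    # lowest confidence eligible for auto-approval


def approval_route(risk_level: str, confidence: float, decisions: list[str],
                   rule: ApprovalRule = ApprovalRule()) -> str:
    """Return 'auto' only when every check passes; otherwise 'operator'."""
    if not decisions:
        return "operator"          # nothing to run, let a human look
    if any(d != "allow" for d in decisions):
        return "operator"          # any review/block verdict needs a human
    if risk_level not in RISK_ORDER:
        return "operator"          # unknown risk label fails closed
    if RISK_ORDER[risk_level] > RISK_ORDER[rule.max_risk]:
        return "operator"
    if confidence < rule.min_confidence:
        return "operator"
    return "auto"


print(approval_route("low", 0.95, ["allow", "allow"]))
print(approval_route("high", 0.99, ["allow"]))
```

The design choice worth noting is that every uncertain case falls back to the operator, so the rule can only remove human review where all conditions are explicitly met.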
Pricing
Sentinel is presented as a prototype and learning project, with its source code publicly available on GitHub. No pricing information is available, as it is not a commercial product.
Verdict
Sentinel offers a technically sound blueprint for secure, LLM-driven SRE automation, particularly for organizations with stringent data privacy and regulatory requirements. The core innovation lies in its deterministic policy engine, which uses AST parsing to vet LLM-generated commands, effectively creating a safety moat. While currently a prototype requiring manual approval, its architecture demonstrates a viable path for deploying AI in sensitive production environments without compromising security or compliance. This project is a strong signal for the future of on-premise, safety-first AI operations.
What We'd Test Next
Our next steps would involve benchmarking local LLM inference performance and latency across various open-source models (e.g., Llama 3, Mixtral) on commodity hardware, as well as the end-to-end latency of the detect-to-execute pipeline. We would also test the policy engine's robustness against a suite of adversarial prompts and obfuscated commands designed to slip past its AST-based checks, measuring its false positive rate (safe commands blocked) and false negative rate (dangerous commands allowed). Further investigation would include the ease of defining and managing complex policies, and how Sentinel integrates with existing observability stacks beyond basic metric thresholds. Finally, we would explore mechanisms for automating the approve stage safely, perhaps through tiered policies or integration with incident management workflows.
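The false positive and false negative measurement could be set up along the following lines. This is a hedged Python sketch: the `error_rates` helper, the labeled corpus, and the deliberately weak `first_word_policy` are hypothetical, and a real evaluation would call Sentinel's policy engine with a much larger adversarial corpus.

```python
from typing import Callable, Iterable


def error_rates(policy: Callable[[str], str],
                corpus: Iterable[tuple[str, bool]]) -> dict[str, float]:
    """Score a policy against (command, is_dangerous) pairs.

    A false positive is a safe command that gets blocked; a false negative
    is a dangerous command that does not get blocked.
    """
    false_pos = false_neg = safe = dangerous = 0
    for command, is_dangerous in corpus:
        blocked = policy(command) == "block"
        if is_dangerous:
            dangerous += 1
            false_neg += 0 if blocked else 1
        else:
            safe += 1
            false_pos += 1 if blocked else 0
    return {
        "false_positive_rate": false_pos / safe if safe else 0.0,
        "false_negative_rate": false_neg / dangerous if dangerous else 0.0,
    }


def first_word_policy(command: str) -> str:
    """Deliberately weak baseline: block only when the first word is rm."""
    words = command.split()
    return "block" if words and words[0] == "rm" else "allow"


# Hypothetical labels; deciding what counts as dangerous is part of the test design.
CORPUS = [
    ("rm -rf /", True),
    ("sh -c 'rm -rf /'", True),
    ("df -h", False),
    ("rm /tmp/app.lock", False),
]

rates = error_rates(first_word_policy, CORPUS)
print(rates)
```

Running the same corpus against both a baseline like this and the AST-based engine would show how much of the claimed robustness comes from structural parsing.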
The investor read
Sentinel's architecture signals a critical emerging trend: the demand for secure, on-premise AI automation, particularly within highly regulated sectors. The explicit separation of probabilistic LLM reasoning from deterministic policy enforcement, validated via AST parsing, establishes a robust pattern for safety-critical AI deployments. This approach directly addresses enterprise concerns around data privacy and compliance, a significant hurdle for many cloud-native AI solutions. Investment opportunities lie in companies that can productize this 'safety moat' concept, offering a hardened, enterprise-grade version with broader LLM support, advanced policy-as-code capabilities, and seamless integration into existing AIOps and incident management platforms. While Sentinel itself is a prototype, its underlying design principles are highly investable. They point towards a future where AI-driven operations can scale securely within regulated environments, in contrast to less secure, fully autonomous, cloud-dependent alternatives.
- I built an autonomous SRE that lets an LLM diagnose incidents — but never touch a shell unsupervised
- Blazi2002/sentinel