How a founder used Telegram to let contractors restart servers without SSH access
A WordPress agency owner built an approval-driven system using Python, Go, and a Telegram bot. The playbook details a specific command grammar and five-part validation process to eliminate credential drift.
Handing a contractor root SSH access for a single task often creates security debt. A founder running a WordPress and Cloudflare agency discovered four ex-contractors still had server access long after their projects ended. This, combined with 3 a.m. "site is down" alerts that required finding Wi-Fi to restart a server, prompted the creation of a custom operations system. The goal was a system where a contractor could restart nginx on a specific server, and nothing else. The solution gives contractors no shell access at all, relying instead on a chat command that performs one specific, pre-approved action.
This playbook is based on the founder's detailed breakdown of the architecture. It offers a blueprint for founders managing remote servers and temporary staff, replacing high-trust access with a low-friction, auditable workflow.
A three-part architecture
The system has three main components. First is the operator interface, a Telegram bot built with Python's aiogram library. This is the only way a human interacts with the system. Second is the Python control plane, which receives commands from Telegram. It runs a policy engine, manages state in an SQLite database, and maintains a hash-chained audit log for every action. The control plane communicates with third-party services like the Cloudflare API.
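To make the hash-chained audit log concrete, here is a minimal Python sketch. It assumes SHA-256, JSON-serialized records, and an in-memory list standing in for the SQLite table; the names GENESIS, entry_hash, append_entry, and verify_chain are illustrative, not taken from the founder's code.

```python
import hashlib
import json

# Assumed placeholder "previous hash" for the very first entry.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash the previous entry's hash together with this record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(log: list, record: dict) -> dict:
    """Append a record, linking it to the entry before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "record": record,
        "prev_hash": prev_hash,
        "hash": entry_hash(prev_hash, record),
    }
    log.append(entry)
    return entry


def verify_chain(log: list) -> bool:
    """Recompute every link; any edited or removed entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each hash covers the one before it, changing an old record invalidates every later entry, which is what makes silent edits detectable.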
Third is a Go agent that resides on each managed server. This agent communicates with the control plane over a mutually authenticated TLS (mTLS) connection. It accepts a fixed list of operations defined in a capability manifest loaded at startup. The agent never exposes a shell; it only executes commands explicitly sent by the validated control plane.
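The real agent is written in Go, but the capability-manifest idea can be modeled in a few lines of Python. This sketch assumes the manifest maps an operation identifier to a fixed argument list; the MANIFEST contents, the systemctl command, and the execute function are hypothetical illustrations, not the founder's implementation.

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical manifest: operation id -> fixed argv, loaded once at startup.
MANIFEST: Dict[str, List[str]] = {
    "restart-web": ["systemctl", "restart", "nginx"],
}


def execute(op: str, manifest: Dict[str, List[str]],
            run: Callable = subprocess.run):
    """Run an operation only if the manifest defines it.

    The caller supplies an operation id, never a command line, and the
    argv is passed without a shell, so there is nothing to inject into.
    """
    argv = manifest.get(op)
    if argv is None:
        raise PermissionError(f"operation not in manifest: {op!r}")
    return run(list(argv), shell=False, check=True)
```

The run parameter exists only so the dispatch logic can be exercised without touching a real service.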
The APPROVE grammar
Every action that changes server or service state must pass through a strict command grammar. There is no alternative path for execution. The basic structure for a command sent in Telegram is:
APPROVE: <action> <client_id> nonce=<16-hex> ts=<unix_epoch>
An optional Time-based One-Time Password (TOTP) can be required by setting an environment variable, which modifies the grammar to include an otp field. The <action> must be one of a hardcoded list of identifiers, such as restart-web or cf-under-attack. The client_id specifies the target system. The nonce and timestamp are critical for preventing replay attacks and stale requests.
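A grammar this small can be parsed with a single regular expression. The sketch below is one possible parser, not the founder's code. It assumes lowercase hex for the nonce, a trailing otp=<6 digits> field when TOTP is enabled (the source does not say where the field goes), and a conservative character set for client_id; APPROVE_RE, Approval, and parse_approve are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed character sets; the source only fixes the overall shape.
APPROVE_RE = re.compile(
    r"APPROVE: (?P<action>[a-z0-9-]+) (?P<client_id>[A-Za-z0-9_-]+) "
    r"nonce=(?P<nonce>[0-9a-f]{16}) ts=(?P<ts>[0-9]{1,12})"
    r"(?: otp=(?P<otp>[0-9]{6}))?"  # assumed position of the optional otp field
)


@dataclass(frozen=True)
class Approval:
    action: str
    client_id: str
    nonce: str
    ts: int
    otp: Optional[str] = None


def parse_approve(text: str) -> Optional[Approval]:
    """Return a parsed Approval, or None if the message does not match."""
    match = APPROVE_RE.fullmatch(text.strip())
    if match is None:
        return None
    return Approval(
        action=match["action"],
        client_id=match["client_id"],
        nonce=match["nonce"],
        ts=int(match["ts"]),
        otp=match["otp"],
    )
```

Using fullmatch means any extra text around the command causes a rejection, so free-form chat messages never reach the policy engine as commands.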
Five layers of validation
The Python control plane enforces five rules before an action is dispatched to a Go agent. If any check fails, the request is rejected and logged.
- Action Allowlist: The requested action must exist in a canonical list (the ActionID enum). Adding new actions requires a code change, a deliberate friction to prevent casual expansion of capabilities.
- Time-to-Live (TTL): The timestamp ts in the command must be within 300 seconds of the current time. This prevents the execution of old, potentially irrelevant approval messages.
- One-Time Nonce: The 16-character hexadecimal nonce is checked against a database of used nonces. Any attempt to reuse a nonce is rejected, blocking simple replay attacks.
- TOTP Second Factor: If configured, the six-digit one-time password is validated. A mismatch results in an immediate rejection.
- Sender Allowlist: The control plane checks the requesting user's Telegram ID against a list of approved user IDs stored in an environment variable. Messages from unknown users are ignored.
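The five checks above can be sketched as a single function. This is an illustration under stated assumptions, not the founder's policy engine: the order of the checks, the used_nonces table, the standard-library TOTP routine (RFC 6238 with SHA-1 and a 30-second step), and the names validate, totp, and open_nonce_store are all hypothetical. The ActionID members mirror the two actions named in the source.

```python
import base64
import hashlib
import hmac
import sqlite3
import struct
import time
from enum import Enum
from typing import Optional, Set, Tuple


class ActionID(Enum):
    RESTART_WEB = "restart-web"
    CF_UNDER_ATTACK = "cf-under-attack"


TTL_SECONDS = 300


def totp(secret_b32: str, now: float, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP (HMAC-SHA1) using only the standard library."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = struct.pack(">Q", int(now // step))
    digest = hmac.new(key, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def open_nonce_store() -> sqlite3.Connection:
    """In-memory stand-in for the control plane's SQLite database."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE used_nonces (nonce TEXT PRIMARY KEY)")
    return db


def validate(sender_id: int, action: str, nonce: str, ts: int,
             otp: Optional[str], *, allowed_senders: Set[int],
             db: sqlite3.Connection, totp_secret: Optional[str] = None,
             now: Optional[float] = None) -> Tuple[bool, str]:
    now = time.time() if now is None else now
    if sender_id not in allowed_senders:            # sender allowlist
        return False, "sender not allowlisted"
    if action not in {a.value for a in ActionID}:   # action allowlist
        return False, "unknown action"
    if abs(now - ts) > TTL_SECONDS:                 # time-to-live
        return False, "timestamp outside TTL"
    if totp_secret is not None:                     # optional second factor
        if otp is None or not hmac.compare_digest(otp, totp(totp_secret, now)):
            return False, "bad otp"
    try:                                            # one-time nonce
        with db:
            db.execute("INSERT INTO used_nonces (nonce) VALUES (?)", (nonce,))
    except sqlite3.IntegrityError:
        return False, "nonce reused"
    return True, "ok"
```

Recording the nonce last, and relying on the primary-key constraint rather than a separate lookup, keeps the replay check atomic; a real deployment might prefer to burn the nonce earlier so that failed attempts cannot be retried.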
What We'd Change
The described system is a robust solution for a solo operator or a very small team. Its primary limitation is the single-approver bottleneck: every state-changing action waits on one person sending an APPROVE message. That model would not scale to a larger team, especially one spread across time zones. An implementation for such a team would build role-based access control (RBAC) or multi-user approval workflows directly into the policy engine.
The founder's choice to hardcode available actions into a code-based enum is a strong security posture. It makes adding new capabilities slow and deliberate. An alternative approach could use a dedicated policy engine like Open Policy Agent (OPA). This would externalize the rules, allowing for more agile changes without full code deployments, though it introduces new complexity.
Finally, building and maintaining a custom Go agent for every server adds significant operational overhead. A different strategy could use the same Python control plane to trigger commands through a more standard configuration management tool such as Ansible. This would rely on a widely used, hardened tool for execution and reduce the custom code footprint.
Landing
This playbook's value is not in its specific technology stack. Its core lesson is in designing operational workflows that make the secure path the easiest one. By replacing direct server access with a strictly audited, approval-based API delivered through chat, the founder eliminated an entire class of security risks associated with credential drift. The system enforces discipline by default, solving the
The investor read
This system is a classic bootstrapped solution to a problem typically solved by expensive enterprise Privileged Access Management (PAM) software. It signals a persistent market gap for developer-centric, API-driven access control tools that are simpler than products like Teleport but more robust than manually sharing credentials. The described architecture is a feature, not a company. To be investable, it would need to be productized into a multi-tenant SaaS with a GUI for policy management, extensive integrations (Slack, Teams, AWS, GCP), and a clear go-to-market strategy. As is, it's a lifestyle business enabler, deliberately designed to solve a single founder's operational pain, not for scalable growth.