How to refactor a Python AI agent from a single file to a reusable package
A developer known as 'wonderlab' documents the process of turning a 900-line demo script into a modular, production-ready package. The playbook focuses on API design and explicit permission controls.
A 900-line Python script demonstrated eight defense layers for an AI agent. The author, writing on Dev.to as 'wonderlab,' reports the file was effective for a demo but unusable for a real product. The code was coupled, untestable, and could not be imported by other projects. This is the common engineering gap between a proof-of-concept and a production system.
The public playbook details the refactoring process, offering a specific structure for building more robust and secure AI agents. The approach moves logic from a monolithic script into a reusable Python package with a clear, security-conscious API.
From a monolith to modules
The first step was to decompose the single file into a package with distinct modules, each responsible for a specific defense layer. The author provides a clear directory structure for the harness package, separating concerns like action registration, budget management, input sandboxing, and audit logging.
registry.py: Manages what actions an agent can perform and at what permission level.
budget.py: Controls resource consumption for agent actions.
sandbox.py: Sanitizes inputs and evaluates code in a restricted environment.
audit.py: Creates a hash-chained, immutable log of all agent activities.
rollback.py: Coordinates state restoration after a failed or malicious action.
This modularity allows individual components to be tested in isolation and reused across different agent implementations. The unified harness.py serves as the public entry point, composing these layers into a coherent security wrapper.
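The hash-chained log attributed to audit.py can be illustrated with a minimal sketch. This is not the author's code: the AuditLog class, the GENESIS_HASH constant, the entry layout, and the choice of SHA-256 over JSON-serialized events are all assumptions made for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64  # assumed starting value for the chain


def _entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with this event's content."""
    payload = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical append-only log; each entry stores the previous hash."""
    entries: list = field(default_factory=list)

    def append(self, event: dict) -> str:
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS_HASH
        digest = _entry_hash(prev_hash, event)
        self.entries.append({"event": event, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


log = AuditLog()
log.append({"action": "read_file", "status": "ok"})
log.append({"action": "send_email", "status": "denied"})
intact = log.verify()

# Tampering with an earlier entry is detected on the next verification.
log.entries[0]["event"]["status"] = "denied"
tampered_ok = log.verify()
```

Note that an in-memory list like this is tamper-evident rather than truly immutable: verify() reveals that an entry changed, but nothing prevents the change itself. A production log would presumably also need write-once storage or an externally anchored head hash.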
Design the API for security
The author emphasizes making deliberate API design choices that enhance security. A key example is the ActionRegistry class. Instead of using Python's standard __getitem__ for retrieving a registered action, the class implements a get() method.
The author presents this as a security decision rather than a stylistic one. A failed lookup via __getitem__ would conventionally raise a generic KeyError, which the playbook argues could leak internal state details. The custom get() method instead raises a PermissionError, so every attempt to use an unknown or unauthorized action surfaces as the same explicit error type, which callers can catch and log in one place.
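A minimal sketch of that lookup behavior follows. Only the class name ActionRegistry, the get() method, and the PermissionError come from the playbook as described here; the register() method, the internal dictionary, and the error message are assumptions.

```python
from typing import Callable, Dict


class ActionRegistry:
    """Hypothetical minimal registry: maps action names to handlers."""

    def __init__(self) -> None:
        self._actions: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, handler: Callable[..., object]) -> None:
        self._actions[name] = handler

    def get(self, name: str) -> Callable[..., object]:
        # Deliberately no __getitem__: an unknown name is treated as an
        # unauthorized request, not as a missing dictionary key.
        try:
            return self._actions[name]
        except KeyError:
            # `from None` suppresses the chained KeyError in the traceback.
            raise PermissionError(f"action not permitted: {name!r}") from None


registry = ActionRegistry()
registry.register("read_file", lambda path: f"contents of {path}")

result = registry.get("read_file")("notes.txt")

try:
    registry.get("delete_database")
except PermissionError as exc:
    error_type = type(exc).__name__
    error_message = str(exc)
```

Raising with `from None` keeps the internal KeyError out of the traceback, which is one way to avoid exposing how the registry stores its entries.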
Codify agent permissions
The core of the security model is the ActionRegistry, which forces developers to explicitly define every possible agent action. Each action is registered as a RegisteredAction dataclass, containing its name, a PermissionLevel (e.g., READ, WRITE, IRREVERSIBLE), a budget cost, and the handler function.
A production-grade system requires explicit, auditable controls over an agent's capabilities. This registry serves as a central manifest. It determines what the agent can do, and the is_allowed() method provides a clear checkpoint before any action is executed. This structure makes the agent's potential behavior legible and easier to audit.
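The registry-as-manifest idea can be sketched as follows. The names RegisteredAction, PermissionLevel, ActionRegistry, and is_allowed() come from the playbook as summarized above; the field names, the ordering of permission levels, the is_allowed() signature, and the manifest() helper are assumptions for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict


class PermissionLevel(IntEnum):
    # Ordering is an assumption: higher values mean more dangerous actions.
    READ = 1
    WRITE = 2
    IRREVERSIBLE = 3


@dataclass(frozen=True)
class RegisteredAction:
    name: str
    level: PermissionLevel
    cost: int  # budget units charged per call (the unit is an assumption)
    handler: Callable[..., object]


class ActionRegistry:
    def __init__(self) -> None:
        self._actions: Dict[str, RegisteredAction] = {}

    def register(self, action: RegisteredAction) -> None:
        if action.name in self._actions:
            raise ValueError(f"action already registered: {action.name!r}")
        self._actions[action.name] = action

    def is_allowed(self, name: str, granted: PermissionLevel) -> bool:
        """True only if the action exists and fits within the granted level."""
        action = self._actions.get(name)
        return action is not None and action.level <= granted

    def manifest(self) -> list:
        """Sorted (name, level, cost) tuples, usable as a reviewable manifest."""
        return sorted((a.name, a.level.name, a.cost) for a in self._actions.values())


registry = ActionRegistry()
registry.register(RegisteredAction("read_file", PermissionLevel.READ, 1, lambda p: p))
registry.register(
    RegisteredAction("drop_table", PermissionLevel.IRREVERSIBLE, 50, lambda t: t)
)

can_read = registry.is_allowed("read_file", PermissionLevel.WRITE)
can_drop = registry.is_allowed("drop_table", PermissionLevel.WRITE)
can_unknown = registry.is_allowed("format_disk", PermissionLevel.IRREVERSIBLE)
```

In this sketch an unregistered action is denied even at the highest granted level, so the default is deny rather than allow.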
What we'd change
The playbook is a strong technical guide for code structure but lacks the surrounding product and security context. The author mentions eight defense layers, but the article provides no threat model. Founders implementing this playbook would need to first define what specific risks they are mitigating. Without a threat model, these layers are solutions in search of a problem.
The author also notes two integration styles, including with the popular LangGraph framework, but provides no implementation details. This is a significant omission. The value of a reusable package is determined by how easily it integrates with the tools developers actually use. A detailed guide on connecting this harness to a LangGraph agent would be a necessary next step for this playbook to be truly actionable.
Finally, the post advocates for testability but offers no testing strategy. For a security-focused package, this is a critical gap. A production-ready version would require a comprehensive test suite, including adversarial tests designed to bypass the implemented defense layers. The playbook shows how to build the walls but not how to stress-test them.
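As an illustration of what such adversarial tests might look like, here is a small sketch against a stand-in budget layer. The Budget class, its charge() method, and the BudgetExceeded exception are hypothetical and not taken from the playbook; the point is the style of test, which probes for ways around a limit rather than confirming the happy path.

```python
import unittest


class BudgetExceeded(Exception):
    """Raised when an action would spend more than the remaining budget."""


class Budget:
    """Hypothetical stand-in for the budget layer: a simple spend counter."""

    def __init__(self, limit: int) -> None:
        self.remaining = limit

    def charge(self, cost: int) -> None:
        if cost < 0:
            # A negative cost would silently refill the budget.
            raise ValueError("cost must be non-negative")
        if cost > self.remaining:
            raise BudgetExceeded(f"need {cost}, have {self.remaining}")
        self.remaining -= cost


class AdversarialBudgetTests(unittest.TestCase):
    def test_negative_cost_cannot_refill(self):
        budget = Budget(limit=10)
        with self.assertRaises(ValueError):
            budget.charge(-100)
        self.assertEqual(budget.remaining, 10)

    def test_overrun_is_rejected_without_partial_spend(self):
        budget = Budget(limit=10)
        budget.charge(7)
        with self.assertRaises(BudgetExceeded):
            budget.charge(4)
        self.assertEqual(budget.remaining, 3)

    def test_many_small_charges_cannot_exceed_limit(self):
        budget = Budget(limit=5)
        accepted = 0
        for _ in range(100):
            try:
                budget.charge(1)
                accepted += 1
            except BudgetExceeded:
                break
        self.assertEqual(accepted, 5)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(AdversarialBudgetTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test encodes one way an agent or a buggy handler might try to slip past the limit. A fuller suite would presumably apply the same approach to the registry, sandbox, audit, and rollback layers.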
Landing
The process of refactoring a demo script into a modular package is more than a code cleanup exercise. It represents a fundamental shift from building a prototype that works to engineering a product that can be trusted. By creating explicit modules for security controls and designing APIs that enforce permissions, developers build a foundation for testable, auditable, and scalable AI agents. This architectural discipline is what separates a fragile demo from a durable product.
The investor read
This playbook signals the maturation of the AI agent market, moving from prompt engineering hacks to disciplined software engineering. Early-stage agent companies often consist of a clever demo script; this level of architectural rigor is a leading indicator of a team building a durable, defensible asset. An investable company in this space will not only show this level of code quality but will also be able to articulate the specific threat model it defends against and provide evidence of rigorous adversarial testing. The code is a necessary, but not sufficient, condition for building a trusted agent platform. This is a shift from valuing the agent's capabilities to valuing the robustness of its guardrails.