Dylan Worrall's five rules for AI browser automation that doesn't break
Most AI browser automation is brittle. Froots founder Dylan Worrall shares five specific engineering patterns for building agents that work reliably, even on sites you don't control.
Most agent demos that involve a browser are shot in one take for a reason. Dylan Worrall, building the browser layer for his product Froots, argues that the engineering patterns that separate a demo from a reliable product are few, but specific. Moving from "works in the video" to "works at 3am" requires a shift from clever hacks to a discipline of verification.
Prefer structured verbs over raw eval
Giving an AI agent a raw eval command to run arbitrary JavaScript is a common starting point. Worrall reports this approach is opaque when it fails. He advocates for a small, structured vocabulary of commands: navigate, click, fill, type, text, and wait_selector. This approach makes the agent's intent legible and produces predictable error messages like "selector not found" instead of a stack trace from a minified JavaScript bundle. The eval function becomes a fallback, not the default.
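As a rough illustration of what a structured vocabulary buys, here is a minimal sketch of a command dispatcher. It is not Worrall's implementation. FakePage, CommandError, and run_command are hypothetical names, the page is an in-memory dictionary rather than a real browser, and only four of the six verbs are modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class CommandError(Exception):
    """A predictable, structured failure the agent can reason about."""

    def __init__(self, verb: str, reason: str, selector: Optional[str] = None):
        self.verb = verb
        self.reason = reason
        self.selector = selector
        target = f" ({selector})" if selector else ""
        super().__init__(f"{verb}: {reason}{target}")


@dataclass
class FakePage:
    """Hypothetical stand-in for a real page: maps selector -> text or value."""
    url: str = "about:blank"
    elements: Dict[str, str] = field(default_factory=dict)


def run_command(page: FakePage, verb: str, **args) -> Optional[str]:
    """Dispatch one structured verb; anything outside the vocabulary is rejected."""
    if verb == "navigate":
        page.url = args["url"]
        return None
    if verb not in ("click", "fill", "text"):
        raise CommandError(verb, "unknown verb")
    selector = args["selector"]
    if selector not in page.elements:
        # The legible failure the article describes, instead of a stack trace.
        raise CommandError(verb, "selector not found", selector)
    if verb == "fill":
        page.elements[selector] = args["value"]
        return None
    if verb == "text":
        return page.elements[selector]
    return None  # click: nothing to simulate on this toy page
```

Because every failure carries the verb, the reason, and the selector, the agent (or a human reading the log) can tell what was attempted without decoding a stack trace.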
Replace sleep with conditional waits
The single biggest source of flakiness in browser automation, according to Worrall, is the fixed sleep(2000) command. Waiting an arbitrary amount of time is a guess. Too short, and the agent acts on an element that does not exist yet. Too long, and every execution wastes seconds. The reliable alternative is to wait on conditions. The agent should poll until a specific element exists, a loading spinner disappears, or a navigation event completes. This makes the agent both faster and more robust.
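A conditional wait can be sketched as a small polling helper. This is an assumption-laden illustration, not code from Worrall: wait_until and its parameters are hypothetical, and the condition is any callable, such as a check that an element exists or that a spinner has gone.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    timeout_s: float = 10.0,
    interval_s: float = 0.1,
) -> float:
    """Poll `condition` until it returns True and return the elapsed seconds.

    Raises TimeoutError if the condition is still false at the deadline.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        if condition():
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        time.sleep(interval_s)  # short pause between checks, not a blind guess
```

The helper returns as soon as the condition holds, so a fast page costs milliseconds, while a slow page gets the full timeout before the agent gives up with an explicit error.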
Confirm every write with a read
A command can return a success status while having done nothing. Worrall describes learning this lesson the hard way, with an agent sending commands to a UI pane that was no longer present. The fix is a discipline: a write should be confirmed by a read. After filling a form field, the agent should read the value back from the field. After clicking a submit button, it should wait for a URL change or a specific success message to appear on the page. Silent success is not the same as success.
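The read-after-write discipline might look like the following sketch. The names fill_and_confirm, WriteNotConfirmed, and FlakyField are hypothetical, and the field that silently drops its first write is a contrived stand-in for a pane that has disappeared.

```python
from typing import Callable


class WriteNotConfirmed(Exception):
    """Raised when a write reported success but the read-back disagrees."""


def fill_and_confirm(
    write: Callable[[str], None],
    read: Callable[[], str],
    value: str,
    attempts: int = 3,
) -> int:
    """Write `value`, read it back, and return the attempt that was confirmed."""
    for attempt in range(1, attempts + 1):
        write(value)         # may "succeed" while changing nothing
        if read() == value:  # the read-back is the only evidence we trust
            return attempt
    raise WriteNotConfirmed(f"value {value!r} not observed after {attempts} attempts")


class FlakyField:
    """Hypothetical form field that silently ignores the first write."""

    def __init__(self) -> None:
        self.value = ""
        self._writes = 0

    def write(self, new_value: str) -> None:
        self._writes += 1
        if self._writes > 1:  # the first write is dropped without any error
            self.value = new_value

    def read(self) -> str:
        return self.value
```

With the flaky field above, the first write returns normally but the read-back exposes the failure, and the second attempt is the one that gets confirmed.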
Use the active session for authenticated reads
To access data behind a login, re-authenticating or storing user credentials introduces complexity and security risks. Worrall's method is to use the browser's existing session. An in-page fetch command with the credentials: 'include' option reuses the session's cookies. This allows the agent to access authenticated data just as the logged-in user would, without managing passwords. The agent can first probe for a login cookie to confirm a session exists before attempting the fetch.
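One way to picture this is a pair of helpers on the agent side: one builds the JavaScript expression to evaluate in the page, the other checks a document.cookie-style string for a session cookie. Both function names and the session_id cookie name are hypothetical. Note that cookies marked HttpOnly are not visible to page scripts, so a cookie probe of this kind can report no session even when the fetch would succeed.

```python
import json


def build_session_fetch_js(url: str) -> str:
    """Return a JS expression that fetches `url` with the page's cookies attached."""
    # json.dumps yields a safely quoted JS string literal for the URL.
    return (
        f"fetch({json.dumps(url)}, {{ credentials: 'include' }})"
        ".then(r => { if (!r.ok) throw new Error('HTTP ' + r.status); return r.text(); })"
    )


def has_cookie(cookie_string: str, name: str) -> bool:
    """Check a document.cookie-style string ('a=1; b=2') for a named cookie."""
    for part in cookie_string.split(";"):
        key, _, _ = part.strip().partition("=")
        if key == name:
            return True
    return False
```

The agent would pass the returned expression to whatever in-page evaluation mechanism its browser layer exposes. No password or token ever enters the agent's own storage.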
Fall back to screenshots when the DOM is hostile
Modern web UIs can be difficult to parse. Shadow DOMs, canvas-based interfaces, and obfuscated CSS class names can make traditional element selectors unreliable. When the Document Object Model is hostile, Worrall's fallback is to stop fighting it. Instead, the agent takes a screenshot of the page. A vision model can then read the image to find the necessary information or locate the next element to interact with. This is often more robust than a brittle selector.
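The fallback order can be sketched as a locator that tries the DOM first and only then turns to a screenshot. Everything here is illustrative: locate is a hypothetical helper, and the two stub functions stand in for a real selector lookup and a real vision model call, neither of which is shown.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[int, int]
Locator = Callable[[str], Optional[Point]]


def locate(target: str, dom_locator: Locator, vision_locator: Locator) -> Tuple[str, Point]:
    """Try the cheap DOM lookup first; use the screenshot path only if it fails."""
    point = dom_locator(target)
    if point is not None:
        return "dom", point
    point = vision_locator(target)  # assumed to screenshot the page and query a model
    if point is not None:
        return "vision", point
    raise LookupError(f"could not locate {target!r} via DOM or screenshot")


# Stubs for illustration only; real locators would query a browser and a model.
def stub_dom(target: str) -> Optional[Point]:
    return {"Sign in": (120, 48)}.get(target)


def stub_vision(target: str) -> Optional[Point]:
    return {"Export chart": (640, 410)}.get(target)
```

Returning which path succeeded lets a team count how often the vision fallback fires, which is the number that matters when estimating its cost.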
What we'd change
Worrall's playbook is a strong foundation for building agents that operate on the third-party web. The principles are less critical for internal tools, where developers control the DOM and can provide stable, test-ID-style selectors. The advice to use vision models as a fallback is sound but omits the cost implications. Vision model API calls are orders of magnitude more expensive than DOM parsing. A system that frequently falls back to screenshots could become prohibitively expensive at scale, which is a trade-off teams must manage. Finally, the playbook is grounded in the current state of tools like Playwright and Puppeteer. Emerging agent-native browser automation frameworks aim to solve some of these reliability challenges at a lower level of abstraction. While the core principles of conditional waits and read-after-write will likely hold, the specific implementation may look different in two years.
Landing
The five tactics are implementations of a single meta-lesson Worrall identifies: closing the loop. Reliable browser automation is not about finding the perfect selector. It is about building a system that acts, observes the result, confirms the desired state change, and never trusts an action it did not independently verify. This is a shift from treating automation as a script to treating it as a state-management problem, an engineering discipline required to move AI agents from demos to production.
The investor read
The AI agent market is moving from demo-ware to infrastructure. Worrall's playbook details the unglamorous engineering required for reliability, which is the primary moat in this category. A flashy demo is table stakes; an agent that runs unattended for weeks is a business. This signals that the most durable companies will have deep technical teams focused on infrastructure-level problems, not just prompt engineering. For investors, the key diligence question for any agentic AI company is how it handles the brittleness of browser automation. A team that hand-waves this problem is not investable. A team that details its "act, observe, confirm" loops, as Worrall does, is building a real asset.