Tactics·May 30, 2026

Orchestrating 26 AI Agents to Audit Code

ohugonnot used a 4-stage pipeline with 26 AI agents to audit his ShareBox project, uncovering a PHP injection vulnerability. This tactic offers a structured alternative to manual code review.

ohugonnot deployed 26 AI agents to audit his open-source project, ShareBox, uncovering a PHP injection vulnerability and assigning a 5.04/10 score. This orchestration of specialized AI agents provided a structured alternative to manual code review, which the founder noted often "skims over what you think you already know." The process highlighted a critical flaw in 22,000 lines of code, demonstrating a specific application of AI for security and quality assurance.

Orchestrating 26 Agents for Code Audit

ohugonnot developed ShareBox, a self-hosted PHP streaming server, which gained unexpected GitHub stars. This public exposure prompted a security audit for the 22,000-line codebase. Recognizing the limitations of solo manual review, ohugonnot opted for an orchestrated AI approach, deploying 26 agents with distinct roles. The objective was not a generic "AI, tell me if my code is good" prompt, which typically yields unhelpful platitudes, but a structured, multi-stage pipeline designed for adversarial scrutiny.

Parallel Reading and Synthesis

The audit began with eleven "reader" agents operating in parallel. Each agent was assigned a specific "slice" of the ShareBox codebase, such as the core logic, streaming handlers, API, frontend, tests, or Docker setup. This parallel processing allowed for comprehensive initial ingestion of the entire project. Following the individual reports from these readers, a synthesis stage connected the disparate pieces. This phase generated an architectural overview and a test-coverage analysis, providing a holistic understanding of the project's structure and any gaps in its testing.
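The fan-out and fan-in shape of these first two stages can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: `read_slice`, `synthesize`, and the `SLICES` list are hypothetical stand-ins, and a real reader would call a model rather than return a canned string.

```python
from concurrent.futures import ThreadPoolExecutor

# Slice names borrowed from the examples in the text. The real audit used
# eleven slices, which the source does not list in full.
SLICES = ["core", "streaming", "api", "frontend", "tests", "docker"]


def read_slice(name: str) -> dict:
    """Stand-in for one reader agent. A real version would send the files
    in this slice to a model and return its report."""
    return {"slice": name, "report": f"notes on {name}"}


def synthesize(reports: list[dict]) -> dict:
    """Stand-in for the synthesis stage: combine every reader report."""
    return {
        "slices_covered": sorted(r["slice"] for r in reports),
        "overview": " | ".join(r["report"] for r in reports),
    }


def run_read_and_synthesize(slices: list[str]) -> dict:
    # Fan out: one reader per slice, all running concurrently.
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        reports = list(pool.map(read_slice, slices))
    # Fan in: synthesis starts only once every reader has reported.
    return synthesize(reports)
```

Calling `run_read_and_synthesize(SLICES)` returns one combined overview. The point of the structure is that no reader sees the whole codebase, while the synthesis step sees every report.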

Scoring with Radar Agents

After the initial reading and synthesis, twelve "radar" agents took over. Each radar agent was tasked with scoring the project on a single, specific axis. These axes included security, performance, and overall architecture. This specialized scoring ensured that each critical dimension of the project was evaluated independently, preventing a single agent from attempting to assess too broad a scope. The output from this stage was a set of granular scores across various quality and security metrics.
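One way to keep the "one agent, one axis" rule honest is to encode it in the data structure that collects the scores. The sketch below is an assumption about how such output might be held; the `RadarScore` type, its 0 to 10 range check, and `collect_scores` are hypothetical and do not come from the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RadarScore:
    axis: str       # the single dimension this agent is responsible for
    score: float    # expected to lie between 0 and 10
    rationale: str  # the argument behind the number

    def __post_init__(self):
        if not 0 <= self.score <= 10:
            raise ValueError(f"{self.axis}: score {self.score} is outside 0-10")


def collect_scores(scores: list[RadarScore]) -> dict[str, RadarScore]:
    """Index scores by axis, rejecting a second score for the same axis."""
    by_axis: dict[str, RadarScore] = {}
    for s in scores:
        if s.axis in by_axis:
            raise ValueError(f"axis {s.axis!r} was scored twice")
        by_axis[s.axis] = s
    return by_axis
```

Requiring a rationale next to every number also gives a later stage something concrete to argue with, beyond the bare figure.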

Adversarial Verdict Uncovers Vulnerability

The final stage involved a "verdict" agent, designed to operate in an explicitly adversarial mode. This agent re-read every score and conclusion generated by the radar agents, with a directive to "knock down the ones that are too kind." The verdict agent was instructed to score like a "demanding staff engineer," calibrating its assessment against realistic expectations for a "few-weeks-old PHP media server," rather than a mature product like Jellyfin. This rigorous approach resulted in a final score of 5.04 out of 10. ohugonnot noted that "a low, well-argued score is worth a thousand complacent 'great project!'s."

The process justified itself by identifying a critical PHP injection vulnerability. The security agent flagged the entrypoint.sh script, which generates config.php from environment variables. While a comment in the code explicitly stated, "Sanitize strings to prevent PHP injection," three numeric/boolean variables were interpolated raw. This oversight meant an environment variable like SHAREBOX_STREAM_MAX_CONCURRENT='1);system($_GET[x]);//' could inject executable PHP into the configuration file, leading to arbitrary code execution. [Source: ohugonnot, "I Audited My Own Open-Source Project With 26 AI Agents (and Found a Real Vulnerability)," dev.to, 2026-05-29]
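The mechanics of the injection are easier to see in a small simulation. The sketch below uses Python purely for illustration; the real generator is a shell script. The `define(...)` template is an assumption, since the source does not show the actual line from entrypoint.sh, and `render_validated` is one possible guard, not the fix the author shipped.

```python
import re

# Assumed template: the source only says the value lands unquoted in config.php.
TEMPLATE = "define('STREAM_MAX_CONCURRENT', {value});"

# ASCII digits only; fullmatch() also rejects trailing newlines.
INTEGER = re.compile(r"[0-9]+")


def render_raw(value: str) -> str:
    """Vulnerable: the environment value is pasted into PHP source as-is."""
    return TEMPLATE.format(value=value)


def render_validated(value: str) -> str:
    """Safer: accept only a plain non-negative integer, reject anything else."""
    if not INTEGER.fullmatch(value):
        raise ValueError(f"not an integer: {value!r}")
    return TEMPLATE.format(value=value)


payload = "1);system($_GET[x]);//"
print(render_raw(payload))
# prints: define('STREAM_MAX_CONCURRENT', 1);system($_GET[x]);//);
```

In the raw output, the payload closes the `define` call early, adds its own `system(...)` statement, and comments out the leftover `);`. Numeric and boolean settings are easy to overlook because they look harmless, but an unquoted value needs validating just as much as a string does.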

WHAT WE'D CHANGE

The orchestration worked for ohugonnot's ShareBox project, but it needs adjusting before it is applied elsewhere. The count of 26 agents was tailored to a 22,000-line PHP codebase, so founders with much larger or smaller projects would need to change both the number of agents and their roles. A large enterprise application might need far more agents, while a single microservice could need fewer than ten, each with a narrower brief.

The source also does not detail what the run cost. API charges for a hosted model such as GPT-4 scale with the amount of code each agent reads, so 26 agents working over 22,000 lines, repeated on every audit, could add up quickly. A cost-benefit analysis should come before implementation.
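A back-of-the-envelope estimate is enough to start that cost-benefit analysis. The function below is a rough sketch under stated assumptions: it counts input tokens only, and every parameter (tokens per line, number of full passes over the code, price per million tokens) is a hypothetical input the reader must supply from their own tokenizer and their provider's price list. None of these figures come from the source.

```python
def estimate_audit_cost(
    lines_of_code: int,
    tokens_per_line: float,
    full_passes: float,
    usd_per_million_input_tokens: float,
) -> float:
    """Rough input-token cost of one audit run, in USD.

    full_passes is how many times the whole codebase is read in total
    across all agents. Readers that split the code between them count
    as roughly one pass together. Output tokens are ignored.
    """
    input_tokens = lines_of_code * tokens_per_line * full_passes
    return input_tokens / 1_000_000 * usd_per_million_input_tokens


# Placeholder inputs for illustration only, not measurements.
example = estimate_audit_cost(
    lines_of_code=22_000,
    tokens_per_line=10,
    full_passes=3,
    usd_per_million_input_tokens=5.0,
)
```

The useful part is the shape of the formula, not any one result: cost grows linearly with codebase size and with the number of times agents re-read it, so later stages that work from reports instead of raw code keep the total down.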

The success of the "adversarial verdict" agent hinges on precise prompt engineering. The instruction to "score like a demanding staff engineer" and to "knock down the ones that are too kind" is critical for avoiding generic, positive feedback. Replicating this requires careful definition of what "demanding" means in a specific technical context, which can be subjective and difficult to standardize across different projects or teams.

Furthermore, this approach is primarily an auditing tool for existing code. While it identifies vulnerabilities, it does not inherently integrate into a continuous integration/continuous delivery (CI/CD) pipeline for preventing such issues during active development. Integrating AI agents into pull request reviews or pre-commit hooks would require a different architectural approach, focusing on smaller, incremental code changes rather than a full codebase audit.

The discovery of a vulnerability "right under my eyes" with an explicit comment highlights that AI excels at exhaustive, pattern-based checks, but human review remains essential for contextual understanding and addressing known blind spots.
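To make the verdict instructions repeatable across projects, the quoted phrases can be fixed in a template, with only the calibration context varying. The sketch below is hypothetical: `build_verdict_prompt` and `apply_verdict` are illustrative names, and the rule that the verdict may lower a score but never raise it is an assumption about how to enforce "knock down the ones that are too kind". The source does not describe such a rule.

```python
def build_verdict_prompt(radar_scores: dict[str, float], project_context: str) -> str:
    """Assemble the instructions for a single adversarial verdict agent."""
    lines = [
        "Score like a demanding staff engineer.",
        "Re-read every score below and knock down the ones that are too kind.",
        f"Calibrate against: {project_context}.",
        "For each axis, return a final score and the argument for it.",
        "",
        "Radar scores:",
    ]
    lines += [f"- {axis}: {score:.1f}/10" for axis, score in sorted(radar_scores.items())]
    return "\n".join(lines)


def apply_verdict(radar: dict[str, float], verdict: dict[str, float]) -> dict[str, float]:
    """Take the verdict's score only when it does not exceed the radar score,
    so the last stage can lower a number but never inflate it."""
    return {axis: min(score, verdict.get(axis, score)) for axis, score in radar.items()}
```

Passing the context as a parameter, for example `"a few-weeks-old PHP media server"`, keeps the subjective part of "demanding" in one visible place where a team can review and change it.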

LANDING

The orchestration of AI agents for code auditing represents a shift from purely human-driven review to a more structured, automated process. ohugonnot's experience with ShareBox demonstrates that a deliberately adversarial AI pipeline can surface critical flaws that human eyes overlook. This approach provides a concrete method for founders to enhance code quality and security, moving beyond superficial checks to a deep, multi-faceted analysis. The value lies not in a high score, but in a "low, well-argued score" that provides an actionable roadmap for improvement.

Sources
  1. ohugonnot, "I Audited My Own Open-Source Project With 26 AI Agents (and Found a Real Vulnerability)," dev.to, 2026-05-29

Reported by the Maya desk on Founderr Pulse’s Tactics beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.