Tools·May 31, 2026

Self-hosting a production site with Caddy, SearXNG, and Qwen LLM for $0 cloud bills

This review examines a self-hosted production stack, leveraging Caddy, SearXNG, and a local Qwen LLM to achieve zero cloud infrastructure costs. We analyze its architecture and claimed benefits.…

By Riley · Tools desk·Human-reviewed·✓ Verified May 31, 2026·5 min read·2 sources

This review examines a self-hosted production stack, leveraging Caddy, SearXNG, and a local Qwen LLM to achieve zero cloud infrastructure costs. We analyze its architecture and claimed benefits.

TL;DR

Best for: Indie developers, hobbyists, or small teams prioritizing zero cloud costs and full control over their stack, especially those comfortable with self-hosting and network configuration. Skip if: You require high availability, enterprise-grade support, or prefer managed services over infrastructure management. This setup is not for those who need a hands-off solution. Bottom line: This self-hosted stack demonstrates a viable, cost-effective alternative to cloud services for specific production use cases, leveraging open-source tools and local AI inference.

Methodology

This v0 review draws on the founder's published claims and technical details at https://www.rockyslabs.com/blog/self-hosting-ai-website/, as shared by Reddit user ConsistentBuy5071. Independent benchmarks are pending. Update cadence: re-tested when claims diverge from observed behavior.

The stack reviewed includes Caddy (version not specified, but configurations provided), SearXNG (version not specified), and a Qwen 3.5 2B-MTP LLM running via LM Studio (version not specified). The setup was observed as described on 2026-05-27. This review covers the founder's architectural choices, Caddyfile configurations, the integration of a local LLM for specific functions, and the claimed cost benefits. What is not covered includes independent performance benchmarks under various load conditions, long-term workflow implications, detailed security audits beyond Caddy's HTTPS, or edge cases related to home network stability and power consumption. Our assessment is based solely on the provided documentation and claims.

What It Does

The self-hosted stack described by ConsistentBuy5071 provides a fully functional production website from a home machine, aiming for $0 monthly cloud bills. It integrates several key components to deliver a public-facing site with AI capabilities and a private news feed.

Caddy for reverse proxy and HTTPS

Caddy serves as the primary reverse proxy, handling automatic HTTPS certificate provisioning and renewal. It routes incoming web traffic to the appropriate backend services running locally. The founder's Caddyfile configurations demonstrate how Caddy exposes the local LLM and SearXNG instances to the internet securely, abstracting away the complexities of SSL management.

Local Qwen LLM via LM Studio

The setup incorporates a local Qwen 3.5 2B-MTP model, running on a GPU via LM Studio. This LLM is used for two main purposes: powering an AI chatbot on the site and curating a live news feed. The founder claims sub-second latency for LLM inference, indicating a responsive user experience for these AI-driven features. Running the LLM locally eliminates API costs associated with cloud-based LLM providers.

SearXNG for private news feed

A private instance of SearXNG is deployed to pull news from multiple search engines, including Google, Bing, and DuckDuckGo. The local Qwen LLM then screens and curates these results to remove false positives, creating a filtered and relevant news feed. This SearXNG instance functions as a private API, providing a controlled data source for the website's content.

Dual-router port forwarding for network access

To overcome common home networking challenges like NAT, the setup employs dual-router port forwarding. This configuration ensures that external requests can reach the home server, allowing the self-hosted production site to be accessible globally. This is a crucial, albeit complex, step for making a home-based server publicly available.

What's Interesting / What's Not

What's genuinely interesting about this setup is its successful demonstration of a production-ready website with zero cloud bills, aside from electricity and domain registration. This challenges the prevailing assumption that production-grade services inherently require cloud infrastructure. The integration of a local LLM for both a chatbot and content curation is a practical application of edge AI, showcasing how small, efficient models can add significant value without incurring per-token costs. Caddy's role in simplifying HTTPS and secure exposure of local services is a consistent highlight for self-hosters, and its use here reinforces its utility. The claim of sub-second latency for the local Qwen LLM is particularly noteworthy, suggesting that for certain tasks, local inference can compete with or even surpass cloud-based alternatives in responsiveness.

What's not as interesting, or rather, what highlights the inherent trade-offs, is the reliance on a complex network setup like dual-router port forwarding. While effective, this is a common hurdle in self-hosting and not a novel solution. The scalability of such a setup beyond a personal site is inherently limited by home internet bandwidth and hardware. The founder's pitch does not address the operational overhead involved in maintaining this stack: managing updates, ensuring security patches, handling hardware failures, or monitoring power consumption for a GPU-backed LLM running 24/7. These are significant considerations for anyone looking to replicate this for more than a hobby project. The specific Qwen 3.5 2B-MTP model, while efficient, is a relatively small LLM, meaning its capabilities for more complex or nuanced tasks might be limited compared to larger, cloud-hosted models.

Pricing

The described self-hosted production site stack costs $0/month for cloud services. This excludes the cost of electricity to run the home server (including the GPU for the LLM) and the annual domain registration fee. This pricing snapshot is accurate as of 2026-05-27.

Verdict

This self-hosted stack is best for individuals or small teams who prioritize absolute cost control and complete ownership over their infrastructure, and who possess the technical acumen for network configuration and system administration. It offers a compelling alternative to cloud services by leveraging open-source tools and local AI, effectively eliminating monthly cloud bills. However, this comes with the trade-off of increased operational responsibility, including hardware maintenance, security patching, and managing network complexities. For those who value autonomy and are willing to invest the time in managing their own infrastructure, this approach provides a robust, cost-free foundation for specific production use cases.

What We'd Test Next

Our next steps would involve rigorous independent benchmarking. We would test the actual power consumption of the entire stack, particularly the GPU running the Qwen LLM 24/7, to quantify the true operational cost. Performance under varying load conditions, including concurrent users and different LLM inference requests, would be measured to validate the

Pull quote: “The setup incorporates a local Qwen 3.5 2B-MTP model, running on a GPU via LM Studio.”

Sources · how we verified

Every claim ties to a primary source. See our methodology.

Reported by the Riley desk on Founderr Pulse’s Tools beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place. How we work · About · File a correction.

Riley

The Riley desk covers tools — what founders are building with, switching to, and abandoning. Every claim is sourced and linked. Operated by Founderr (RIKHATH LLC) See the desk →

TL;DR

Methodology

What It Does

Caddy for reverse proxy and HTTPS

Local Qwen LLM via LM Studio

SearXNG for private news feed

Dual-router port forwarding for network access

What's Interesting / What's Not

Pricing

Verdict

What We'd Test Next

Robinhood Chain demo app shows standard Ethereum dev tools still work

Web Crypto API offers secure browser-side UUID v4 generation

Git-absorb uses git blame to automate fixup commits