Tools · Jun 9, 2026

21 Consumer and Server GPUs Benchmarked for Local TTS: Cost-Performance on Vast.ai


By Riley · Tools desk · Human-reviewed · ✓ Verified Jun 9, 2026 · 6 min read · 1 source

A user-submitted benchmark compares 21 consumer and server GPUs rented on vast.ai running the OmniVoice TTS model. It gives a rough guide to relative performance, and by extension cost-performance, for small local AI models that need about 5GB of VRAM.

The Answer Up Front

For indie founders building local AI applications with modest VRAM requirements (around 5GB), this benchmark highlights several compelling options. If raw inference speed is paramount, the NVIDIA RTX 4090 was the fastest card tested. For a balance of cost and performance on rental platforms like vast.ai, the RTX 3090 and the workstation-class A6000 may offer stronger value, depending on their hourly price at rental time. Treat these results with caution if your model needs significantly more VRAM, since the ranking comes from a single ~5GB model, or if you require enterprise-grade support and guaranteed performance beyond community benchmarks. The bottom line: for small local TTS, consumer GPUs are accessible and performant, and the data shows a clear speed hierarchy.

Methodology

This v0 review draws on Reddit user urarthur's published claims and the accompanying benchmark image, accessed on May 19, 2026. Independent benchmarks are pending; we will re-test and update this review if observed behavior diverges from the reported numbers. For the benchmark, urarthur rented 21 different GPUs on vast.ai for brief periods to test OmniVoice, a small Text-to-Speech (TTS) model with a peak VRAM usage of approximately 5 GB. Performance was measured in "xRT" (times real-time), i.e. how many times faster than real time the GPU generates audio. Each GPU ran three iterations of a short paragraph with reference audio for voice cloning, and the average xRT was recorded.

This review covers the reported performance numbers and the relative comparisons derived from the benchmark image. It does not cover independent performance verification, long-term workflow integration, power consumption, or edge cases beyond the specific OmniVoice model and paragraph tested.
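The procedure described above (three timed runs per GPU, averaged xRT) can be sketched as a small timing harness. This is not urarthur's code: it is a hypothetical illustration that assumes a `synthesize` callable returning the duration of the generated audio in seconds. Both `synthesize` and `fake_synthesize` are hypothetical stand-ins, not OmniVoice's API.

```python
import statistics
import time
from typing import Callable


def measure_xrt(synthesize: Callable[[str], float], text: str, runs: int = 3) -> float:
    """Return the mean xRT over `runs` timed calls.

    `synthesize` is a hypothetical TTS call: it takes text and returns the
    duration of the generated audio in seconds.
    """
    xrts = []
    for _ in range(runs):
        start = time.perf_counter()
        audio_seconds = synthesize(text)
        elapsed = time.perf_counter() - start
        # xRT = seconds of audio produced per second of wall-clock time
        xrts.append(audio_seconds / elapsed)
    # Average of the per-run xRT values, matching the "average xRT" described in the source
    return statistics.mean(xrts)


def fake_synthesize(text: str) -> float:
    """Hypothetical stub: pretends to produce 2.0 s of audio in about 0.02 s."""
    time.sleep(0.02)
    return 2.0


if __name__ == "__main__":
    print(f"mean xRT: {measure_xrt(fake_synthesize, 'A short paragraph.'):.1f}")
```

On a real GPU, a warm-up call before the timed runs, and synchronizing the device before reading the clock (for example `torch.cuda.synchronize()` in PyTorch), can keep model loading and asynchronous kernel launches from skewing the timings. The source does not say whether such steps were taken.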

What It Does

Benchmarking OmniVoice TTS

The core of urarthur's post is a direct comparison of 21 GPUs running OmniVoice, a small Text-to-Speech model. The model's modest 5GB VRAM requirement makes it suitable for a wide range of consumer-grade hardware, which is often more accessible and cost-effective than high-end data center GPUs. The benchmark specifically focuses on voice cloning, using a small paragraph and reference audio to generate speech.

xRT as a Performance Metric

The chosen performance metric, "xRT" (times real-time), quantifies how many times faster than real-time a GPU can generate audio. An xRT of 10, for example, means the GPU generates 10 seconds of audio in 1 second. This metric is intuitive for audio generation tasks, directly reflecting the user experience for real-time or near real-time applications. The benchmark provides a clear, single number for each GPU, allowing for straightforward comparison.
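Converting xRT into wall-clock time is a single division. The helper names below are illustrative, and the "real-time factor" (RTF) shown is the reciprocal convention common in speech work, where RTF < 1 means faster than real time; it is not a metric the benchmark reports.

```python
def generation_seconds(audio_seconds: float, xrt: float) -> float:
    """Wall-clock seconds needed to generate `audio_seconds` of audio at a given xRT."""
    return audio_seconds / xrt


def real_time_factor(xrt: float) -> float:
    """Real-time factor (compute time / audio time), the reciprocal of xRT."""
    return 1.0 / xrt


# The article's example: at xRT = 10, 10 s of audio takes 1 s to generate.
print(generation_seconds(10, 10))
# One hour of audio at roughly the 4090's reported xRT of ~100.
print(generation_seconds(3600, 100))
print(real_time_factor(10))
```

At ~100 xRT, an hour of narration would take about 36 seconds of GPU time, which is why differences between the top cards matter mostly for batch jobs rather than single interactive requests.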

Vast.ai Rental Context

All GPUs were rented on vast.ai, a decentralized GPU rental marketplace. This context matters because it frames the benchmark within a real-world scenario for developers seeking flexible, on-demand compute. The implicit goal is to identify cost-effective performance, even though rental prices are dynamic and not included in the benchmark results. The variety of GPUs tested, from GeForce consumer cards such as the RTX 3090 and RTX 4090 to professional and data-center parts such as the A6000 and V100, reflects the diverse offerings on such platforms.

What's Interesting / What's Not

What is most interesting here is the practical utility of a community-driven benchmark. While urarthur acknowledges it is not an extensive or scientific analysis, it provides actionable insights for developers working within tight budgets or exploring local inference. The data shows the NVIDIA RTX 4090 as the performance leader at approximately 100 xRT. The benchmark also highlights strong contenders like the RTX 3090 (around 60 xRT) and the A6000 (around 50 xRT), which, depending on vast.ai's dynamic pricing, could offer better cost-performance for this specific workload. The ~5GB peak VRAM footprint is the key factor: it means many older or mid-range GPUs remain viable for small models, broadening access to local AI inference.
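To make "depending on pricing" concrete: let P be a GPU's hourly rental price and c its cost per second of generated audio (both symbols introduced here, not taken from the source). One rental hour produces 3600 · xRT seconds of audio, so with the approximate xRT figures above the break-even conditions work out as follows. This is arithmetic on the reported numbers, not a measured result.

```latex
\begin{aligned}
c &= \frac{P}{3600 \cdot \mathrm{xRT}} \\
c_{3090} < c_{4090} &\iff \frac{P_{3090}}{60} < \frac{P_{4090}}{100} \iff P_{3090} < 0.6\,P_{4090} \\
c_{\mathrm{A6000}} < c_{4090} &\iff \frac{P_{\mathrm{A6000}}}{50} < \frac{P_{4090}}{100} \iff P_{\mathrm{A6000}} < 0.5\,P_{4090}
\end{aligned}
```

In words: on these numbers, the RTX 3090 is the cheaper way to generate audio only if it rents for less than about 60% of the 4090's hourly rate, and the A6000 only below about 50%.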

What is missing is a more rigorous, controlled testing environment. Because the benchmark is informal, variables such as host CPU, memory, and network latency on vast.ai could influence results. There is no data on power consumption, a significant factor for long-term deployments or local hardware purchases. And because the source lists no vast.ai rental costs alongside the xRT numbers, a precise cost-performance ratio cannot be calculated from it alone; users must cross-reference live vast.ai pricing. The relative performance is clear, but absolute economic efficiency remains an inference rather than a direct measurement.

Pricing

This benchmark does not price a tool; it reports the performance of GPUs rented on vast.ai. Vast.ai uses a dynamic, market-driven pricing model, so rental costs for each GPU fluctuate with supply and demand, and no fixed cost-performance ratio can be stated. Developers interested in these GPUs should consult vast.ai's live pricing at the time of rental. No pricing snapshot accompanies the benchmark; the source was accessed on May 19, 2026.
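Once live prices are in hand, the cross-referencing step reduces to dividing each GPU's hourly rate by its xRT, since one rental hour at a given xRT yields xRT hours of audio. The sketch below assumes you fill in prices from vast.ai yourself: the `GpuQuote` type and helper names are hypothetical, the xRT values are the approximate figures reported above, and the dollar amounts are placeholders, not real vast.ai quotes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GpuQuote:
    name: str
    xrt: float             # approximate xRT reported in the benchmark
    price_per_hour: float  # USD per rental hour, to be taken from live vast.ai listings


def cost_per_audio_hour(quote: GpuQuote) -> float:
    """USD to generate one hour of audio: one rental hour yields `xrt` hours of audio."""
    return quote.price_per_hour / quote.xrt


def rank_by_cost(quotes: list[GpuQuote]) -> list[tuple[str, float]]:
    """Return (name, cost per audio hour) pairs, cheapest first."""
    return sorted(((q.name, cost_per_audio_hour(q)) for q in quotes), key=lambda pair: pair[1])


# Placeholder prices for illustration only; NOT real vast.ai quotes.
quotes = [
    GpuQuote("RTX 4090", xrt=100, price_per_hour=0.40),
    GpuQuote("RTX 3090", xrt=60, price_per_hour=0.20),
    GpuQuote("A6000", xrt=50, price_per_hour=0.45),
]

for name, cost in rank_by_cost(quotes):
    print(f"{name}: ${cost:.4f} per hour of audio")
```

With different prices the order can flip, and host-to-host variation on vast.ai means the xRT you observe may differ from the reported figures, so it is worth re-measuring on the specific machine you rent.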

Verdict

For indie developers and small teams deploying compact AI models like OmniVoice TTS with approximately 5GB of VRAM usage, this benchmark offers a valuable, if informal, guide. The NVIDIA RTX 4090 was the fastest GPU in this test. If you are optimizing for cost-efficiency on a platform like vast.ai, the RTX 3090 and the A6000 are compelling alternatives: they deliver substantial xRT and may cost less per second of generated audio when their hourly rate is proportionally lower than the 4090's. We recommend prioritizing the RTX 4090 for latency-critical applications where budget is secondary, and exploring the RTX 3090 or A6000 where balancing cost and performance is key.

What We'd Test Next

Our next steps would involve a more controlled and extensive benchmark. We would aim to establish a dedicated test rig to eliminate vast.ai's variable host environments and network conditions. Key areas for future testing include: measuring power consumption for each GPU during inference, running a broader suite of small to medium-sized AI models (e.g., different LLMs, image generation models) to assess generalizability, and conducting long-duration tests to evaluate thermal throttling and sustained performance. We would also integrate live vast.ai pricing data to calculate precise cost-per-xRT metrics, providing a definitive economic efficiency ranking for various workloads.

The investor read

This community benchmark on vast.ai is one data point consistent with demand for accessible, cost-effective GPU compute, particularly for smaller AI models. The viability of consumer-grade and older server GPUs for tasks like TTS with modest VRAM requirements points to a market beyond high-end enterprise solutions. Platforms like vast.ai are well positioned to capture this long-tail demand from indie developers and startups. An investable company in this space could either provide more rigorous, standardized benchmarks across a wider array of models and hardware, or build a platform that simplifies selecting and deploying cost-optimized compute for specific AI workloads, with transparent cost-performance metrics. If the pattern holds, it points toward continued decentralization of compute resources and a focus on efficiency for practical, local AI applications.

Sources · how we verified

21 GPU's benchmarked running a small TTS model (vram peak: 5GB)


Reported by the Riley desk on Founderr Pulse’s Tools beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.

Riley

The Riley desk covers tools: what founders are building with, switching to, and abandoning. Every claim is sourced and linked. Operated by Founderr (RIKHATH LLC).
