Tools·May 20, 2026

Whichllm's Accuracy for Local LLMs on Low VRAM

By Riley · Tools desk · Human-reviewed · ✓ Verified May 20, 2026 · 4 min read · 1 source

This review examines whichllm's utility in recommending local LLMs for developers with 4-6GB vRAM, assessing its reported accuracy and resource detection capabilities based on user feedback.

TL;DR

Best for: Quick, initial model compatibility checks for common hardware configurations.

Skip if: You require precise vRAM, RAM, or disk estimates, or if you operate in virtualized environments like WSL.

Bottom line: whichllm offers a useful starting point for local LLM exploration, but its recommendations and resource assessments demand independent verification, especially under specific hardware constraints.

METHODOLOGY

This v0 review of whichllm draws on the founder eightshone's published claims and observations in a Reddit post. The signal, titled "How accurate can “whichllm” be?", was ingested on 2026-05-20T12:00:06.279Z. The review covers eightshone's reported experience using whichllm to identify suitable local LLMs for a development laptop with 4-6GB of vRAM. Specifically, we analyze eightshone's surprise at two model recommendations (gpt-oss-20b and qwen3.6-27b) and their observation that whichllm's RAM and free disk capacity estimates were incorrect, potentially because Linux was running inside WSL.

This v0 review does not cover independent performance benchmarks, long-term workflow integration, or comprehensive testing of whichllm across hardware and software configurations. We acknowledge this limitation; whichllm will be re-tested when founder claims or observed behavior diverge from this initial assessment.

WHAT IT DOES

whichllm appears to be a tool designed to help developers identify local large language models compatible with their machine's specifications. Its primary function, as described by eightshone, involves two key areas:

System resource detection

The tool attempts to automatically detect the host machine's hardware capabilities, including vRAM, RAM, and free disk space. eightshone noted that while whichllm provided a list of models, its reported RAM and free disk capacities were incorrect. eightshone speculated that this was a consequence of running Linux inside WSL, which suggests whichllm may struggle with accurate resource detection in virtualized or containerized environments.
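To make the detection problem concrete, here is a minimal Python sketch of how a reader could collect their own baseline numbers to compare against whatever a tool like whichllm reports. It is not whichllm's implementation, which the source does not describe. The helper names (parse_meminfo_kib, looks_like_wsl, parse_gpu_memory_mib, gib) are hypothetical, and the sketch assumes a Linux system that exposes /proc and, optionally, an NVIDIA GPU with nvidia-smi on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def parse_meminfo_kib(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value in KiB}.

    Lines without a "kB" unit (for example HugePages_Total) are skipped.
    """
    fields: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def looks_like_wsl(kernel_release: str) -> bool:
    """WSL kernels carry "microsoft" in their release string."""
    return "microsoft" in kernel_release.lower()


def parse_gpu_memory_mib(csv_text: str) -> list[int]:
    """Parse nvidia-smi output requested with csv,noheader,nounits (one MiB value per GPU)."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip().isdigit()]


def gib(n_bytes: float) -> float:
    """Convert bytes to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    release = Path("/proc/sys/kernel/osrelease")
    if release.exists():
        print("Looks like WSL:", looks_like_wsl(release.read_text()))

    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        mem = parse_meminfo_kib(meminfo.read_text())
        if "MemTotal" in mem:
            print(f"RAM total: {gib(mem['MemTotal'] * 1024):.1f} GiB")

    disk = shutil.disk_usage("/")
    print(f"Disk free on /: {gib(disk.free):.1f} GiB")

    if shutil.which("nvidia-smi"):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=False,
        ).stdout
        print("GPU memory total (MiB):", parse_gpu_memory_mib(out))
```

Under WSL 2 these values describe the Linux virtual machine rather than the Windows host, so they can legitimately differ from what Windows itself reports. That is one plausible reason a detection tool's RAM and disk figures might look wrong to a user, though the source does not confirm the cause.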

Model recommendation engine

Based on the detected system resources, whichllm generates a list of local LLMs it deems compatible. eightshone, who works on internal CLI tools and has experience running qwen2.5-coder-instruct 3b on 4-6GB of vRAM, found that most of the list made sense. However, they were surprised to see gpt-oss-20b and qwen3.6-27b among the recommendations. Models of that size typically require significantly more than 4-6GB of vRAM, which points to a miscalculation or an overly optimistic compatibility estimate for higher-parameter models.
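A back-of-the-envelope calculation shows why those two entries stand out. The sketch below uses a common rule of thumb: weight memory is roughly parameter count times bits per weight divided by eight. It is not whichllm's estimation logic, which the source does not describe. The function names, the 1.2 headroom factor, and the 4-bit quantization level are illustrative assumptions, and the parameter counts are simply read off the model names. KV cache and activation memory are ignored.

```python
def estimate_weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gib: float, headroom: float = 1.2) -> bool:
    """True if the weights, scaled by an assumed headroom factor, fit in vRAM."""
    needed = estimate_weight_gib(params_billion, bits_per_weight) * headroom
    return needed <= vram_gib


if __name__ == "__main__":
    # Sizes in billions of parameters, taken from the model names (assumption).
    candidates = {"3B": 3, "20B": 20, "27B": 27}
    for label, size in candidates.items():
        need = estimate_weight_gib(size, 4)
        verdict = fits_in_vram(size, 4, vram_gib=6)
        print(f"{label}: ~{need:.1f} GiB of weights at 4-bit, fits in 6 GiB: {verdict}")
```

Under these assumptions a 20B model at 4 bits needs about 9.3 GiB for weights alone and a 27B model about 12.6 GiB, both above a 6GB budget, while a 3B model needs about 1.4 GiB. A recommender could still list the larger models if it assumes part of the model is offloaded to system RAM, which this sketch does not model.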

WHAT'S INTERESTING / WHAT'S NOT

What's interesting about whichllm is the concept itself. The local LLM ecosystem is complex, with a dizzying array of models, quantization levels, and hardware requirements. A tool that simplifies selection by automatically assessing system capabilities and recommending compatible models is genuinely valuable. For developers new to local LLMs, or those with specific hardware constraints like eightshone's 4-6GB of vRAM, a reliable whichllm could significantly lower the barrier to entry. That eightshone found most of the list sensible suggests the tool's compatibility logic works for at least some models, which points to a potentially useful filtering mechanism.

What's not interesting, and indeed concerning, is the reported inaccuracy in whichllm's core functions. The tool reportedly misstated RAM and free disk capacity in a common development setup like WSL, which undermines its credibility. If the underlying resource detection is flawed, the model recommendations built on it are suspect. Listing high-vRAM models such as gpt-oss-20b and qwen3.6-27b for a machine with only 4-6GB of vRAM is a significant red flag: it suggests whichllm either uses an overly optimistic estimation model or fails to account for practical vRAM limits, producing recommendations that are likely unrunnable at that vRAM budget. Users therefore cannot fully trust the tool's output without manual verification, which diminishes its primary value proposition.

PRICING

whichllm is not explicitly priced in the source signal. Based on its context within a community discussion about local LLMs and the absence of any commercial mentions, we assume it is a free, open-source tool or a free web service. (Pricing snapshot: 2026-05-20).

VERDICT

whichllm serves as a valuable initial filter for developers navigating the complex landscape of local LLMs, particularly when exploring options for constrained hardware. Its core promise of simplifying model selection is compelling. However, its current accuracy issues, specifically the reported incorrect resource detection in environments like WSL and the surprising recommendations of high-vRAM models for low-vRAM machines, mean it cannot be relied upon as a definitive source of truth. For developers like eightshone working with 4-6GB vRAM on a laptop, whichllm's output should be treated as a starting point, requiring significant skepticism and manual validation against actual model requirements and system capabilities. Use it to generate ideas, but verify every suggestion.

WHAT WE'D TEST NEXT

Our next steps would involve a rigorous benchmarking process. We would reproduce eightshone's exact setup, including a machine with 4-6GB vRAM running Linux inside WSL, and execute whichllm. We would then cross-reference whichllm's reported RAM, vRAM, and disk capacities against direct system commands like free -h, df -h, and nvidia-smi. A critical test would involve attempting to load the

Sources · how we verified

How accurate can “whichllm” be?

Reported by the Riley desk on Founderr Pulse’s Tools beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.
