Tools · May 28, 2026

club-rdna16: Benchmarking local LLMs on 16GB AMD GPUs

This review examines club-rdna16, a GitHub repository offering practical, reproducible benchmarks for running local LLMs on 16GB AMD Radeon cards, focusing on real-world performance metrics.

TL;DR

Best for: AMD Radeon users with 16GB GPUs (RX 6900 XT, RX 6800 XT, RX 7800 XT, RX 7900 GRE, RX 9070 XT) seeking reproducible llama.cpp performance profiles for local LLMs, especially Qwen3.6 35B-A3B.

Skip if: You use Nvidia GPUs, have less than 16GB VRAM, or require a broader range of pre-benchmarked models beyond Qwen.

Bottom line: club-rdna16 provides valuable, practical, and community-driven performance insights for a specific niche of local LLM enthusiasts on AMD hardware.

METHODOLOGY

This v0 review draws on the founder's published claims and detailed results available on the club-rdna16 GitHub repository and its associated GitHub Pages site. Independent benchmarks are pending. Update cadence: re-tested when claims diverge from observed behavior or when significant updates to the repository are released.

The tool under review is club-rdna16, observed on 2026-05-23. The source signal, a Reddit post by /u/do_u_think_im_spooky, links directly to the GitHub repository and its results pages. This review covers the founder's stated goals, the test configurations (Qwen3.6 27B and Qwen3.6 35B-A3B using Unsloth MTP GGUFs, UD-IQ3_XXS model quant, q8 KV cache), and the reported findings on an RX 6900 XT 16GB GPU running llama.cpp with ROCm/HIP. It also covers how the repository captures practical metrics: context length, KV cache settings, short-prompt throughput, long-context retrieval, and the impact of the AMD power profile. Not covered in this v0 review: independent performance verification, long-term workflow integration, and edge-case behavior beyond the reported tests.

WHAT IT DOES

club-rdna16 serves as a practical benchmarking repository for running local Large Language Models on 16GB AMD Radeon GPUs. It aims to provide reproducible llama.cpp launch profiles and performance data, moving beyond synthetic leaderboards to focus on real-world usability.

Exact llama.cpp launch profiles

The repository details the precise llama.cpp commands and configurations used for testing. This includes specific settings for model quantization, KV cache types, and context lengths, allowing other users to replicate the tests accurately on their own hardware.
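To make the idea of a launch profile concrete, here is a minimal sketch of how such a profile could be stored and turned into a llama.cpp command line. It is not taken from the repository: the `LaunchProfile` class, the `build_args` helper, and the model path are hypothetical, and 131072 is only one reading of the "131k" figure. The flags shown (`--model`, `--ctx-size`, `--n-gpu-layers`, `--cache-type-k`, `--cache-type-v`) are standard llama.cpp options, but the repository's real profiles may set more or different ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LaunchProfile:
    """One reproducible llama.cpp launch configuration (illustrative fields)."""
    model_path: str        # path to a GGUF file (hypothetical placeholder)
    ctx_size: int          # context length in tokens
    cache_type: str        # KV cache type, e.g. "q8_0" or "f16"
    gpu_layers: int = 99   # large value: offload as many layers as fit


def build_args(profile: LaunchProfile, binary: str = "llama-server") -> list[str]:
    """Turn a profile into an argv list suitable for subprocess.run()."""
    if profile.ctx_size <= 0:
        raise ValueError("ctx_size must be positive")
    return [
        binary,
        "--model", profile.model_path,
        "--ctx-size", str(profile.ctx_size),
        "--n-gpu-layers", str(profile.gpu_layers),
        "--cache-type-k", profile.cache_type,
        "--cache-type-v", profile.cache_type,
    ]


# ASSUMPTION: 131072 tokens stands in for the review's "131k" profile.
profile = LaunchProfile("models/example.gguf", ctx_size=131072, cache_type="q8_0")
print(" ".join(build_args(profile)))
```

Keeping the profile as data rather than a pasted command makes it easier to diff two runs. Some llama.cpp builds also need flash attention enabled before a quantized V cache will load; that flag is left out here because its syntax has varied between versions.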

Practical context length testing

Instead of theoretical maximums, club-rdna16 focuses on context lengths that actually fit and perform practically on 16GB AMD cards. It tests scenarios like 131k context with q8 KV (stable non-MTP profile) and 100k context with q8 KV and MTP, noting the careful settings required for the latter.
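The reason KV cache type and context length dominate what fits in 16GB can be shown with a rough estimate. The sketch below assumes a plain transformer in which every layer keeps a full key and value cache; models with hybrid or sliding-window attention will need less. The layer count, KV head count, and head dimension are hypothetical placeholders, not Qwen3.6's real architecture. The one hard number is the q8_0 storage cost: each block of 32 values is stored as 32 int8 values plus one fp16 scale, so 34 bytes per 32 elements.

```python
# Bytes per cached element for two llama.cpp cache types.
# q8_0 stores blocks of 32 int8 values plus one fp16 scale: 34 bytes / 32 values.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32}


def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, cache_type: str = "q8_0") -> float:
    """Estimate KV cache size in GiB for a plain full-attention stack."""
    per_token = 2 * n_layers * n_kv_heads * head_dim  # factor 2: keys and values
    total_bytes = per_token * ctx_len * BYTES_PER_ELEMENT[cache_type]
    return total_bytes / 2**30


# HYPOTHETICAL dimensions for illustration only; not Qwen3.6's real architecture.
example = dict(n_layers=48, n_kv_heads=4, head_dim=128)
for ctx in (100_000, 131_072):
    f16 = kv_cache_gib(ctx_len=ctx, cache_type="f16", **example)
    q8 = kv_cache_gib(ctx_len=ctx, cache_type="q8_0", **example)
    print(f"ctx={ctx}: f16 {f16:.2f} GiB, q8_0 {q8:.2f} GiB")
```

Under these placeholder dimensions the q8_0 cache is a little over half the size of the f16 cache at the same context length, which is the kind of saving that decides whether a long context fits alongside the model weights.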

AMD-specific optimizations

The project highlights the impact of AMD-specific settings, such as the compute power profile, which the founder reports made a real difference for long-context prefill on the RX 6900 XT. It also provides ROCm/HIP setup details, which matter for AMD users building llama.cpp with that backend.
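Before comparing numbers, it helps to confirm which power profile is actually active. The sketch below assumes a Linux system with the amdgpu driver, which exposes a `pp_power_profile_mode` file in sysfs and marks the active profile with an asterisk. The review does not say how the repository switches profiles, so treat this as one possible check rather than the project's method. The card index in the path and the sample table are assumptions; the real table has more columns and differs between GPU generations.

```python
import re
from pathlib import Path
from typing import Optional

# ASSUMPTION: Linux amdgpu sysfs path; the card index varies per machine.
PROFILE_FILE = Path("/sys/class/drm/card0/device/pp_power_profile_mode")


def active_profile(text: str) -> Optional[str]:
    """Return the name of the profile marked with '*' in the sysfs table."""
    for line in text.splitlines():
        # Profile rows look like "<index> <NAME>*:" with optional spaces.
        match = re.match(r"\s*\d+\s+([A-Z0-9_]+)\s*\*", line)
        if match:
            return match.group(1)
    return None


def read_active_profile(path: Path = PROFILE_FILE) -> Optional[str]:
    """Read the live sysfs file; returns None if it cannot be read."""
    try:
        return active_profile(path.read_text())
    except OSError:
        return None


# Illustrative sample only; the real table has more columns and varies by GPU.
SAMPLE = """\
 0 BOOTUP_DEFAULT :
 1 3D_FULL_SCREEN :
 5 COMPUTE*:
"""
print(active_profile(SAMPLE))
```

Recording the result of a check like this next to each benchmark run makes it possible to tell later whether two runs differed only in power profile.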

Community contribution framework

The founder encourages other 16GB Radeon card owners (RX 6900 XT, RX 6800 XT, RX 7800 XT, RX 7900 GRE, RX 9070 XT, and similar) to submit their own test results. A template is provided for useful reports, including GPU, ROCm/driver version, backend, power profile, model, model quant, KV cache type, context length, and long-context retrieval pass/fail status.
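The fields listed above map naturally onto a small record type. The following sketch is one way a contributor might structure a report before submitting it; the class name, field names, and output format are illustrative and are not the repository's actual template. The example values echo hardware and settings named in this review, except where marked as placeholders.

```python
from dataclasses import dataclass, asdict


@dataclass
class BenchmarkReport:
    """Fields the review lists for a useful submission (names are illustrative)."""
    gpu: str
    rocm_driver_version: str
    backend: str
    power_profile: str
    model: str
    model_quant: str
    kv_cache_type: str
    context_length: int
    retrieval_passed: bool

    def missing_fields(self) -> list[str]:
        """Names of string fields that were left blank."""
        return [k for k, v in asdict(self).items()
                if isinstance(v, str) and not v.strip()]

    def to_text(self) -> str:
        """Render as 'key: value' lines for pasting into a report."""
        return "\n".join(f"{k}: {v}" for k, v in asdict(self).items())


report = BenchmarkReport(
    gpu="RX 6900 XT 16GB",
    rocm_driver_version="",        # left blank on purpose to show the check
    backend="llama.cpp ROCm/HIP",
    power_profile="compute",
    model="Qwen3.6 35B-A3B",
    model_quant="UD-IQ3_XXS",
    kv_cache_type="q8",
    context_length=131072,         # placeholder, not a reported result
    retrieval_passed=True,         # placeholder, not a reported result
)
print(report.missing_fields())
print(report.to_text())
```

Checking for blank fields before submitting matters because a result without the ROCm/driver version or power profile is hard to compare against anyone else's.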

WHAT'S INTERESTING / WHAT'S NOT

The most interesting aspect of club-rdna16 is its explicit focus on practicality over raw synthetic benchmarks. The founder, /u/do_u_think_im_spooky, directly addresses the pain points of AMD users trying to run local LLMs, providing concrete, reproducible steps rather than abstract performance numbers. The detailed llama.cpp launch profiles and the specific context length tests (131k, 100k) with KV cache settings give users a concrete baseline for optimizing their own setups. The founder's observation that the AMD compute power profile made a real difference for long-context prefill is a specific, actionable insight that would be difficult to discover without dedicated testing.

The project's emphasis on community contribution, with clear guidelines for result submissions, is also a strong point. This approach can build a much-needed knowledge base for AMD users, who often face more fragmentation and less direct support than Nvidia users in the local LLM space. The reported finding that Qwen3.6 35B-A3B outperforms Qwen3.6 27B on the RX 6900 XT in practical use is a useful starting recommendation, pending independent verification.

The main limitation is the narrow scope of models tested so far (primarily Qwen3.6 27B and 35B-A3B). This focus allows for deep, specific insights, but users interested in other model architectures or sizes will need to run their own tests or wait for community contributions. The project also relies on a single test machine (RX 6900 XT), so performance on other 16GB AMD cards must be extrapolated until community data arrives. The methodology appears sound, but the initial data set is limited to the founder's own hardware and model choices.

PRICING

N/A. club-rdna16 is an open-source GitHub repository and associated GitHub Pages site, available at no cost. Pricing snapshot: 2026-05-23.

VERDICT

club-rdna16 is an essential resource for developers and enthusiasts running local LLMs on 16GB AMD Radeon GPUs. It directly addresses the challenges of llama.cpp setup and optimization on AMD hardware, providing reproducible configurations and practical performance insights. The project's focus on real-world context lengths, KV cache settings, and the impact of AMD power profiles makes it highly valuable. While currently focused on specific Qwen models and the RX 6900 XT, its community-driven approach promises to expand its utility. For anyone with a compatible AMD card, this repository offers a clear starting point and a platform for shared knowledge, moving beyond scattered forum comments to a centralized, data-backed comparison.

WHAT WE'D TEST NEXT

Our next steps would involve independently replicating the reported benchmarks on an RX 6900 XT to verify the founder's claims regarding context length stability and throughput. We would also expand testing to other 16GB AMD cards mentioned (RX 6800 XT, RX 7800 XT, RX 7900 GRE) to assess performance consistency across the RDNA family. Further tests would include a broader range of popular GGUF models, beyond Qwen, to understand their compatibility and performance characteristics on these GPUs. Investigating the long-term stability and resource usage during extended inference sessions, especially with varying prompt lengths and user loads, would also be a priority. Finally, we would explore the impact of different ROCm/HIP versions and driver updates on overall performance and stability.

Sources · how we verified
  1. club-rdna16: practical 16GB AMD/Radeon local LLM testing repo


Reported by the Riley desk on Founderr Pulse’s Tools beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.