Tools·May 20, 2026

NVLink vs. PCIe P2P for RTX 3090 Multi-GPU LLM Performance

This review compares NVLink and PCIe Peer-to-Peer (P2P) for multi-RTX 3090 setups, assessing their impact on local LLM inference and covering NVLink bridge compatibility for the 3090.

By Riley · Tools desk·Human-reviewed·✓ Verified May 20, 2026·4 min read·1 source

TL;DR

Best for: Users with multiple RTX 3090s running large language models (LLMs) like Qwen that exceed single-GPU VRAM and require efficient memory pooling and inter-GPU communication.

Skip if: Running models that fit entirely on a single 3090, or if the performance gains from NVLink do not justify the hardware cost and setup complexity.

Bottom line: NVLink offers superior memory pooling and bandwidth for large, sharded models, while PCIe P2P provides a baseline software-level improvement for direct GPU-to-GPU data transfer over the existing PCIe bus.

METHODOLOGY

This v0 review draws on general technical specifications for NVIDIA's NVLink and PCIe Peer-to-Peer technologies, combined with community knowledge of how they behave in multi-GPU setups for local LLM inference. Independent benchmarks are pending. Update cadence: this review will be revised when its claims diverge from observed behavior in real-world LLM workloads.

Scope: NVIDIA RTX 3090 GPUs, as observed in May 2026, in a Threadripper Pro 3945 system with PCIe 4.0 x16 interfaces, as specified in the user's query. The source signal, a Reddit post by HumanDrone8721, asks for practical experience with modern Qwen models.

Covered: the vendor's claims (NVIDIA's official documentation for NVLink and P2P), public artifacts (NVLink bridge specifications), and technical details relevant to the linked thread. Not covered: independent performance benchmarks, long-term workflow integration, and edge cases specific to particular software stacks beyond general LLM inference engines.

WHAT IT DOES

NVLink for Memory Pooling

NVLink is a high-speed, low-latency interconnect developed by NVIDIA for direct GPU-to-GPU communication. On the RTX 3090 it supports a 2-way configuration only, providing roughly 112.5 GB/s of bidirectional bandwidth (about 56 GB/s in each direction) between two cards. Each GPU can read and write the other's memory directly over the link, which is what "pooling" means here: the two 24GB pools stay physically separate, and the inference engine still has to split the model between them, but traffic between the two halves moves over NVLink instead of the PCIe bus. A model that needs more than 24GB of VRAM (the capacity of a single RTX 3090) can therefore be sharded across the two cards, with up to 48GB of combined VRAM available for weights, KV cache, and activations. This matters for large LLMs like Qwen-72B, which exceed the VRAM of a single consumer GPU and, at 72 billion parameters, fit in 48GB only when quantized.
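The capacity argument can be checked with back-of-the-envelope arithmetic. The sketch below is a rough lower bound, not a measurement: it counts weight bytes only, and the precision table, the `reserve_fraction` headroom for KV cache and activations, and the function names are all illustrative assumptions rather than anything taken from the linked thread.

```python
# Rough weight-memory estimate for a dense model sharded over two 24 GB cards.
# Assumption: bytes per parameter are idealized; real quantized formats add
# overhead for scales and metadata, so treat the results as lower bounds.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(n_params: float, precision: str) -> float:
    """Return the size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


def fits(n_params: float, precision: str, n_gpus: int = 2,
         vram_gb: float = 24.0, reserve_fraction: float = 0.1) -> bool:
    """Check whether the weights fit in the combined VRAM.

    reserve_fraction is a hypothetical headroom for KV cache and activations.
    """
    budget_gb = n_gpus * vram_gb * (1.0 - reserve_fraction)
    return weight_gb(n_params, precision) <= budget_gb


if __name__ == "__main__":
    params = 72e9  # a 72-billion-parameter model such as Qwen-72B
    for precision in BYTES_PER_PARAM:
        size = weight_gb(params, precision)
        verdict = "fits" if fits(params, precision) else "does not fit"
        print(f"{precision}: {size:.0f} GB of weights, {verdict} in 2 x 24 GB")
```

Under these assumptions only the 4-bit variant of a 72B model lands inside two 3090s, which is why quantization and the interconnect question tend to come up together.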

PCIe Peer-to-Peer (P2P) Driver

PCIe Peer-to-Peer (P2P) lets GPUs access each other's memory directly over the PCI Express bus, bypassing the CPU and system memory. This lowers latency and raises effective bandwidth for GPU-to-GPU transfers compared with staging the data through host memory. The user's Threadripper Pro 3945 system, with PCIe 4.0 x16 interfaces, is a solid foundation for P2P: each x16 link offers roughly 32 GB/s per direction. P2P improves inter-GPU communication, but it adds no memory capacity and does not match NVLink's bandwidth. Each GPU's VRAM remains distinct, and transfers between them, though faster, still cross the PCIe bus. Whether P2P is actually enabled depends on the driver and platform, so it is worth confirming on the target system rather than assuming.
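One way to confirm peer access on a given machine is to ask the CUDA runtime through PyTorch. The sketch below assumes PyTorch is installed with CUDA support and uses `torch.cuda.can_device_access_peer`; the helper names `peer_matrix` and `all_pairs_p2p` are hypothetical. The query only reports whether peer access is possible, not whether it runs over NVLink or PCIe.

```python
from typing import Callable, List


def peer_matrix(n_gpus: int,
                can_access: Callable[[int, int], bool]) -> List[List[bool]]:
    """matrix[i][j] is True when GPU i can directly access GPU j's memory."""
    return [[i != j and can_access(i, j) for j in range(n_gpus)]
            for i in range(n_gpus)]


def all_pairs_p2p(matrix: List[List[bool]]) -> bool:
    """True when every ordered pair of distinct GPUs reports peer access."""
    n = len(matrix)
    return n >= 2 and all(matrix[i][j]
                          for i in range(n) for j in range(n) if i != j)


def main() -> None:
    try:
        import torch  # assumption: PyTorch with CUDA support is available
    except ImportError:
        print("PyTorch is not installed; nothing to query.")
        return
    n = torch.cuda.device_count() if torch.cuda.is_available() else 0
    if n < 2:
        print(f"Found {n} CUDA device(s); peer access needs at least two.")
        return
    matrix = peer_matrix(n, torch.cuda.can_device_access_peer)
    for i, row in enumerate(matrix):
        peers = [j for j, ok in enumerate(row) if ok]
        print(f"GPU {i} can access peers: {peers}")
    print("P2P available between all pairs:", all_pairs_p2p(matrix))


if __name__ == "__main__":
    main()
```

The query function is passed in as a parameter so the matrix logic can be exercised on a machine without GPUs.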

NVLink Adapter Compatibility

For the RTX 3090, an Ampere-generation NVLink bridge is required; bridges made for earlier GeForce generations are generally reported not to fit. The bridge connects the NVLink fingers on the top edge of the two cards and comes in fixed slot spacings, so the spacing has to match the distance between the PCIe slots the cards occupy. A 2-slot bridge only works with cards that are themselves no thicker than two slots, and many RTX 3090 models are thicker than that, so they need a wider bridge. NVIDIA produced these bridges, and third-party options are also available. Check that the listing explicitly states RTX 3090 compatibility, and confirm the slot spacing, to guarantee proper fit and function.
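After fitting a bridge, the link can be verified in software: `nvidia-smi topo -m` prints a matrix in which NVLink connections appear as `NV` followed by a link count, while PCIe paths appear as codes such as `PIX`, `PHB`, or `SYS`. The sketch below parses that matrix. The sample text is illustrative rather than captured from a real system, the exact column layout can vary between driver versions, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

GPU_RE = re.compile(r"^GPU(\d+)$")
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")  # some versions underline the header

# Illustrative sample only; real output also includes a legend below the matrix.
SAMPLE_TOPO = """\
        GPU0    GPU1    CPU Affinity    NUMA Affinity
GPU0     X      NV4     0-23            N/A
GPU1    NV4      X      0-23            N/A
"""


def parse_topo(text: str) -> Dict[Tuple[int, int], str]:
    """Map (row_gpu, col_gpu) to the link code shown in the topology matrix."""
    cleaned = (ANSI_RE.sub("", ln) for ln in text.splitlines())
    lines = [ln for ln in cleaned if ln.strip()]
    if not lines:
        return {}
    # GPU columns come first in the header; other columns are ignored.
    cols = [int(m.group(1)) for tok in lines[0].split()
            if (m := GPU_RE.match(tok))]
    links: Dict[Tuple[int, int], str] = {}
    for ln in lines[1:]:
        toks = ln.split()
        m = GPU_RE.match(toks[0])
        if not m:
            continue  # legend or non-GPU row
        row = int(m.group(1))
        for col, cell in zip(cols, toks[1:]):
            if col != row:
                links[(row, col)] = cell
    return links


def nvlink_pairs(links: Dict[Tuple[int, int], str]) -> List[Tuple[int, int]]:
    """Return the GPU pairs whose link code marks an NVLink connection."""
    return sorted(pair for pair, kind in links.items() if kind.startswith("NV"))


if __name__ == "__main__":
    print(nvlink_pairs(parse_topo(SAMPLE_TOPO)))
```

On a real system the input would be the captured output of `nvidia-smi topo -m`; an empty result with the bridge installed suggests the bridge is not seated or not recognized.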

WHAT'S INTERESTING / WHAT'S NOT

What's interesting about NVLink in this context is its direct impact on the scale of LLMs that can be run locally. For models like Qwen-72B, which require significantly more than 24GB of VRAM, NVLink lets two RTX 3090s work as a tightly coupled pair with 48GB of combined memory, although the inference engine still has to shard the model across them. This is a meaningful improvement over relying solely on PCIe P2P, where the same cross-GPU traffic moves over a slower link. Compared like for like, NVLink's roughly 56 GB/s per direction (112.5 GB/s bidirectional) is about 1.8 times PCIe 4.0 x16's theoretical 32 GB/s per direction (64 GB/s bidirectional), before accounting for the overheads of P2P transfers. The specific NVLink bridge for the 3090 is a critical, yet often overlooked, hardware component that enables this capability. Its existence indicates that NVIDIA intended the 3090 to support two-GPU scaling in specific workloads.
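The bandwidth comparison is easiest to read when both links are expressed per direction. The sketch below derives the PCIe 4.0 x16 figure from its 16 GT/s lane rate and 128b/130b encoding and halves NVLink's 112.5 GB/s bidirectional figure. These are theoretical peaks, not measured throughput, and the 256 MB payload in the usage lines is an arbitrary illustrative value.

```python
# Theoretical per-direction bandwidth, in GB/s (1 GB = 1e9 bytes).
NVLINK_BIDIRECTIONAL_GBPS = 112.5   # RTX 3090 NVLink, both directions combined
PCIE4_GT_PER_LANE = 16.0            # PCIe 4.0 raw rate per lane, GT/s
PCIE_LANES = 16
PCIE_ENCODING = 128 / 130           # 128b/130b line-code efficiency


def nvlink_per_direction() -> float:
    return NVLINK_BIDIRECTIONAL_GBPS / 2


def pcie4_x16_per_direction() -> float:
    # GT/s * lanes * encoding efficiency gives Gbit/s; divide by 8 for GB/s.
    return PCIE4_GT_PER_LANE * PCIE_LANES * PCIE_ENCODING / 8


def transfer_ms(payload_mb: float, bandwidth_gbps: float) -> float:
    """Ideal time to move payload_mb megabytes at the given GB/s, in ms."""
    return payload_mb / bandwidth_gbps


if __name__ == "__main__":
    nv = nvlink_per_direction()
    pcie = pcie4_x16_per_direction()
    print(f"NVLink:        {nv:.2f} GB/s per direction")
    print(f"PCIe 4.0 x16:  {pcie:.2f} GB/s per direction")
    print(f"Ratio:         {nv / pcie:.2f}x")
    payload_mb = 256.0  # hypothetical per-step transfer size
    print(f"{payload_mb:.0f} MB over NVLink: {transfer_ms(payload_mb, nv):.2f} ms")
    print(f"{payload_mb:.0f} MB over PCIe:   {transfer_ms(payload_mb, pcie):.2f} ms")
```

Real transfers fall short of both peaks, and small messages are dominated by latency rather than bandwidth, so the ratio is an upper bound on the benefit rather than a prediction.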

What's not interesting is the P2P driver as a standalone solution for memory-hungry LLMs. P2P is a necessary baseline optimization for multi-GPU systems, but its benefit is limited to accelerating transfers between GPUs that each manage their own distinct memory. It adds no capacity, so it does nothing by itself for a model that exceeds a single GPU's VRAM. The user's existing PCIe 4.0 x16 setup is already optimal for P2P performance; there is no additional

Sources · how we verified

Multiple RTX 3090 - P2P driver, NVLink or what can be done? (Reddit post by HumanDrone8721)

Reported by the Riley desk on Founderr Pulse’s Tools beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.
