Tools·May 29, 2026

Qwen3.5 35B A3B: Native MTP preserved for general AI assistance


By Riley · Tools desk · Human-reviewed · Verified May 29, 2026 · 7 sources

This review examines LLMFan46's Qwen3.5 35B A3B model, focusing on its claimed strengths for general AI tasks, MTP preservation, and resilience to quantization-induced KL divergence for local deployment.

TL;DR

Best for: Indie founders and developers seeking a 35B parameter local LLM optimized for general-purpose AI assistance, especially when prioritizing resilience to quantization effects.

Skip if: Your primary use case involves agentic workflows or complex coding tasks, where Qwen3.6 models are claimed to be more optimal.

Bottom line: LLMFan46's Qwen3.5-35B-A3B offers a specialized, uncensored option for local general AI, with a notable claim of higher KL divergence tolerance before significant accuracy loss.

METHODOLOGY

This is a v0 review based on the founder LLMFan46's published claims on Reddit and the associated HuggingFace model cards. We accessed the Reddit post on 2026-05-26, along with the linked HuggingFace repositories.

What's covered: the model's version (Qwen3.5-35B-A3B-uncensored-heretic-v2-Native-MTP-Preserved), its available formats (Safetensors, GGUFs, NVFP4, GPTQ-Int4), the author's explicit comparison of Qwen3.5 and Qwen3.6 models for different primary use cases, and the founder's claims regarding KL divergence and accuracy loss.

What's not covered: independent performance benchmarks, long-term workflow integration, and edge-case behavior. Independent benchmarks are pending, and we will re-test when claims diverge from observed behavior.

WHAT IT DOES

Native MTP preservation

LLMFan46's Qwen3.5-35B-A3B model is presented as an "uncensored heretic" variant of the Qwen3.5 architecture. A core feature highlighted by the author is "Full 785 MTPs Preserved and Retained." The source does not define "MTP"; in the Qwen model family the abbreviation most likely stands for multi-token prediction, an auxiliary prediction component that inference stacks can use for speculative decoding to speed up generation. Read that way, the claim is that the uncensoring process left the base model's MTP weights intact instead of stripping them. The source does not say what the figure 785 counts.

Tailored use cases

The author, LLMFan46, explicitly differentiates the optimal use cases for Qwen3.5 and Qwen3.6 models, even though both share the qwen35 architecture. Qwen3.5 models, including this 35B A3B variant, are positioned primarily for general-purpose AI assistance. Qwen3.6 models, by contrast, are described as mainly for agentic and coding assistance. Cross-usage is possible, but the founder claims each family performs best in its own lane.

Quantization formats

To facilitate local deployment, the Qwen3.5-35B-A3B model is available in a range of formats: full-precision Safetensors, GGUFs, NVFP4, NVFP4 GGUFs, and GPTQ-Int4. This lets developers choose the format best suited to their hardware and inference stack, trading off model size, memory footprint, and inference speed for local execution.
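To make the size trade-off concrete, here is a rough back-of-the-envelope sketch of weight storage per format. The bits-per-weight values are nominal assumptions for illustration, not figures from the model cards: the full-precision files are assumed to be 16-bit, and the 4-bit formats ignore the extra scale and metadata overhead that real files carry. GGUF is left out because its size depends on the specific quantization type chosen.

```python
# Nominal bits per weight. These are assumptions for illustration only;
# real files are somewhat larger because of scales, zero points, and metadata.
NOMINAL_BITS_PER_WEIGHT = {
    "safetensors (assumed 16-bit)": 16.0,
    "gptq-int4": 4.0,
    "nvfp4": 4.0,
}

TOTAL_PARAMS = 35e9  # "35B" taken from the model name


def weight_footprint_gb(num_params, bits_per_weight):
    """Return approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


for fmt, bits in NOMINAL_BITS_PER_WEIGHT.items():
    print(f"{fmt}: ~{weight_footprint_gb(TOTAL_PARAMS, bits):.1f} GB")
```

Under these assumptions the 16-bit weights come to about 70 GB and the 4-bit variants to about 17.5 GB, before any KV cache or runtime overhead. Note that the "A3B" suffix presumably refers to the parameters active per token; the estimate uses the full parameter count because all weights still have to be stored.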

KL divergence resilience

A significant technical claim by LLMFan46 concerns how the Qwen3.5 architecture responds to KL divergence, a measure of how far one model's output distribution drifts from a reference model's. The founder states that Qwen3.5 models can exhibit a KL divergence in the 300s or 400s without a substantial loss of accuracy on benchmarks, whereas for Qwen3.6 models a KL divergence in the 400s or higher could indicate a disastrous loss of accuracy. Given the figures quoted alongside the claim, "300s or 400s" appears to be shorthand for values of roughly 0.03 to 0.04, not values in the hundreds. For instance, the Qwen3.5-35B-A3B model is claimed to have a KL divergence of 0.0487 with an accuracy loss of 0.40%, while a Qwen3.6-35B-A3B model had a KL divergence of 0.0015 with an accuracy loss of 0.32%.
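For readers unfamiliar with the metric, the sketch below shows the standard definition of KL divergence applied to next-token distributions, averaged over token positions. The source does not describe how LLMFan46 measured KL divergence, so the function names, the averaging step, and the toy logits here are all illustrative assumptions, not the author's method.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(P || Q) in nats; terms where p_i is zero contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_token_kl(ref_logits, test_logits):
    """Average per-position KL between a reference model and a modified model."""
    kls = [
        kl_divergence(softmax(r), softmax(t))
        for r, t in zip(ref_logits, test_logits)
    ]
    return sum(kls) / len(kls)


# Toy logits for two token positions over a three-token vocabulary (made up).
reference = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
modified = [[1.9, 1.1, 0.1], [0.4, 0.6, 2.9]]
print(f"mean KL: {mean_token_kl(reference, modified):.6f}")
```

A value of 0 means the two models assign identical probabilities at every position; larger values mean the modified model's predictions have drifted further from the reference.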

WHAT'S INTERESTING / WHAT'S NOT

What's interesting about LLMFan46's Qwen3.5-35B-A3B is the explicit, data-backed claim differentiating Qwen3.5 and Qwen3.6 for specific use cases. The idea that two models sharing the same base architecture can be optimally tuned for


Reported by the Riley desk on Founderr Pulse’s Tools beat. Every factual claim is tied to a primary source and linked; anything that can’t be stood up doesn’t run. Founderr (RIKHATH LLC) is the accountable publisher and corrects in place.

