3d ago

Unsloth AI releases MTP-optimized GGUF files for Qwen3.6-27B and Qwen3.6-35B-A3B on Hugging Face delivering 1.4 to 2.2 times faster generation

llama.cpp merged native MTP support on May 16 for Qwen3.6 models.

0
Original post

Qwen3.6 now runs 2x faster with MTP GGUFs! Run locally on just 18GB RAM. ⚡️ MTP enables Qwen3.6 to generate ~1.4–2.2× faster with no accuracy change. Qwen3.6-27B MTP runs at 160 tokens/s. 35B-A3B reaches 240 t/s. GGUFs: https://huggingface.co/unsloth/Qwen3.6-27B-MTP-GGUF Guide: https://unsloth.ai/docs/models/qwen3.6#mtp-guide

6:40 AM · May 18, 2026 View on X
Reposted by

I've seen some confusion online on how to run llama.cpp with MTP (Multi-token prediction) in the simplest way possible.

ICYMI, MTP is a new flavor of speculative decoding built-in to the model itself, that ~2x your tokens per sec for most use cases.

2x generation speed = Truly a game changer. 🔥

How to run it?

brew upgrade llama.cpp # or you might need to install from source until build 9200 is in your package manager: brew install llama.cpp --HEAD

Then pick either the Dense 27B or the 35B A3B MoE.

Personally I tend to stick to the Dense model where I achieve ~30 tok/sec on my machine. The MoE is of course way faster at an impressive ~100 tok/sec on my machine. Truly rapid. ⚡️

In both cases you probably want 48GB or better 64GB RAM or VRAM, though 36GB might work with more strongly-quantized versions.

# Dense:

llama-server -hf ggml-org/Qwen3.6-27B-MTP-GGUF --spec-type draft-mtp --spec-draft-n-max 2

# MoE:

llama-server -hf ggml-org/Qwen3.6-35B-A3B-MTP-GGUF --spec-type draft-mtp --spec-draft-n-max 3

Enjoy!

12:59 PM · May 19, 2026 · 17.8K Views

finally faster Qwen3.6 models with MTP support ⚡️

brb updating my Pi & Hermes setup 🤝

Georgi GerganovGeorgi Gerganov@ggerganov

llama.cpp adds MTP for the Qwen3.6 family This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further. Special thanks to Aman Gupta for leading this development! https://github.com/ggml-org/llama.cpp/pull/22673

3:07 PM · May 18, 2026 · 252.2K Views
6:59 PM · May 18, 2026 · 4.9K Views

llama.cpp adds MTP for the Qwen3.6 family

This is a significant milestone for the local AI ecosystem. The performance jump with these changes is massive and elevates local inference on commodity hardware further.

Special thanks to Aman Gupta for leading this development!

github.com
/ggml-org/llama.cpp/pull/22673
3:07 PM · May 18, 2026 · 252.2K Views
Unsloth AI releases MTP-optimized GGUF files for Qwen3.6-27B and Qwen3.6-35B-A3B on Hugging Face delivering 1.4 to 2.2 times faster generation · Digg