Last updated: 2026-02-26 (Thu) 22:18:37 (10d)
Qwen3.5-35B-A3B
https://huggingface.co/Qwen/Qwen3.5-35B-A3B
LM Studio
qwen/qwen3.5-35b-a3b
https://lmstudio.ai/models/qwen/qwen3.5-35b-a3b
- Thinking can be toggled on/off
Unsloth
lmstudio-community
https://huggingface.co/lmstudio-community/Qwen3.5-27B-GGUF
mlx-community
Unsloth
- On a GeForce RTX 3090, loading 40 layers onto the GPU sometimes ends in "generation failed" (it reaches about 85 tok/s, but likely runs out of VRAM mid-generation)
- 38 layers works fine (26 tok/s) -> VRAM 21.7GB
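The layer split above maps to llama.cpp's `--n-gpu-layers` option. A minimal sketch (the model filename is a placeholder, not the actual file name):

```shell
# Load only 38 of 40 layers onto the GPU so the model fits in 24GB of VRAM;
# the remaining layers run on the CPU.
# Model path is a placeholder -- point it at your local GGUF file.
llama-server \
  -m ./Qwen3.5-35B-A3B-Q4_K_XL.gguf \
  --n-gpu-layers 38
```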
Toggling Thinking on/off
In LM Studio
https://lmstudio.ai/models/qwen/qwen3.5-35b-a3b appears to have a Thinking on/off button
- The lmstudio-community and Unsloth builds apparently don't support it
https://www.reddit.com/r/LocalLLaMA/comments/1re1b4a/you_can_use_qwen35_without_thinking/
llama.cpp
https://unsloth.ai/docs/models/qwen3.5#qwen3.5-inference-tutorials
- To disable thinking / reasoning, use --chat-template-kwargs "{\"enable_thinking\": false}"
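Putting that flag into a full invocation, a llama-server launch with thinking disabled might look like this (a sketch; the model filename is a placeholder):

```shell
# Disable Qwen3.5's thinking/reasoning by passing enable_thinking=false
# through the chat template kwargs.
# Model path is a placeholder -- point it at your local GGUF file.
llama-server \
  -m ./Qwen3.5-35B-A3B-Q4_K_XL.gguf \
  --chat-template-kwargs '{"enable_thinking": false}'
```

With single quotes around the JSON, the inner double quotes don't need backslash escaping as they do in the Unsloth docs' double-quoted form.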
Notes
| Hardware | Quant | Size | Speed | Tool |
|---|---|---|---|---|
| GeForce RTX 3090 | Unsloth Q4_K_XL? (40/40) | 21.5GB | 85tok/s | LM Studio |
| GeForce RTX 3090 | Unsloth Q4_K_XL? (38/40) | 21.5GB | 26tok/s | LM Studio |
| Apple M1 Ultra | Unsloth Q4_K_XL? | 21.5GB | 42tok/s | LM Studio |
| Apple M1 Ultra | MLX Community 4bit | 20.4GB | 59.95tok/s | LM Studio |
| GeForce RTX 3090 | Unsloth Q4_K_XL? | 21.5GB | 115tok/s | llama-bench |
| GeForce RTX 5090 | Unsloth Q4_K_XL? | 21.5GB | 170tok/s | llama-bench |
| Apple M2 Ultra | Unsloth Q4_K_XL? | 21.5GB | 52tok/s | llama-bench |

https://x.com/k12u/status/2027006837644726407
https://x.com/gosrum/status/2026457860197253504
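The llama-bench rows above can be reproduced with llama.cpp's bundled benchmark tool; a minimal sketch (the model filename is a placeholder):

```shell
# Benchmark prompt-processing (pp) and token-generation (tg) throughput.
# Model path is a placeholder -- point it at your local GGUF file.
llama-bench -m ./Qwen3.5-35B-A3B-Q4_K_XL.gguf
```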
Comparison
https://x.com/Alibaba_Qwen/status/2026339351530188939
- Outperforms Qwen3-235B-A22B-2507? and Qwen3-VL-235B-A22B?

