"Not all quantized model perform good", serving framework ollama uses NVIDIA gpu, llama.cpp uses CPU with AVX & AMX
- unsloth/GLM-4.5-Air-GGUF (Text Generation, 110B params, 107k downloads, 72 likes)
- unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF (31B params, 89.5k downloads, 82 likes)
- unsloth/DeepSeek-V3-0324-GGUF-UD (Text Generation, 671B params, 4.87k downloads, 18 likes)
- unsloth/cogito-v2-preview-llama-109B-MoE-GGUF (108B params, 13.4k downloads, 8 likes)