"Not all quantized model perform good", serving framework ollama uses NVIDIA gpu, llama.cpp uses CPU with AVX & AMX
- unsloth/GLM-4.5-Air-GGUF — text generation, 110B
- unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF — 31B
- unsloth/DeepSeek-V3-0324-GGUF-UD — text generation, 671B
- unsloth/cogito-v2-preview-llama-109B-MoE-GGUF — 108B
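One of the GGUF repos above can be served locally with Ollama via a Modelfile. This is a sketch under assumptions: the filename points at whichever quant you actually downloaded, and the parameter values are illustrative, not tuned.

```
# Ollama Modelfile — serves a locally downloaded GGUF quant.
# The .gguf filename is an assumption; substitute your download.
FROM ./Qwen3-30B-A3B-Thinking-2507-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER num_ctx 8192
```

Build it with `ollama create qwen3-thinking -f Modelfile`, then chat with `ollama run qwen3-thinking`.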