"Not all quantized model perform good", serving framework ollama uses NVIDIA gpu, llama.cpp uses CPU with AVX & AMX
- unsloth/GLM-4.5-Air-GGUF (Text Generation, 110B, 41.6k downloads, 121 likes)
- unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF (31B, 18.5k downloads, 123 likes)
- unsloth/DeepSeek-V3-0324-GGUF-UD (Text Generation, 671B, 2.72k downloads, 20 likes)
- unsloth/cogito-v2-preview-llama-109B-MoE-GGUF (108B, 1.75k downloads, 9 likes)
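Whether any of these fits in local RAM or VRAM can be roughed out from parameter count times bits per weight. A back-of-the-envelope sketch; the bits-per-weight figures below are illustrative assumptions for common GGUF quant types, not exact values:

```python
# Approximate bits per weight for some GGUF quant types
# (assumed/illustrative figures, actual values vary per model).
BPW = {"Q8_0": 8.5, "Q4_K_M": 4.8, "Q2_K": 2.6}

def gguf_size_gb(params_billion: float, quant: str) -> float:
    """Rough on-disk/in-memory size in GB: params * bits / 8."""
    return params_billion * 1e9 * BPW[quant] / 8 / 1e9

# Example: GLM-4.5-Air at 110B parameters.
for quant in BPW:
    print(f"110B at {quant}: ~{gguf_size_gb(110, quant):.0f} GB")
```

By this estimate a 110B model needs on the order of 66 GB at Q4_K_M, which is why aggressive quants matter for consumer hardware even when they cost some quality.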