MMLU Pro benchmark for GGUFs (1 shot)
"Not all quantized models perform well." Serving frameworks: Ollama runs on an NVIDIA GPU; llama.cpp runs on the CPU with AVX & AMX.
- unsloth/GLM-4.5-Air-GGUF • Text Generation • 110B • Updated Aug 5 • 62.9k downloads • 94 likes
- unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF • 31B • Updated Jul 31 • 41.5k downloads • 101 likes
- unsloth/DeepSeek-V3-0324-GGUF-UD • Text Generation • 671B • Updated Apr 28 • 3.23k downloads • 18 likes
- unsloth/cogito-v2-preview-llama-109B-MoE-GGUF • 108B • Updated Jul 31 • 1.6k downloads • 9 likes
OCR Models
- numind/NuMarkdown-8B-Thinking • Image-to-Text • 8B • Updated Aug 20 • 8.59k downloads • 210 likes
- rednote-hilab/dots.ocr • Image-Text-to-Text • 3B • Updated 1 day ago • 301k downloads • 981 likes
- microsoft/kosmos-2.5-chat • Image-Text-to-Text • 1B • Updated 27 days ago • 1.68k downloads • 22 likes