• Quantization of Qwen2.5 14B for edge devices, with a 7.3 GB footprint

  • One of the best models I have tried for Spanish.

  • Original model: https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5

  • Models Merged:

    • huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
    • allura-org/TQ2.5-14B-Aletheia-v1
    • EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
    • v000000/Qwen2.5-Lumen-14B
  • All quants were made using the imatrix option, with the dataset from here (a sketch of the commands follows the build log below).

  • Using llama.cpp compiled with CUDA support for quantization and inference:

```
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 3982 (cc2983d3)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
```
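For reference, this is roughly what a CUDA-enabled build and a test run look like at a comparable llama.cpp version. The `GGML_CUDA` flag is the standard CUDA switch for builds of this era; the model file name and the Spanish prompt below are placeholders, not the exact files from this repo:

```bash
# Build llama.cpp with CUDA support; GGML_CUDA enables the CUDA backend
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Run inference fully offloaded to the GPUs (-ngl 99); with two CUDA
# devices present, llama.cpp splits the layers across them by default.
./build/bin/llama-cli -m ./model-quant.gguf -ngl 99 \
  -p "Escribe un resumen breve sobre la energía solar."
```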

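And a minimal sketch of the imatrix quantization step itself, assuming placeholder file names for the f16 conversion and the calibration text (the actual dataset is the one linked above), with Q4_K_M used only as an example target type:

```bash
# Compute an importance matrix over the calibration text
./build/bin/llama-imatrix -m ./model-f16.gguf -f ./calibration.txt \
  -o ./imatrix.dat -ngl 99

# Quantize, letting the importance matrix guide which weights
# keep more precision
./build/bin/llama-quantize --imatrix ./imatrix.dat \
  ./model-f16.gguf ./model-Q4_K_M.gguf Q4_K_M
```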
  • Format: GGUF — 14.8B params, qwen2 architecture