• Quantization of Qwen2.5 14B for edge devices, with a 7.3 GB footprint

  • One of the best models I have tried for Spanish.

  • Original model: https://huggingface.co/djuna/Q2.5-Veltha-14B-0.5

  • Models Merged:

    • huihui-ai/Qwen2.5-14B-Instruct-abliterated-v2
    • allura-org/TQ2.5-14B-Aletheia-v1
    • EVA-UNIT-01/EVA-Qwen2.5-14B-v0.2
    • v000000/Qwen2.5-Lumen-14B
  • All quants were made using the imatrix option, with the dataset from here (a sketch of the commands follows the build log below).

  • Using llama.cpp compiled with CUDA support for quantization and inference:

```
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
  Device 1: NVIDIA GeForce RTX 3060, compute capability 8.6, VMM: yes
version: 3982 (cc2983d3)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
```
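For reference, this is roughly what a CUDA-enabled build and a test run look like at a comparable llama.cpp version. The `GGML_CUDA` flag is the standard CUDA switch for builds of this era; the model file name and the Spanish prompt below are placeholders, not the exact files from this repo:

```bash
# Build llama.cpp with CUDA support; GGML_CUDA enables the CUDA backend
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# Run inference fully offloaded to the GPUs (-ngl 99); with two CUDA
# devices present, llama.cpp splits the layers across them by default.
./build/bin/llama-cli -m ./model-quant.gguf -ngl 99 \
  -p "Escribe un resumen breve sobre la energía solar."
```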

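And a minimal sketch of the imatrix quantization step itself, assuming placeholder file names for the f16 conversion and the calibration text (the actual dataset is the one linked above), with Q4_K_M used only as an example target type:

```bash
# Compute an importance matrix over the calibration text
./build/bin/llama-imatrix -m ./model-f16.gguf -f ./calibration.txt \
  -o ./imatrix.dat -ngl 99

# Quantize, letting the importance matrix guide which weights
# keep more precision
./build/bin/llama-quantize --imatrix ./imatrix.dat \
  ./model-f16.gguf ./model-Q4_K_M.gguf Q4_K_M
```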
  • Format: GGUF — 14.8B params, qwen2 architecture