Llama-3.2-3B-Instruct-f32-GGUF

Llama 3.2 3B Instruct by Meta is a lightweight, instruction-tuned large language model with 3.21 billion parameters, designed for efficient text-only, multilingual dialogue applications such as agentic retrieval, summarization, and instruction following. It uses an optimized transformer architecture and is aligned for helpfulness and safety with supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF). It supports a context length of up to 128,000 tokens and officially covers English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

The model is tailored for on-device and edge deployment, enabling fast, private, low-latency inference while maintaining strong benchmark performance, which makes it suitable for mobile and resource-constrained environments. Llama 3.2 3B Instruct excels at text generation, rewriting, and dialogue, and is well suited to building AI assistants with strong multilingual and instruction-following capabilities.

Model Files

| Model File | Size | Quant Type |
| --- | --- | --- |
| Llama-3.2-3B-Instruct.BF16.gguf | 6.43 GB | BF16 |
| Llama-3.2-3B-Instruct.F16.gguf | 6.43 GB | F16 |
| Llama-3.2-3B-Instruct.F32.gguf | 12.9 GB | F32 |
| Llama-3.2-3B-Instruct.Q2_K.gguf | 1.36 GB | Q2_K |
| Llama-3.2-3B-Instruct.Q3_K_L.gguf | 1.82 GB | Q3_K_L |
| Llama-3.2-3B-Instruct.Q3_K_M.gguf | 1.69 GB | Q3_K_M |
| Llama-3.2-3B-Instruct.Q3_K_S.gguf | 1.54 GB | Q3_K_S |
| Llama-3.2-3B-Instruct.Q4_K_M.gguf | 2.02 GB | Q4_K_M |
| Llama-3.2-3B-Instruct.Q4_K_S.gguf | 1.93 GB | Q4_K_S |
| Llama-3.2-3B-Instruct.Q5_K_M.gguf | 2.32 GB | Q5_K_M |
| Llama-3.2-3B-Instruct.Q5_K_S.gguf | 2.27 GB | Q5_K_S |
| Llama-3.2-3B-Instruct.Q6_K.gguf | 2.64 GB | Q6_K |
| Llama-3.2-3B-Instruct.Q8_0.gguf | 3.42 GB | Q8_0 |
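The sizes above translate directly into an approximate storage cost per parameter. A small sketch of that arithmetic (the GB figures in the table are rounded, and GGUF files keep some tensors at higher precision, so the results are ballpark estimates, not exact quant bit-widths):

```python
# Rough bits-per-weight estimate for each quant, derived from the file
# sizes in the table above. Sizes are approximate, so treat results as
# ballpark figures rather than the nominal quant bit-width.

PARAMS_B = 3.21  # model parameters, in billions

def bits_per_weight(size_gb: float, params_b: float = PARAMS_B) -> float:
    """Approximate storage cost per parameter: file bits / parameter count."""
    return size_gb * 8 / params_b  # GB -> gigabits, divided by gigaparams

# A few entries from the table:
for name, size_gb in [("F16", 6.43), ("Q8_0", 3.42), ("Q4_K_M", 2.02), ("Q2_K", 1.36)]:
    print(f"{name}: ~{bits_per_weight(size_gb):.1f} bits/weight")
```

The estimates come out slightly above each quant's nominal bit-width (e.g. Q4_K_M lands near 5 bits/weight) because embedding and output tensors are typically stored at higher precision than the bulk of the weights.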

Quants Usage

(sorted by size, not necessarily by quality; IQ quants are often preferable to similarly sized non-IQ quants)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

[Graph: quality comparison of lower-bit quant types by ikawrakow; lower is better]
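To try one of these quants locally, the file can be fetched with `huggingface-cli` and run with llama.cpp. A minimal sketch, assuming llama.cpp is built and `llama-cli` is on your PATH; Q4_K_M is used here as a common size/quality middle ground:

```shell
# Download a single quant from this repo (huggingface-cli ships with the
# huggingface_hub Python package).
huggingface-cli download prithivMLmods/Llama-3.2-3B-Instruct-f32-GGUF \
  Llama-3.2-3B-Instruct.Q4_K_M.gguf --local-dir .

# Interactive chat with llama.cpp: -cnv enables conversation mode (the
# chat template is read from the GGUF metadata), -c sets the context
# window; the model supports up to 128k tokens if memory allows.
llama-cli -m Llama-3.2-3B-Instruct.Q4_K_M.gguf -cnv -c 8192
```

Larger context windows and higher-bit quants trade memory for quality; on constrained devices, Q4_K_M or Q5_K_M with a reduced `-c` is a common starting point.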

Format: GGUF
Model size: 3.21B params
Architecture: llama

