merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct - GGUF Quantized Model
This is a collection of GGUF quantized versions of pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.
Model Tree
This model was created by merging the following models:
pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct
├── Merge Method: dare_ties
├── context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16
├── unsloth/Llama-3.2-3B-Instruct
├── density: 0.6
└── weight: 0.5
Merge Method: DARE_TIES, a merging technique that randomly drops and rescales parameter deltas (DARE) and resolves sign conflicts between them (TIES) to reduce interference between the source models.
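For reference, a merge with these settings could be reproduced with mergekit. The sketch below is a hypothetical reconstruction rather than the exact recipe used; the base_model choice and dtype in particular are assumptions.

# Hypothetical mergekit configuration mirroring the parameters listed above.
# base_model and dtype are assumptions; adjust them to your own setup.
import yaml

merge_config = {
    "merge_method": "dare_ties",
    "base_model": "unsloth/Llama-3.2-3B-Instruct",  # assumption: one of the two sources as base
    "models": [
        {"model": "context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16",
         "parameters": {"density": 0.6, "weight": 0.5}},
        {"model": "unsloth/Llama-3.2-3B-Instruct",
         "parameters": {"density": 0.6, "weight": 0.5}},
    ],
    "dtype": "float16",
}

with open("merge_config.yaml", "w") as f:
    yaml.safe_dump(merge_config, f, sort_keys=False)
# Then run: mergekit-yaml merge_config.yaml ./merged-model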
Available Quantization Formats
This repository contains multiple quantization formats optimized for different use cases:
- q4_k_m: 4-bit quantization, medium quality, good balance of size and performance
- q5_k_m: 5-bit quantization, higher quality, slightly larger size
- q8_0: 8-bit quantization, highest quality, larger size but minimal quality loss
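Each format can be downloaded on its own. The sketch below uses the huggingface_hub client to fetch the q4_k_m file; swap the suffix in the filename for q5_k_m or q8_0.

from huggingface_hub import hf_hub_download

# Downloads the 4-bit file into the local Hugging Face cache and returns its path.
model_path = hf_hub_download(
    repo_id="pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct",
    filename="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
)
print(model_path)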
Usage
With llama.cpp
# Download a specific quantization
wget https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct/resolve/main/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf
# Run with llama.cpp
./main -m merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf -p "Your prompt here"
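# Note: newer llama.cpp builds ship this CLI as llama-cli instead of ./main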
With Python (llama-cpp-python)
from llama_cpp import Llama
# Load the model
llm = Llama(model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf")
# Generate text
output = llm("Your prompt here", max_tokens=512)
print(output['choices'][0]['text'])
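Because the merged model is instruction-tuned, the chat-completion API, which applies the model's chat template, usually behaves better than raw text completion. A minimal sketch; the n_ctx value is an illustrative assumption:

from llama_cpp import Llama

llm = Llama(
    model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
    n_ctx=4096,  # context window to allocate; adjust to your memory budget
)

# The chat API wraps the messages in the chat template embedded in the GGUF file.
output = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain GGUF quantization in two sentences."}],
    max_tokens=256,
)
print(output["choices"][0]["message"]["content"])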
With Ollama
# Create a Modelfile
echo 'FROM ./merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf' > Modelfile
# Create and run the model
ollama create merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct -f Modelfile
ollama run merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct "Your prompt here"
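To call the Ollama model from code instead of the CLI, the official ollama Python client can be used. A minimal sketch, assuming `pip install ollama` and that the model was created with the name shown above:

import ollama

# The model name must match the one passed to `ollama create`.
response = ollama.chat(
    model="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Your prompt here"}],
)
print(response["message"]["content"])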
Model Details
- Original Model: pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct
- Quantization Tool: llama.cpp
- License: Same as original model
- Use Cases: Optimized for local inference, edge deployment, and resource-constrained environments
Recommended Usage
- q4_k_m: Best for most use cases, good quality/size trade-off
- q5_k_m: When you need higher quality and have more storage/memory
- q8_0: When you want minimal quality loss from the original model
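As a rough way to gauge which format fits your hardware, file size can be estimated from the parameter count and the effective bits per weight. The figures in the sketch below (about 3.2B parameters and the bits-per-weight values) are approximations, not measured numbers:

# Back-of-the-envelope GGUF size estimate: parameters * effective bits per weight.
# Effective bits include quantization scales/overhead and are rough assumptions.
PARAMS = 3.2e9  # approximate parameter count of a Llama 3.2 3B model

effective_bits = {"q4_k_m": 4.8, "q5_k_m": 5.7, "q8_0": 8.5}

for fmt, bits in effective_bits.items():
    size_gb = PARAMS * bits / 8 / 1e9
    print(f"{fmt}: ~{size_gb:.1f} GB on disk, plus KV cache at runtime")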
Performance Notes
GGUF models are optimized for:
- Faster loading times
- Lower memory usage
- CPU and GPU inference
- Cross-platform compatibility
For best performance, ensure your hardware supports the quantization format you choose.
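For example, with a CUDA or Metal build of llama-cpp-python you can offload layers to the GPU. The n_gpu_layers, n_threads, and n_ctx values below are illustrative and depend on your hardware:

from llama_cpp import Llama

llm = Llama(
    model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU when a GPU build is installed
    n_threads=8,      # CPU threads for any layers left on the CPU
    n_ctx=4096,
)
print(llm("Your prompt here", max_tokens=64)["choices"][0]["text"])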
This model was automatically quantized using the Lemuru LLM toolkit.