merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct - GGUF Quantized Model

This is a collection of GGUF quantized versions of pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.

🌳 Model Tree

This model was created by merging the following models:

pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct
├── Merge Method: dare_ties
├── context-labs/meta-llama-Llama-3.2-3B-Instruct-FP16
└── unsloth/Llama-3.2-3B-Instruct
    ├── density: 0.6
    └── weight: 0.5

Merge Method: DARE_TIES, which combines DARE-style random dropping and rescaling of delta weights with TIES-style sign election to reduce interference between the merged models.

📊 Available Quantization Formats

This repository contains multiple quantization formats optimized for different use cases:

  • q4_k_m: 4-bit quantization, medium quality, good balance of size and performance
  • q5_k_m: 5-bit quantization, higher quality, slightly larger size
  • q8_0: 8-bit quantization, highest quality, larger size but minimal quality loss
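
If you prefer to fetch a single file programmatically rather than with wget, a minimal sketch using the huggingface_hub client is shown below (the repo and file names match this repository; swap the filename for the quantization you want):

from huggingface_hub import hf_hub_download

# Download only the q4_k_m file from this repository
model_path = hf_hub_download(
    repo_id="pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct",
    filename="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
)
print(model_path)  # local path to the downloaded GGUF file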

🚀 Usage

With llama.cpp

# Download a specific quantization
wget https://huggingface.co/pravdin/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct/resolve/main/merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf

# Run with llama.cpp (newer llama.cpp builds name this binary llama-cli instead of main)
./main -m merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf -p "Your prompt here"

With Python (llama-cpp-python)

from llama_cpp import Llama

# Load the model
llm = Llama(model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf")

# Generate text
output = llm("Your prompt here", max_tokens=512)
print(output['choices'][0]['text'])
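
Because this is an instruction-tuned Llama 3.2 model, chat-style generation usually works better than raw completion, since it applies the chat template stored in the GGUF metadata. A minimal sketch, assuming the same model path as above and an example context size:

from llama_cpp import Llama

llm = Llama(
    model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
    n_ctx=4096,  # context window; adjust to your memory budget
)

# Chat-style generation using the model's built-in chat template
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Your prompt here"},
    ],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])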

With Ollama

# Create a Modelfile
echo 'FROM ./merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf' > Modelfile

# Create and run the model
ollama create merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct -f Modelfile
ollama run merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct "Your prompt here"
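
Once the model has been created, you can also call it from Python through the ollama client library (an assumption here: the package is installed with pip install ollama and the Ollama server is running locally):

import ollama

# Query the locally created Ollama model
response = ollama.chat(
    model="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct",
    messages=[{"role": "user", "content": "Your prompt here"}],
)
print(response["message"]["content"])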

📋 Model Details

  • Architecture: llama
  • Parameters: 3.21B
  • Format: GGUF, available in q4_k_m, q5_k_m, and q8_0 quantizations

🎯 Recommended Usage

  • q4_k_m: Best for most use cases, good quality/size trade-off
  • q5_k_m: When you need higher quality and have more storage/memory
  • q8_0: When you want minimal quality loss from the original model
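
As a rough guide to which file fits your hardware, you can estimate the file size from the parameter count and the average bits per weight of each format. The bits-per-weight values below are approximations, not exact figures for this repository:

# Back-of-envelope size estimate: parameters x bits-per-weight / 8
params = 3.21e9  # parameter count of this model

# Approximate average bits per weight for each quantization format
bits_per_weight = {"q4_k_m": 4.8, "q5_k_m": 5.7, "q8_0": 8.5}

for fmt, bpw in bits_per_weight.items():
    size_gb = params * bpw / 8 / 1e9
    print(f"{fmt}: ~{size_gb:.1f} GB")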

⚡ Performance Notes

GGUF models are optimized for:

  • Faster loading times
  • Lower memory usage
  • CPU and GPU inference
  • Cross-platform compatibility

For best performance, make sure the quantized file fits comfortably in your available RAM (or VRAM when offloading layers to a GPU), and tune thread and offload settings to your hardware.
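
For example, with llama-cpp-python you can offload layers to a GPU and set the CPU thread count when loading the model (illustrative values; GPU offload requires a build of llama-cpp-python compiled with CUDA or Metal support):

from llama_cpp import Llama

llm = Llama(
    model_path="merged-context-labs-meta-llama-Llama-3.2-3B-Instruct-FP16-unsloth-Llama-3.2-3B-Instruct.q4_k_m.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU; use 0 for CPU-only inference
    n_threads=8,      # CPU threads for any layers that stay on the CPU
    n_ctx=4096,       # context window
)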


This model was automatically quantized using the Lemuru LLM toolkit.
