Llama 3 8B Sahabat-AI Instruct (GGUF Versions)

This repository contains GGUF conversions and quantized versions of the Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct model, produced with llama.cpp.
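For reference, the F16 file here is the kind of output produced by llama.cpp's convert_hf_to_gguf.py script. A rough sketch, assuming the original Hugging Face checkpoint has been downloaded into a local directory (the path below is illustrative):

# Convert the original HF checkpoint to an F16 GGUF (illustrative local path)
python convert_hf_to_gguf.py ./llama3-8b-cpt-sahabatai-v1-instruct \
  --outfile llama3-8b-cpt-sahabatai-v1-instruct-f16.gguf --outtype f16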

This model is an instruction-tuned variant, suitable for chat and instruction-following use.

Available GGUF Files:

1. llama3-8b-cpt-sahabatai-v1-instruct-f16.gguf

  • Format: F16 (16-bit floating point, unquantized)
  • Size: ~16.1 GB
  • Description: This is the unquantized GGUF conversion. It offers the highest fidelity but requires significant VRAM (approx. 16 GB).

2. llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf

  • Format: Q4_K_M (4-bit quantized)
  • Size: ~4.58 GB (actual size may vary slightly)
  • Description: This is a 4-bit quantized version, suitable for devices with limited VRAM (e.g., an 8 GB GPU). It offers a good balance of model size and inference speed with minimal quality loss; a sketch of how such a file is produced follows this list.
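As referenced above, here is a rough sketch of how the Q4_K_M file can be derived from the F16 conversion using llama.cpp's quantize tool (named llama-quantize in recent builds, quantize in older ones), run from the llama.cpp build directory:

# Quantize the F16 GGUF down to 4-bit Q4_K_M
./llama-quantize llama3-8b-cpt-sahabatai-v1-instruct-f16.gguf \
  llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf Q4_K_M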

Original Model:

Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct, available on Hugging Face.

How to Use:

Download the desired .gguf file and use it with llama.cpp, LM Studio, Ollama, or any other GGUF-compatible inference tool.
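For example, to load the quantized file into Ollama, a minimal Modelfile along these lines should work (the model name sahabatai-instruct is an illustrative choice, and the GGUF file is assumed to sit in the current directory):

# Modelfile — point Ollama at the local GGUF download
FROM ./llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf

Then register and run it:

ollama create sahabatai-instruct -f Modelfile
ollama run sahabatai-instruct "Write a story about a dragon."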

For the llama.cpp CLI, you might use:

./main -m llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf -p "Write a story about a dragon." -n 128

(In recent llama.cpp builds the CLI binary is named llama-cli rather than main.)
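Since this is an instruct model, chat-style usage through llama.cpp's OpenAI-compatible HTTP server is another option. A minimal sketch (the port and context size below are arbitrary choices):

# Start the llama.cpp server with the quantized model
./llama-server -m llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf -c 4096 --port 8080

# Query the OpenAI-compatible chat endpoint (the chat template is taken from the GGUF metadata)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Write a story about a dragon."}]}'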
Model Details:

  • Format: GGUF
  • Model size: 8.03B params
  • Architecture: llama