Llama 3 8B Sahabat-AI Instruct (GGUF Versions)
This repository contains GGUF converted and quantized versions of the Sahabat-AI/llama3-8b-cpt-sahabatai-v1-instruct model, converted using llama.cpp
.
This model is an instruction-tuned variant, suitable for chat and following commands.
Available GGUF Files:
1. llama3-8b-cpt-sahabatai-v1-instruct-f16.gguf
- Format: FP16 (Full Precision)
- Size: ~16.1 GB
- Description: This is the full-precision GGUF conversion. It offers the highest fidelity but requires significant VRAM (approx. 16 GB).
2. llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf
- Format: Q4_K_M (4-bit Quantized)
- Size: ~4.58 GB (approximate, actual size may vary slightly)
- Description: This is a highly optimized 4-bit quantized version, suitable for devices with limited VRAM (e.g., 8GB GPU VRAM). It offers a good balance between model size, performance, and minimal quality loss.
Original Model:
How to Use:
Download the desired .gguf
file and use it with llama.cpp
, LM Studio, Ollama, or any other GGUF-compatible inference tool.
For llama.cpp
CLI, you might use:
./main -m llama3-8b-cpt-sahabatai-v1-instruct-q4km.gguf -p "Write a story about a dragon." -n 128
- Downloads last month
- 43
Hardware compatibility
Log In
to view the estimation
16-bit
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support