Edit model card

mrcuddle/Lumimaid-v0.2-12B-Q4_K_M-GGUF

This model was converted to GGUF format from NeverSleep/Lumimaid-v0.2-12B using llama.cpp via Convert Model to GGUF.

Key Features:

  • Quantized for reduced file size (GGUF format)
  • Optimized for use with llama.cpp
  • Compatible with llama-server for efficient serving

Refer to the original model card for more details on the base model.

Usage with llama.cpp

1. Install llama.cpp:

brew install llama.cpp  # For macOS/Linux

2. Run Inference:

CLI:

llama-cli --hf-repo mrcuddle/Lumimaid-v0.2-12B-Q4_K_M-GGUF --hf-file lumimaid-v0.2-12b-q4_k_m.gguf -p "Your prompt here"

Server:

llama-server --hf-repo mrcuddle/Lumimaid-v0.2-12B-Q4_K_M-GGUF --hf-file lumimaid-v0.2-12b-q4_k_m.gguf -c 2048

For more advanced usage, refer to the llama.cpp repository.

Downloads last month
29
GGUF
Model size
12.2B params
Architecture
llama

4-bit

Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.