Tried to install for ollama but got errors
#1 opened by shilik
Error: llama runner process has terminated: GGML_ASSERT(type >= 0 && type < GGML_TYPE_COUNT) failed
What should be done to solve this problem?
@shilik the iq3_k quant type in this model is specific to ik_llama.cpp; mainline llama.cpp (and therefore ollama) does not recognize it, which is what the GGML_ASSERT on the tensor type is complaining about. Here is how to build and run it with ik_llama.cpp:
# Install build dependencies and cuda toolkit as needed
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
# Configure CUDA+CPU Backend
cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF
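If no CUDA-capable GPU is available, a CPU-only configure should work as well (same layout, CUDA switched off):

```shell
# Alternative: CPU-only configure, for machines without the CUDA toolkit
cmake -B ./build -DGGML_CUDA=OFF -DGGML_BLAS=OFF
```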
# Build
cmake --build ./build --config Release -j $(nproc)
# Confirm
./build/bin/llama-server --version
version: 3640 (93cd77b6)
built with cc (GCC) 14.2.1 20250128 for x86_64-pc-linux-gnu
# Download 13.7GB model
wget https://huggingface.co/ubergarm/gemma-3-27b-it-qat-GGUF/resolve/main/gemma-3-27b-it-qat-mix-iq3_k.gguf
# Run API Server on http://localhost:8080
# (point --model at wherever the .gguf was saved; /mnt/models/ in this example)
./build/bin/llama-server \
--model /mnt/models/gemma-3-27b-it-qat-mix-iq3_k.gguf \
-ctk q4_0 -ctv q4_0 \
-fa \
-amb 512 \
-fmoe \
-c 8192 \
-ub 512 \
-ngl 99 \
--parallel 1 \
--threads 4 \
--host 127.0.0.1 \
--port 8080
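Once the server is up, it should expose llama-server's usual OpenAI-compatible HTTP API; a quick smoke test from another terminal (host, port, and endpoint path follow the flags above — adjust if you changed them):

```shell
# Minimal request body for the OpenAI-compatible chat completions endpoint
PAYLOAD='{"messages":[{"role":"user","content":"Say hello in one word."}],"max_tokens":16}'

# POST it to the server started above; print a note instead of erroring if the server is not up yet
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "server not reachable"
```

A JSON response with a `choices` array indicates the model loaded and is serving correctly.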