Tried to install with ollama but got errors

#1
by shilik - opened

Error: llama runner process has terminated: GGML_ASSERT(type >= 0 && type < GGML_TYPE_COUNT) failed

What should be done to solve this problem?

@shilik that assertion fires because this GGUF uses ik_llama.cpp-specific quantization types (the iq*_k mix), which the llama.cpp bundled in ollama does not recognize. Build and run it with ik_llama.cpp instead — here is how:

# Install build dependencies and the CUDA toolkit as needed
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp

# Configure CUDA+CPU Backend
cmake -B ./build -DGGML_CUDA=ON -DGGML_BLAS=OFF

# Build
cmake --build ./build --config Release -j $(nproc)

# Confirm
./build/bin/llama-server --version
version: 3640 (93cd77b6)
built with cc (GCC) 14.2.1 20250128 for x86_64-pc-linux-gnu

# Download the 13.7GB model into the directory used by --model below
wget -P /mnt/models https://huggingface.co/ubergarm/gemma-3-27b-it-qat-GGUF/resolve/main/gemma-3-27b-it-qat-mix-iq3_k.gguf

# Run API Server on http://localhost:8080
./build/bin/llama-server \
    --model /mnt/models/gemma-3-27b-it-qat-mix-iq3_k.gguf \
    -ctk q4_0 -ctv q4_0 \
    -fa \
    -amb 512 \
    -fmoe \
    -c 8192 \
    -ub 512 \
    -ngl 99 \
    --parallel 1 \
    --threads 4 \
    --host 127.0.0.1 \
    --port 8080
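
Once the server is up, you can smoke-test it with a chat request. This is just a sketch — the `/v1/chat/completions` path assumes llama-server's OpenAI-compatible API, and the prompt/`max_tokens` values here are arbitrary examples:

# Send a minimal OpenAI-style chat completion request to the local server
curl -s http://127.0.0.1:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Say hello."}], "max_tokens": 32}'

You should get back a JSON response with the model's reply in `choices[0].message.content`.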
