Unable to run in ollama due to error
I ran `ollama pull hf.co/bartowski/nvidia_Llama-3_3-Nemotron-Super-49B-v1-GGUF:Q4_K_M` to download and install the model in ollama. When I then try to use the model, it crashes ollama with the following error:
```
panic: interface conversion: interface {} is *ggml.array, not uint32

goroutine 27 [running]:
github.com/ollama/ollama/fs/ggml.keyValue[...](0xc00010a570, {0x7ff67b6bf1a3, 0x14}, {0xc000624548, 0x1, 0x7ff67a55e960})
	C:/a/ollama/ollama/fs/ggml/ggml.go:146 +0x2de
github.com/ollama/ollama/fs/ggml.KV.Uint(...)
	C:/a/ollama/ollama/fs/ggml/ggml.go:96
github.com/ollama/ollama/fs/ggml.KV.HeadCount(...)
	C:/a/ollama/ollama/fs/ggml/ggml.go:56
github.com/ollama/ollama/fs/ggml.GGML.GraphSize({{0x7ff67b874828?, 0xc000726000?}, {0x7ff67b8747d8?, 0xc00018d808?}}, 0x20000, 0x200, {0x0, 0x0})
	C:/a/ollama/ollama/fs/ggml/ggml.go:418 +0x137
github.com/ollama/ollama/llm.EstimateGPULayers({_, _, _}, _, {_, _, _}, {{0x20000, 0x200, 0xffffffffffffffff, ...}, ...})
	C:/a/ollama/ollama/llm/memory.go:140 +0x659
github.com/ollama/ollama/llm.PredictServerFit({0xc00004bba8?, 0x7ff67a540f2e?, 0xc00004b8c0?}, 0xc000350060, {0xc00004b908?, _, _}, {0x0, 0x0, 0x0}, ...)
	C:/a/ollama/ollama/llm/memory.go:23 +0xbd
github.com/ollama/ollama/server.pickBestFullFitByLibrary(0xc000570000, 0xc000350060, {0xc000160600?, 0x2?, 0x2?}, 0xc00004bcf8)
	C:/a/ollama/ollama/server/sched.go:714 +0x6f3
github.com/ollama/ollama/server.(*Scheduler).processPending(0xc00009a8a0, {0x7ff67b878800, 0xc000726ff0})
	C:/a/ollama/ollama/server/sched.go:226 +0xe6b
github.com/ollama/ollama/server.(*Scheduler).Run.func1()
	C:/a/ollama/ollama/server/sched.go:108 +0x1f
created by github.com/ollama/ollama/server.(*Scheduler).Run in goroutine 1
	C:/a/ollama/ollama/server/sched.go:107 +0xb1
```
There was an update to ollama, but it did not fix or change the error at all. I have also tried the other quants from DevQuasar; they have the same issue.
It also does not work in LM Studio 3.13.2 (latest).
Looking into it
The code that causes the crash is here: https://github.com/ollama/ollama/blob/main/fs/ggml/ggml.go#L55
It seems to be part of the Go code that determines how many layers should be offloaded to the GPU. The problem is that in llama.cpp, we support two possible types for `HeadCount`:
- A number, meaning all layers in the model have the same number of heads
- An array, meaning each layer in the model can have a different number of heads

Ollama only supports the first option for now, while llama.cpp supports both. I think we should open an issue on ollama.
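For illustration, here is a minimal Go sketch (not ollama's actual internals; the helper name and the sample values are made up). A bare type assertion like `v.(uint32)` panics when the metadata key holds an array, which is exactly the interface-conversion panic in the trace above, while a type switch can accept both the scalar and the per-layer form:

```go
package main

import "fmt"

// headCounts is a hypothetical helper showing how a GGUF metadata value for
// "<arch>.attention.head_count" could be read. llama.cpp accepts either a
// single integer (same head count for every layer) or an array (one entry per
// layer); a bare assertion such as v.(uint32) panics on the array form with an
// "interface conversion" error like the one in the trace above.
func headCounts(v any) ([]uint32, error) {
	switch x := v.(type) {
	case uint32:
		// Scalar form: every layer has the same number of heads.
		return []uint32{x}, nil
	case []uint32:
		// Array form: per-layer head counts (values used below are made up).
		return x, nil
	default:
		return nil, fmt.Errorf("unexpected head_count type %T", v)
	}
}

func main() {
	scalar, _ := headCounts(uint32(64))
	perLayer, _ := headCounts([]uint32{64, 64, 32, 8})
	fmt.Println(scalar, perLayer) // [64] [64 64 32 8]
}
```

This only shows the shape of the problem; a real fix in ollama would also need to carry the per-layer values through the memory estimation path (GraphSize / EstimateGPULayers in the trace) instead of a single number.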
Upstream issue: https://github.com/ollama/ollama/issues/9984