2025-01-09 does not run on the CPU with the example config

#50
by KeilahElla - opened

If I try to run the example code from the README on the CPU, I get the following error:

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

It looks like the model tries to run fp16 on the CPU, which PyTorch still does not support. Overriding the dtype in the following way did not solve the issue:

model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, revision=revision, torch_dtype=torch.float32
)

Any suggestions?

Update: there were a lot of float16 tensors in the inference code, which made it impossible to run the latest version of moondream2 on the CPU.

I have replaced them with float32 types and it now runs great on the CPU. It's really fast even on my low-spec laptop.
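
For anyone hitting the same error, here is a minimal sketch of the kind of change described above. It is not the actual moondream2 code: the helper name `inference_dtype` and the toy `Linear` layer are made up for illustration. The idea is to pick the dtype from the device instead of writing `torch.float16` directly in the inference code.

```python
import torch

def inference_dtype(device) -> torch.dtype:
    # Hypothetical helper: use float16 only on CUDA, float32 everywhere else.
    return torch.float16 if torch.device(device).type == "cuda" else torch.float32

device = "cpu"
dtype = inference_dtype(device)

# Toy stand-in for a model layer; the real model is loaded via from_pretrained.
layer = torch.nn.Linear(4, 2).to(device=device, dtype=dtype)

# Wherever the inference code creates tensors with dtype=torch.float16,
# pass the selected dtype instead so inputs and weights match on the CPU.
x = torch.ones(1, 4, device=device, dtype=dtype)
y = layer(x)
print(y.dtype)
```

With this pattern the same code path should still use float16 on a GPU, while CPU runs stay in float32.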

If @vikhyatk is interested, I can send my code.
