Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

#3
by treehugg3 - opened

I get this error when I load the model using the latest transformers: Input type (torch.cuda.FloatTensor) and weight type (torch.cuda.HalfTensor) should be the same

I am just loading it using

from transformers import LlavaForConditionalGeneration

model = LlavaForConditionalGeneration.from_pretrained(vlm_path,
    use_safetensors=True)

I get this error with this quantized version but not with the full pixtral-12b repo. I did, however, copy the tokenizer_config.json from that repo into this one in order to use the tokenizer.
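
For reference, my understanding of the error is that the pixel values coming out of the processor stay in fp32 (FloatTensor) while this repo's weights load in fp16 (HalfTensor), so the usual way to line the types up is to load in fp16 explicitly and cast the processor output to the same dtype. This is only a rough sketch of what I mean (the path, prompt, and image are placeholders, not my actual script):

import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

vlm_path = "path/to/this/repo"  # placeholder

processor = AutoProcessor.from_pretrained(vlm_path)
model = LlavaForConditionalGeneration.from_pretrained(
    vlm_path,
    use_safetensors=True,
    torch_dtype=torch.float16,  # keep the checkpoint's fp16 weights instead of upcasting
).to("cuda")

prompt = "<s>[INST]Describe the image.\n[IMG][/INST]"  # placeholder prompt
image = Image.open("example.jpg")                      # placeholder image

inputs = processor(text=prompt, images=image, return_tensors="pt")
# Move everything to the GPU and cast the floating-point inputs (the pixel values)
# to fp16 so they match the HalfTensor weights instead of staying FloatTensors.
inputs = inputs.to("cuda", dtype=torch.float16)

output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])

Loading with torch_dtype=torch.float32 instead should also make the two types agree, just with higher VRAM usage.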

Edit: It fails even with the example in #1.
Edit 2: I got it to work on this ANCIENT transformers fork branch mentioned here: https://huggingface.co/DewEfresh/pixtral-12b-8bit/discussions/1#66f1a38916c5478fa68c05d6

Unfortunately, it is still broken on current transformers releases :(
