llava-hf/llava-onevision-qwen2-0.5b-ov-hf

Just dipping my toes into this insanity and I am having issues.

I tried running this example code to test the speed:

from transformers import pipeline

pipe = pipeline("image-text-to-text", model="llava-hf/llava-onevision-qwen2-0.5b-ov-hf")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"},
            {"type": "text", "text": "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"},
        ],
    },
]

out = pipe(text=messages, max_new_tokens=20)
print(out)

For whatever reason, pipeline() is not accepting "image-text-to-text" as a task. See the following error:

KeyError: "Unknown task image-text-to-text, available tasks are ['audio-classification', 'automatic-speech-recognition', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-feature-extraction', 'image-segmentation', 'image-to-image', 'image-to-text', 'mask-generation', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text-to-audio', 'text-to-speech', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']"

I am running this model on my machine under WSL, inside a Python virtual environment so that I have clean dependencies. I am using Python 3.8 and transformers 4.46. I believe I have installed all the other dependencies for this model, but I am no longer sure.
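
One way to narrow this down is to print what the virtual environment actually has installed and whether the task is registered at all. The error suggests that the installed release simply does not know the task. If the task was added in a release newer than 4.46, and that release requires a newer Python than 3.8 (both are assumptions worth checking against the transformers release notes), pip would keep resolving to 4.46 no matter how often it is reinstalled. Below is a minimal diagnostic sketch. The helper names installed_version, is_task_supported and registered_tasks are made up for illustration; get_supported_tasks is imported from transformers.pipelines and the import is guarded in case it is unavailable.

```python
import sys
from importlib import metadata  # part of the standard library since Python 3.8


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_task_supported(task, supported_tasks):
    """Return True if `task` appears among the registered pipeline tasks."""
    return task in set(supported_tasks)


def registered_tasks():
    """Return the tasks known to the installed transformers, or [] if unavailable."""
    try:
        from transformers.pipelines import get_supported_tasks
    except Exception:
        # transformers is missing or failed to import in this environment
        return []
    return list(get_supported_tasks())


if __name__ == "__main__":
    task = "image-text-to-text"
    print("Python:", sys.version.split()[0])
    print("transformers:", installed_version("transformers"))
    print(task, "registered:", is_task_supported(task, registered_tasks()))
```

If the last line prints False, the problem is the installed release rather than the example code, and the next thing to look at is which transformers versions pip can install for the interpreter in use.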

I feel like this is a stupid mistake that I just can't see. Any help would be greatly appreciated.

Thanks!

What am I doing wrong?