image support

#9
by kuliev-vitaly - opened

According to the blog on GitHub, Qwen 3 supports text, image, video, and audio as input. According to the model card, it supports only text as input. Does it support images as input? How do I start the model with an image adapter?

https://chat.qwen.ai/
This model works with images in Qwen Chat.

I came to ask the same question. It looks like the released model is text generation only, while the online version supports multimodality.
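One rough way to sanity-check this locally is to look at the checkpoint's `config.json`. This is a minimal sketch, assuming that multimodal checkpoints declare their encoders under keys such as `vision_config` or `audio_config` (a common convention, not a guaranteed schema). The helper name `declared_modalities` and the trimmed config contents below are hypothetical and only for illustration.

```python
import json

# Keys that multimodal checkpoints commonly use for encoder sub-configs.
# This list is an assumption for illustration, not an official schema.
MULTIMODAL_KEYS = ("vision_config", "audio_config")


def declared_modalities(config: dict) -> list[str]:
    """Return the multimodal sub-config keys present in a parsed config.json."""
    return [key for key in MULTIMODAL_KEYS if key in config]


# Hypothetical, heavily trimmed config contents for a text-only checkpoint.
text_only = json.loads(
    '{"model_type": "qwen3_moe", "architectures": ["Qwen3MoeForCausalLM"]}'
)
print(declared_modalities(text_only))  # → []
```

An empty list only suggests that the released weights ship without an image or audio encoder. It says nothing about whether a separate adapter exists elsewhere.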

Is this model (the open-weights one) trained to handle those inputs? If so, could we use an adapter or an additional encoder with it?

So there is no way to upload images to this model locally? What kind of nonsense is this? Open source? Even QwQ does not support images?

I guess Qwen Chat has a proprietary image encoder? It would be great if they shared this part.
The Qwen3 models in Qwen Chat solve my use case (which involves images) very well.

Hope they share this soon.

Better to use Devstral from Unsloth. It has vision, but it's for coding. First vision + coding model. Set temp to 0.05.
