image support
According to the blog post on GitHub, Qwen3 supports text, image, video, and audio as input. According to the model card, it supports only text as input. Does it support image input? How do I start the model with an image adapter?
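One way to check locally, as a minimal sketch: look at the checkpoint's `config.json` for vision-related fields. The key names below are typical of Hugging Face VLM configs, but the exact set used here is an assumption, and the sample configs are illustrative, not the real Qwen3 files.

```python
def is_multimodal(config: dict) -> bool:
    """Heuristic: a config that declares a vision tower or multimodal
    projector is multimodal; a plain causal-LM config is text-only.
    Key names are common HF conventions, assumed here for illustration."""
    vision_keys = {"vision_config", "mm_projector", "image_token_index"}
    return any(k in config for k in vision_keys)

# A text-only causal-LM config (illustrative, not the actual Qwen3 config.json):
text_only = {"architectures": ["Qwen3ForCausalLM"], "hidden_size": 4096}
# A multimodal config would carry a vision section (also illustrative):
vlm = {"architectures": ["SomeVLForConditionalGeneration"], "vision_config": {}}

print(is_multimodal(text_only))  # False
print(is_multimodal(vlm))        # True
```

If the released config shows only a causal-LM architecture and no vision section, that would match the model card's text-only claim.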
I came to ask the same question. Looks like the released model is text generation only, while the online version supports multimodality.
Is this model—the open-weights one—trained for handling those inputs? If so, could we use an adapter or additional encoder with it?
Yes, wondering if they'll share the multimodal projector files later?
See as an example:
So there's no way to upload images to this model locally? What kind of nonsense is this? Open sourced? Even QwQ doesn't support images?
I guess Qwen Chat has a proprietary image encoder? It would be great if they shared that part.
The Qwen3 models in Qwen Chat solve my use case (which involves images) very well.
Hope they share this soon.
Better to use Devstral from Unsloth; it has vision, but it's aimed at coding. First vision + coding model; set temp to 0.05.
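For context on why such a low temperature matters: sampling divides the logits by the temperature before softmax, so temp 0.05 makes the distribution nearly one-hot (close to greedy decoding). A minimal sketch of that scaling, with made-up logits:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: logits / T, then normalize.
    Low T sharpens the distribution toward the argmax token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative logits for three candidate tokens:
probs = softmax([2.0, 1.0, 0.5], temperature=0.05)
print(probs[0])  # nearly 1.0: the top token dominates at temp 0.05
```

That's why low temperatures are often recommended for coding models, where deterministic output is usually preferable.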