image support
#9
by
kuliev-vitaly
- opened
According to the blog post on GitHub, Qwen 3 supports text, image, video, and audio as input. According to the model card, it supports only text as input. Does it support images as input? If so, how do I start the model with an image adapter?
I came to ask the same question. It looks like the released model does text generation only, while the online version supports multimodal input.
Is this model (the open-weights one) trained to handle those inputs? If so, could we use an adapter or an additional encoder with it?