Add image visual recognition just like Qwen2.5-VL-32B-Instruct
Hi there,
Would it be possible for the devs to add visual reasoning over images, like its predecessors QVQ-Max and Qwen2.5-VL-32B-Instruct? Many of this model's top competitors, such as GPT-4.1 and Gemini 2.5 Pro, already support both image understanding and chain-of-thought (CoT) reasoning.
There was some time gap between the releases of Qwen2.5 and Qwen2.5-VL. I think they could make one for the Qwen3 family too.
Then we hope that, instead of just releasing a separate model for that, the devs will merge the image-understanding feature into the Qwen3 family of models itself.
You may need to read the Qwen-VL technical report.
Please post a link to the Qwen-VL technical report you're referring to here.
Qwen3 might already be natively multimodal: it accepts images on their website, and the tokenizer also has image tokens.
Perhaps the vision encoder is just not ready yet.