Waiting for the Qwen3-VL

#8
by Maverick17 - opened

👀👀👀

https://qwen3.org/vl/

That's not the official site

What!?

I just wish Qwen 3 were natively multimodal in this day and age, the way competitors like Gemma 3 and Llama 4 are. There's no need for separate models anymore: just pretrain on multiple modalities and make one model.

I wonder if the model is actually pretrained for multiple modalities, but the adapters/encoders are not released...

unless the Qwen team can magically encode images using tokens, that's just not possible

> I just wish Qwen 3 were natively multimodal in this day and age, the way competitors like Gemma 3 and Llama 4 are. There's no need for separate models anymore: just pretrain on multiple modalities and make one model.

I think the same. What is the purpose of a non-multimodal model anyway? Coding and chatbots (for customer service, etc.) are the only specialized domains where I can think of a use case for a text-only LLM. But other than that, why waste time and money?
