Waiting for the Qwen3-VL
What!?
I just wish Qwen 3 would be natively multimodal in this day and age, just like competitors like Gemma 3 and Llama 4 are. There's no need for separate models anymore; just pretrain the models on multiple modalities and make one model.
I wonder if the model is actually pretrained for multiple modalities, but the adapters/encoders are not released...
Unless the Qwen team can magically encode images using text tokens, that's just not possible.
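For what it's worth, here's a minimal sketch of why a released text-only checkpoint can't just consume images: a vision-language model needs a separate vision encoder plus a projection adapter that maps image patch features into the LLM's embedding space, so the "image tokens" are really projected embeddings, not vocabulary tokens. This is not Qwen's actual code; all module names and dimensions below are hypothetical.

```python
import torch
import torch.nn as nn

class VisionAdapter(nn.Module):
    """Hypothetical adapter projecting vision-encoder features into the LLM hidden size."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim) from a ViT-style encoder
        return self.proj(patch_features)  # (batch, num_patches, llm_dim)

# Hypothetical usage: projected image embeddings are concatenated with the
# text token embeddings before being fed to the transformer layers.
batch, num_patches, vision_dim, llm_dim = 1, 256, 1024, 4096
patch_features = torch.randn(batch, num_patches, vision_dim)   # stand-in for vision encoder output
text_embeddings = torch.randn(batch, 32, llm_dim)              # stand-in for the LLM embedding table output

adapter = VisionAdapter(vision_dim, llm_dim)
image_embeddings = adapter(patch_features)
inputs_embeds = torch.cat([image_embeddings, text_embeddings], dim=1)
print(inputs_embeds.shape)  # torch.Size([1, 288, 4096])
```

So without the encoder and adapter weights being released, the base LLM has no way to accept image inputs, even if it was exposed to multimodal data during pretraining.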
I think the same. What is the purpose of a non-multimodal model anyway? Coding and chatbots, like for customer service, etc., are the only specialized domains where I would think of a use case for a text-only LLM... But other than that, why waste time and money?