Multimodal model?
This model is tagged as Image-Text-to-Text, but it doesn't seem to have multimodal capabilities, at least not in Ollama. Am I doing something wrong, or is multimodality simply not working right now? Do you plan to add multimodal capabilities to this model?
The description specifically mentions that it was the text portion, not the image portion, that was abliterated. Like you, I assumed that meant the safety de-layering was not applied to the visual component, but after some testing I can only assume the vision part may have been pruned from the model. All of huihui's abliterated image-to-text models seem to carry this note in their descriptions. I'm curious whether that comes down to the process itself, time being saved, or a limitation of the abliteration method they use. It could also be deliberate, since I've noticed they steer clear of image generation models entirely, or it may simply be more trouble than it's worth. Big AI is already delaying releases over safety concerns; if abliterated image and video generation models were everywhere, they would lose revenue fast.
The Ollama version was created using the ollama create command and should not be able to process images. It seems that https://ollama.com/library/gemma3n also does not support images?
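For reference, here is a minimal sketch of how one might check whether a local Ollama build of this model accepts image input at all, using Ollama's REST API (the images field on /api/generate takes base64-encoded images). The model tag "gemma3n" and the image path are placeholders for whatever you have pulled locally.

```python
# Quick check: does the local Ollama model actually use image input?
# Assumes the Ollama server is running at the default http://localhost:11434
# and that "gemma3n" (or the abliterated tag you pulled) is available locally.
import base64
import requests

MODEL = "gemma3n"        # placeholder: replace with your local model tag
IMAGE_PATH = "test.jpg"  # placeholder: any small test image

with open(IMAGE_PATH, "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": MODEL,
        "prompt": "Describe this image in one sentence.",
        "images": [image_b64],  # silently ignored if the build has no vision component
        "stream": False,
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])
```

If the reply never references the actual image content, the vision component most likely did not make it into the Ollama build.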
Thanks for the information.