Image input

by mvsoom - opened Mar 17

Mar 17

This looks very promising :) One question: does it support image input (single image btw)? Or did it catastrophically forget the visual modality due to finetuning on literature? Cheers

mvsoom

Mar 17

OK, I see from the Gemma-2-9b-it model card that it is text-to-text.
Would you happen to know of models that are also low on slop but still have image multimodality?
Images are good sources of entropy for writing imho.
Cheers

sam-paech

Owner Mar 18

Ah, good question. tbh I haven't tested a lot of open models with image modality. There will be a lot of Gemma 3 fine-tunes appearing soon, so that might be a good bet. The vanilla instruct Gemma 3 can be really good but needs some prompting away from its default safe/slop style.

mvsoom

Mar 18

Alright, thanks :)

mvsoom changed discussion status to closed Mar 18
