PHI4
#9, opened by mans0987
How does this model compare with PHI4 multi-modal?
Interesting question. The two models were developed with different objectives, so an apples-to-apples comparison is hard.
Can you please elaborate? When should I use Phi4 and when should I use this model, assuming I am only interested in text and vision and not audio (which Phi4 has but this model doesn't)?
If you are only interested in text and vision: Magma is good at spatial understanding and reasoning over multimodal inputs, while Phi4 is better at reading text in images, based on my rough glimpse.
I want to detect whether an image of a house shows the entrance to that house. What is your suggestion?
I think you can try both for this task.
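One way to "try both" is to run each model on a small hand-labeled set of house images and compare accuracy. Below is a minimal sketch of such a harness, not an official recipe. The prompt wording, the names `parse_yes_no` and `compare_models`, and the `ask(image_path, prompt)` callables are all hypothetical: you would wrap each model's own inference code (see its model card) in a function with that shape.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt; the exact wording is an assumption, not from either model card.
PROMPT = "Does this image show the entrance to the house? Answer yes or no."

# An "ask" function takes (image_path, prompt) and returns the model's text reply.
AskFn = Callable[[str, str], str]


def parse_yes_no(answer: str) -> bool:
    """Treat any reply that starts with 'yes' (case-insensitive) as a positive."""
    return answer.strip().lower().startswith("yes")


def compare_models(
    models: Dict[str, AskFn],
    labeled_images: List[Tuple[str, bool]],
) -> Dict[str, float]:
    """Return the accuracy of each model on (image_path, shows_entrance) pairs."""
    if not labeled_images:
        raise ValueError("labeled_images must not be empty")
    scores: Dict[str, float] = {}
    for name, ask in models.items():
        correct = sum(
            parse_yes_no(ask(path, PROMPT)) == label
            for path, label in labeled_images
        )
        scores[name] = correct / len(labeled_images)
    return scores
```

You would call it as `compare_models({"magma": ask_magma, "phi4": ask_phi4}, labeled_images)`, where `ask_magma` and `ask_phi4` are your own wrappers. Even a few dozen labeled images should show whether one model is clearly better for this task.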