PHI4

#9
by mans0987 - opened

How does this model compare with PHI4 multi-modal?

Microsoft org

Interesting question. The two models were developed with different objectives, so it is hard to make an apples-to-apples comparison.

Can you please elaborate? When should I use Phi4, and when should I use this model? Assume I am only interested in text and vision, not audio (which Phi4 supports but this model does not).

Microsoft org

If you are only interested in text and vision: Magma is good at spatial understanding and reasoning over multimodal inputs, while Phi4 is better at reading text in images. This is based only on a rough look on my side, not a systematic comparison.

I am looking to detect whether an image of a house shows the entrance to that house. What is your suggestion?

Microsoft org

I think you can try both for this task.
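
One way to "try both" is to score each model on a small set of images you have labeled yourself. The sketch below is only an illustration and is model-agnostic: `Model`, `PROMPT`, `parse_yes_no`, `accuracy`, and `compare` are hypothetical names, not part of Magma or Phi4, and each model is assumed to be wrapped in a function that takes an image path and a prompt and returns the raw answer text.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: each model is wrapped in a callable (image_path, prompt) -> answer text.
# How you load and call Magma or Phi4 inside that wrapper is up to you.
Model = Callable[[str, str], str]

# Hypothetical prompt; adjust the wording to whatever works best for each model.
PROMPT = "Does this image show the entrance to the house? Answer yes or no."


def parse_yes_no(answer: str) -> bool:
    """Return True if the first word of the answer is 'yes' (case-insensitive)."""
    words = answer.strip().lower().split()
    return bool(words) and words[0].strip(".,!:;") == "yes"


def accuracy(model: Model, labeled: List[Tuple[str, bool]]) -> float:
    """Fraction of labeled images for which the model's yes/no answer matches the label."""
    if not labeled:
        raise ValueError("labeled set must not be empty")
    correct = sum(
        parse_yes_no(model(path, PROMPT)) == label for path, label in labeled
    )
    return correct / len(labeled)


def compare(models: Dict[str, Model], labeled: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Score every model on the same labeled set so the numbers are comparable."""
    return {name: accuracy(model, labeled) for name, model in models.items()}
```

Calling `compare({"magma": magma_fn, "phi4": phi4_fn}, labeled)` with your own wrappers and a list of `(image_path, shows_entrance)` pairs gives one accuracy number per model. A few dozen labeled images is usually enough to see whether one of them is clearly better for this task.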
