microsoft/Phi-4-multimodal-instruct · Automatic Speech Recognition • Updated 4 days ago • 767k • 1.23k
Running • 543 • Vision Arena (Testing VLMs side-by-side): Analyze images to detect and label objects