Language models that take vision and/or audio input, hand-picked by the Nexa Team.
- NexaAI/gemma-3n-E4B-it-4bit-MLX
  Image-Text-to-Text • Updated • 190 • 1
- NexaAI/Qwen2.5-VL-7B-Instruct-4bit-MLX
  Image-Text-to-Text • 2B • Updated • 121
- NexaAI/SmolVLM-500M-Instruct-8bit-MLX
  Image-Text-to-Text • 0.7B • Updated • 52
- NexaAI/SmolVLM-Instruct-8bit-MLX
  Image-Text-to-Text • 0.7B • Updated • 71