Merve Noyan

merve

AI & ML interests

VLMs, vision & co

Articles

Organizations

Posts 71

view post
Post
608
OmniVision-968M: a new local VLM for edge devices, fast & small but performant
💨 a new vision language model with 9x less image tokens, super efficient
📖 aligned with DPO for reducing hallucinations
⚡️ Apache 2.0 license 🔥

Demo hf.co/spaces/NexaAIDev/omnivlm-dpo-demo
Model NexaAIDev/omnivision-968M
view post
Post
1712
Amazing past days at open ML, it's raining coding models, let's have a recap 🌧️ Find all models and datasets here merve/nov-15-releases-67372d0ebdc354756a52ecd0

Models
💻 Coding: Qwen team released two Qwen2.5-Coder checkpoints of 32B and 7B. Infly released OpenCoder: 1.5B and 8B coding models with instruction SFT'd versions and their datasets! 💗

🖼️ Image/Video Gen: Alibaba vision lab released In-context LoRA -- 10 LoRA models on different themes based on Flux. Also Mochi the sota video generation model with A2.0 license now comes natively supported in diffusers 👏

🖼️ VLMs/Multimodal: NexaAIDev released Omnivision 968M a new vision language model aligned with DPO for reducing hallucinations, also comes with GGUF ckpts 👏 Microsoft released LLM2CLIP, a new CLIP-like model with longer context window allowing complex text inputs and better search

🎮 AGI?: Etched released Oasis 500M, a diffusion based open world model that takes keyboard input and outputs gameplay 🤯

Datasets
Common Corpus: A text dataset with 2T tokens with permissive license for EN/FR on various sources: code, science, finance, culture 📖