Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Paper • 2505.03318 • Published 4 days ago • 81
Pleias-RAG Collection New generation of small reasoning models for RAG, search, and source summarization. • 4 items • Updated 15 days ago • 26
Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated 4 days ago • 49
WebDreamer Collection Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents • 6 items • Updated 25 days ago • 4
DeTikZify: Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ Paper • 2405.15306 • Published May 24, 2024 • 7
DeTikZify Collection Synthesizing Graphics Programs for Scientific Figures and Sketches with TikZ • 12 items • Updated Mar 19 • 26
AudioX: Diffusion Transformer for Anything-to-Audio Generation Paper • 2503.10522 • Published Mar 13 • 25
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14 • 104
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published Mar 3 • 87
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 5 items • Updated 9 days ago • 107
Article Train 400x faster Static Embedding Models with Sentence Transformers Jan 15 • 178