SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion Paper • 2503.11576 • Published Mar 14 • 111 • 16
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis Paper • 2502.04128 • Published Feb 6 • 26 • 4