DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception Paper • 2410.12628 • Published Oct 16 • 26
Granite 3.0 Language Models Collection A series of language models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 13 days ago • 87
An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion Paper • 2408.03178 • Published Aug 6 • 36
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain Paper • 2407.19584 • Published Jul 28 • 62
Llama 3.1 Collection This collection hosts the transformers and original repos of the Llama 3.1, Llama Guard 3 and Prompt Guard models • 11 items • Updated Sep 25 • 618
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18 • 34
GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization Paper • 2407.15002 • Published Jul 20 • 4
Cinemo: Consistent and Controllable Image Animation with Motion Diffusion Models Paper • 2407.15642 • Published Jul 22 • 10
MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation Paper • 2407.15060 • Published Jul 21 • 9
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning Paper • 2407.15762 • Published Jul 22 • 9
Artist: Aesthetically Controllable Text-Driven Stylization without Training Paper • 2407.15842 • Published Jul 22 • 13
HoloDreamer: Holistic 3D Panoramic World Generation from Text Descriptions Paper • 2407.15187 • Published Jul 21 • 10
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models Paper • 2407.15841 • Published Jul 22 • 39
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion Paper • 2407.01392 • Published Jul 1 • 39