ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published about 21 hours ago • 16
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation Paper • 2506.09350 • Published 1 day ago • 23
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis Paper • 2506.06276 • Published 6 days ago • 18
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development Paper • 2506.05010 • Published 7 days ago • 62
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis Paper • 2506.02096 • Published 10 days ago • 51
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning Paper • 2505.23380 • Published 14 days ago • 23
Alchemist Collection Dataset and checkpoints for paper "Alchemist: Turning Public Text-to-Image Data into Generative Gold" • 7 items • Updated 16 days ago • 14
Alchemist: Turning Public Text-to-Image Data into Generative Gold Paper • 2505.19297 • Published 18 days ago • 74
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers Paper • 2505.23758 • Published 14 days ago • 23
Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation Paper • 2505.21784 • Published 16 days ago • 18
VidText: Towards Comprehensive Evaluation for Video Text Understanding Paper • 2505.22810 • Published 15 days ago • 20
ZeroGUI: Automating Online GUI Learning at Zero Human Cost Paper • 2505.23762 • Published 14 days ago • 45
AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views Paper • 2505.23716 • Published 14 days ago • 31
Jack of all Trades Models Collection Home of the Personality Engine series, models that can be molded to fit any task or purpose. • 2 items • Updated 20 days ago • 5
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios Paper • 2505.21333 • Published 16 days ago • 39
OpenThinkIMG Collection OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images. • 5 items • Updated 17 days ago • 3