VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17, 2025 • 73
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning Paper • 2506.22434 • Published Jun 27, 2025 • 10
Training-Free Efficient Video Generation via Dynamic Token Carving Paper • 2505.16864 • Published May 22, 2025 • 23
Direct Inversion: Boosting Diffusion-based Editing with 3 Lines of Code Paper • 2310.01506 • Published Oct 2, 2023
RL-GPT: Integrating Reinforcement Learning and Code-as-policy Paper • 2402.19299 • Published Feb 29, 2024 • 2
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models Paper • 2403.18814 • Published Mar 27, 2024 • 48
Multi-modal Cooking Workflow Construction for Food Recipes Paper • 2008.09151 • Published Aug 20, 2020 • 1
VisionZip: Longer is Better but Not Necessary in Vision Language Models Paper • 2412.04467 • Published Dec 5, 2024 • 119
crumb/bloom-560m-RLHF-SD2-prompter-aesthetic Text Generation • 0.6B • Updated Mar 19, 2023 • 1.17k • 20