Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2503.16418

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 26
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 13
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 43
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 22

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Paper • 2503.16418 • Published 3 days ago • 24

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Paper • 2503.16418 • Published 3 days ago • 24

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Paper • 2503.16418 • Published 3 days ago • 24

Models, tools and approaches that help in producing photorealistic images and videos using AI models

kudzueye/boreal-hl-v1

Text-to-Video • Updated Feb 10 • 118
tencent/HunyuanVideo

Text-to-Video • Updated 17 days ago • 3.6k • • 1.77k
XLabs-AI/flux-RealismLora

Text-to-Image • Updated Aug 22, 2024 • 216k • • 1.08k
hugovntr/flux-schnell-realism

Text-to-Image • Updated Sep 21, 2024 • 2.93k • • 23

checkitoutlater

MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance

Paper • 2412.05355 • Published Dec 6, 2024 • 9
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion

Paper • 2412.04301 • Published Dec 5, 2024 • 37
PanoDreamer: 3D Panorama Synthesis from a Single Image

Paper • 2412.04827 • Published Dec 6, 2024 • 11
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation

Paper • 2412.06781 • Published Dec 9, 2024 • 21

Image Generation

Running on Zero

7.9k

7.9k

FLUX.1 [dev]

🖥

Generate images from text prompts
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Paper • 2503.16418 • Published 3 days ago • 24

about 6 hours ago

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

Paper • 2406.16860 • Published Jun 24, 2024 • 60
Understanding Alignment in Multimodal LLMs: A Comprehensive Study

Paper • 2407.02477 • Published Jul 2, 2024 • 23
LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 52
Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22, 2024 • 126

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs