Multimodals - a ckandemir Collection

Updated Jan 15, 2024

The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Paper • 2309.17421 • Published Sep 29, 2023 • 4

Improved Baselines with Visual Instruction Tuning
Paper • 2310.03744 • Published Oct 5, 2023 • 39

Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
Paper • 2310.03734 • Published Oct 5, 2023 • 15

Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
Paper • 2310.08541 • Published Oct 12, 2023 • 18

GaussianDreamer: Fast Generation from Text to 3D Gaussian Splatting with Point Cloud Priors
Paper • 2310.08529 • Published Oct 12, 2023 • 18

Instruct-Imagen: Image Generation with Multi-modal Instruction
Paper • 2401.01952 • Published Jan 3, 2024 • 32

Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text
Paper • 2311.07446 • Published Nov 13, 2023 • 29