Kyu Song

kyunocap

AI & ML interests

None yet

Organizations

None yet

kyunocap's activity

upvoted a paper 3 days ago

STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis

Paper • 2506.06276 • Published 6 days ago • 18

upvoted 2 papers 8 days ago

Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion Models

Paper • 2506.00996 • Published 11 days ago • 35

ATI: Any Trajectory Instruction for Controllable Video Generation

Paper • 2505.22944 • Published 15 days ago • 7

upvoted a paper 30 days ago

Seed1.5-VL Technical Report

Paper • 2505.07062 • Published May 11 • 143

upvoted 2 papers about 1 month ago

Flow-GRPO: Training Flow Matching Models via Online RL

Paper • 2505.05470 • Published May 8 • 78

Towards Understanding Camera Motions in Any Video

Paper • 2504.15376 • Published Apr 21 • 157

upvoted 4 papers about 2 months ago

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning

Paper • 2504.17192 • Published Apr 24 • 112

Packing Input Frame Context in Next-Frame Prediction Models for Video Generation

Paper • 2504.12626 • Published Apr 17 • 51

REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers

Paper • 2504.10483 • Published Apr 14 • 21

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 271

upvoted 3 papers 3 months ago

Automated Movie Generation via Multi-Agent CoT Planning

Paper • 2503.07314 • Published Mar 10 • 45

Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders

Paper • 2503.03601 • Published Mar 5 • 234

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Paper • 2503.01743 • Published Mar 3 • 88

upvoted 7 papers 4 months ago

OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference

Paper • 2502.18411 • Published Feb 25 • 73

Phantom: Subject-consistent video generation via cross-modal alignment

Paper • 2502.11079 • Published Feb 16 • 60

On-device Sora: Enabling Diffusion-Based Text-to-Video Generation for Mobile Devices

Paper • 2502.04363 • Published Feb 5 • 12

Magic 1-For-1: Generating One Minute Video Clips within One Minute

Paper • 2502.07701 • Published Feb 11 • 36

ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features

Paper • 2502.04320 • Published Feb 6 • 38

DynVFX: Augmenting Real Videos with Dynamic Content

Paper • 2502.03621 • Published Feb 5 • 30

LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer

Paper • 2502.01105 • Published Feb 3 • 20