Oooowi

ZiruiZheng

·

zhengzirui

AI & ML interests

None yet

Recent Activity

liked a model 2 months ago

Hzzone/GLIGEN_COCO

ZiruiZheng's activity

upvoted a paper 15 days ago

RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics

Paper • 2506.04308 • Published 16 days ago • 39

upvoted 3 papers 3 months ago

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

Paper • 2504.01934 • Published Apr 2 • 23

Towards Physically Plausible Video Generation via VLM Planning

Paper • 2503.23368 • Published Mar 30 • 40

AMD-Hummingbird: Towards an Efficient Text-to-Video Model

Paper • 2503.18559 • Published Mar 24 • 5

upvoted 2 papers 4 months ago

UniTok: A Unified Tokenizer for Visual Generation and Understanding

Paper • 2502.20321 • Published Feb 27 • 30

CineMaster: A 3D-Aware and Controllable Framework for Cinematic Text-to-Video Generation

Paper • 2502.08639 • Published Feb 12 • 43

upvoted a paper 6 months ago

Autoregressive Video Generation without Vector Quantization

Paper • 2412.14169 • Published Dec 18, 2024 • 14

upvoted 2 papers 7 months ago

Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration

Paper • 2411.17686 • Published Nov 26, 2024 • 21

Star Attention: Efficient LLM Inference over Long Sequences

Paper • 2411.17116 • Published Nov 26, 2024 • 55

upvoted a paper 9 months ago

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Paper • 2409.04410 • Published Sep 6, 2024 • 26

upvoted a paper 10 months ago

Diffusion Feedback Helps CLIP See Better

Paper • 2407.20171 • Published Jul 29, 2024 • 37

upvoted 4 papers 11 months ago

Tora: Trajectory-oriented Diffusion Transformer for Video Generation

Paper • 2407.21705 • Published Jul 31, 2024 • 28

FreeLong: Training-Free Long Video Generation with SpectralBlend Temporal Attention

Paper • 2407.19918 • Published Jul 29, 2024 • 52

T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation

Paper • 2407.14505 • Published Jul 19, 2024 • 27

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

Paper • 2407.16655 • Published Jul 23, 2024 • 31