DaoanZhang

DwanZhang

AI & ML interests

None yet

Recent Activity

updated a model 1 day ago

LumosLens-2/data_store

published a model 1 day ago

LumosLens-2/data_store

upvoted a collection about 1 month ago

DeepSeek-V4

4 items • Updated Apr 24 • 661

upvoted 2 papers 5 months ago

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

Paper • 2512.22905 • Published Dec 28, 2025 • 20

StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors

Paper • 2512.16915 • Published Dec 18, 2025 • 38

upvoted 4 papers 6 months ago

OPV: Outcome-based Process Verifier for Efficient Long Chain-of-Thought Verification

Paper • 2512.10756 • Published Dec 11, 2025 • 35

Unified Video Editing with Temporal Reasoner

Paper • 2512.07469 • Published Dec 8, 2025 • 46

UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios

Paper • 2511.18050 • Published Nov 22, 2025 • 38

VIDEOP2R: Video Understanding from Perception to Reasoning

Paper • 2511.11113 • Published Nov 14, 2025 • 113

upvoted a paper 7 months ago

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Paper • 2511.08521 • Published Nov 11, 2025 • 39

upvoted a paper 10 months ago

MaPPO: Maximum a Posteriori Preference Optimization with Prior Knowledge

Paper • 2507.21183 • Published Jul 27, 2025 • 15

upvoted a paper 12 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14, 2025 • 311

upvoted 5 papers about 1 year ago

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2411.14432 • Published Nov 21, 2024 • 25

Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20, 2025 • 134

On Path to Multimodal Generalist: General-Level and General-Bench

Paper • 2505.04620 • Published May 7, 2025 • 83

WorldGenBench: A World-Knowledge-Integrated Benchmark for Reasoning-Driven Text-to-Image Generation

Paper • 2505.01490 • Published May 2, 2025 • 5

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

Paper • 2503.24379 • Published Mar 31, 2025 • 76

upvoted an article about 1 year ago

Open-source DeepResearch – Freeing our search agents

Article • m-ric, albertvillanova, merve, thomwolf, clefourrier, +3 • Feb 4, 2025 • 1.32k

upvoted 2 papers over 1 year ago

SePPO: Semi-Policy Preference Optimization for Diffusion Alignment

Paper • 2410.05255 • Published Oct 7, 2024 • 5

DNAGPT: A Generalized Pre-trained Tool for Versatile DNA Sequence Analysis Tasks

Paper • 2307.05628 • Published Jul 11, 2023 • 10

upvoted a paper about 2 years ago

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

Paper • 2402.00827 • Published Feb 1, 2024 • 2

upvoted a collection about 2 years ago

LLaVa-NeXT

LLaVa-NeXT (also known as LLaVa-1.6) improves upon the 1.5 series by incorporating higher image resolutions and more reasoning/OCR datasets.

8 items • Updated Jul 19, 2024 • 34