1 22 11

zhanghang

hangzhang-nlp

hangzhang-nlp

AI & ML interests

None yet

Recent Activity

liked a Space about 1 month ago

lixin4ever/VideoRefer-VideoLLaMA3

upvoted a paper about 2 months ago

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

upvoted a paper 4 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

View all activity

Organizations

upvoted a paper about 2 months ago

Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning

Paper • 2506.07044 • Published Jun 8 • 110

upvoted a paper 4 months ago

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models

Paper • 2504.10479 • Published Apr 14 • 280

upvoted 3 papers 5 months ago

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Paper • 2406.07476 • Published Jun 11, 2024 • 38

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19 • 199

LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization

Paper • 2502.13922 • Published Feb 19 • 28

upvoted 6 papers 6 months ago

Scaling Pre-training to One Hundred Billion Data for Vision Language Models

Paper • 2502.07617 • Published Feb 11 • 29

Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment

Paper • 2502.04328 • Published Feb 6 • 30

Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback

Paper • 2501.12895 • Published Jan 22 • 62

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

Paper • 2501.13106 • Published Jan 22 • 90

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21 • 86

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper • 2501.11873 • Published Jan 21 • 66

upvoted 2 papers 7 months ago

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

Paper • 2501.00599 • Published Dec 31, 2024 • 48

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

Paper • 2501.00958 • Published Jan 1 • 107

upvoted a paper 8 months ago

Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 373

upvoted a paper 9 months ago

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22, 2024 • 95

upvoted a paper 10 months ago

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Paper • 2410.12787 • Published Oct 16, 2024 • 32

upvoted a paper about 1 year ago

SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

Paper • 2407.19672 • Published Jul 29, 2024 • 59

upvoted an article over 1 year ago

Article

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

and 2 others •

Apr 15, 2024

• 184

upvoted a collection over 1 year ago

WizardLM

Collection

0 items • Updated Apr 17 • 108

upvoted a paper over 1 year ago

Audio Dialogues: Dialogues dataset for audio and music understanding

Paper • 2404.07616 • Published Apr 11, 2024 • 16

zhanghang

AI & ML interests

Recent Activity

Organizations

hangzhang-nlp's activity

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community