- Distilling Vision-Language Models on Millions of Videos
  Paper • 2401.06129 • Published • 17
- Koala: Key frame-conditioned long video-LLM
  Paper • 2404.04346 • Published • 6
- MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
  Paper • 2404.05726 • Published • 21
- OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
  Paper • 2406.07471 • Published • 1
liu
che111
Recent Activity
- authored a paper 1 day ago
  Efficient Large Language Models: A Survey
- authored a paper 1 day ago
  Electrocardiogram Instruction Tuning for Report Generation
- authored a paper 1 day ago
  BIMCV-R: A Landmark Dataset for 3D CT Text-Image Retrieval
Collections
8
- VoCo-LLaMA: Towards Vision Compression with Large Language Models
  Paper • 2406.12275 • Published • 31
- ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models
  Paper • 2405.15738 • Published • 46
- xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
  Paper • 2408.08872 • Published • 99