Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models Paper • 2503.16257 • Published Mar 2025 • 22
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs Paper • 2410.16267 • Published Oct 21, 2024 • 18
xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations Paper • 2408.12590 • Published Aug 22, 2024 • 36 • 5
XGen-MM-1 models and datasets Collection A collection of all XGen-MM (Foundation LMM) models! • 18 items • Updated Feb 18 • 38
Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5 Image-Text-to-Text • Updated Feb 3 • 6.31k • 51 — loading sketch below
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models Paper • 2408.08872 • Published Aug 16, 2024 • 99
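The Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5 entry above is a ready-to-use xGen-MM (BLIP-3) checkpoint on the Hub. Below is a minimal loading sketch, not the official snippet: the choice of `AutoModelForVision2Seq` and the `trust_remote_code=True` flag are assumptions based on how custom multimodal checkpoints are typically published, so the model card remains the authoritative reference for preprocessing, prompt format, and generation.

```python
# Minimal loading sketch (an assumption, not the official usage) for the
# xGen-MM (BLIP-3) image-text-to-text checkpoint listed above.
from transformers import AutoModelForVision2Seq, AutoTokenizer, AutoImageProcessor

model_id = "Salesforce/xgen-mm-phi3-mini-instruct-interleave-r-v1.5"

# Assumption: the checkpoint ships custom modeling code on the Hub,
# so trust_remote_code=True is needed to instantiate the architecture.
model = AutoModelForVision2Seq.from_pretrained(model_id, trust_remote_code=True)

# Text tokenizer and image preprocessor paired with the same checkpoint.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
image_processor = AutoImageProcessor.from_pretrained(model_id, trust_remote_code=True)
```

Keeping the sketch to the three `Auto*` factories avoids hard-coding any model-specific preprocessing or generation arguments, which the remote code and the model card supply.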