Ran Xu

xurantju

authored 10 papers 2 months ago

xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

Paper • 2408.12590 • Published Aug 22, 2024 • 37

xLAM: A Family of Large Action Models to Empower AI Agent Systems

Paper • 2409.03215 • Published Sep 5, 2024 • 5

SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant

Paper • 2403.11299 • Published Mar 17, 2024 • 1

LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer

Paper • 2212.09877 • Published Dec 19, 2022

Trust but Verify: Programmatic VLM Evaluation in the Wild

Paper • 2410.13121 • Published Oct 17, 2024 • 3

xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs

Paper • 2410.16267 • Published Oct 21, 2024 • 18

BLIP3-KALE: Knowledge Augmented Large-Scale Dense Captions

Paper • 2411.07461 • Published Nov 12, 2024 • 24

ProVision: Programmatically Scaling Vision-centric Instruction Data for Multimodal Language Models

Paper • 2412.07012 • Published Dec 9, 2024 • 1

DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs

Paper • 2504.17040 • Published Apr 23 • 13

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14 • 95

authored 10 papers 11 months ago

Deformer: Dynamic Fusion Transformer for Robust Hand Pose Estimation

Paper • 2303.04991 • Published Mar 9, 2023

X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

Paper • 2311.18799 • Published Nov 30, 2023 • 1

TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10, 2024 • 70

REX: Rapid Exploration and eXploitation for AI Agents

Paper • 2307.08962 • Published Jul 18, 2023 • 1

Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization

Paper • 2308.02151 • Published Aug 4, 2023 • 20

BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents

Paper • 2308.05960 • Published Aug 11, 2023 • 19

FOFO: A Benchmark to Evaluate LLMs' Format-Following Capability

Paper • 2402.18667 • Published Feb 28, 2024

ULIP: Learning a Unified Representation of Language, Images, and Point Clouds for 3D Understanding

Paper • 2212.05171 • Published Dec 10, 2022

MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens

Paper • 2406.11271 • Published Jun 17, 2024 • 21

MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases

Paper • 2406.10290 • Published Jun 12, 2024