Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective Paper • 2410.12490 • Published Oct 16, 2024
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer Paper • 2411.02038 • Published Nov 4, 2024
Difformer: Empowering Diffusion Models on the Embedding Space for Text Generation Paper • 2212.09412 • Published Dec 19, 2022
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs Paper • 2406.07476 • Published Jun 11, 2024
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction Paper • 2406.12707 • Published Jun 18, 2024
Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation Paper • 2205.10884 • Published May 22, 2022
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA Paper • 2304.01603 • Published Apr 4, 2023
DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation Paper • 2310.17570 • Published Oct 26, 2023
Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer Paper • 2406.00976 • Published Jun 3, 2024