DialGuide: Aligning Dialogue Model Behavior with Developer Guidelines Paper • 2212.10557 • Published Dec 20, 2022
BigDocs: An Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks Paper • 2412.04626 • Published Dec 5, 2024 • 14
UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction Paper • 2503.15661 • Published Mar 19 • 2
FM2DS: Few-Shot Multimodal Multihop Data Synthesis with Knowledge Distillation for Question Answering Paper • 2412.07030 • Published Dec 9, 2024
Augmenting LLM Reasoning with Dynamic Notes Writing for Complex QA Paper • 2505.16293 • Published May 22 • 2
Rendering-Aware Reinforcement Learning for Vector Graphics Generation Paper • 2505.20793 • Published May 27 • 11
StarFlow: Generating Structured Workflow Outputs From Sketch Images Paper • 2503.21889 • Published Mar 27 • 1
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding Paper • 2502.01341 • Published Feb 3 • 39