Attention Prompting on Image for Large Vision-Language Models Paper • 2409.17143 • Published Sep 25, 2024 • 7
Mugs: A Multi-Granular Self-Supervised Learning Framework Paper • 2203.14415 • Published Mar 27, 2022
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet Paper • 2101.11986 • Published Jan 28, 2021
ConvBERT: Improving BERT with Span-based Dynamic Convolution Paper • 2008.02496 • Published Aug 6, 2020
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683 • Published May 20, 2025 • 130
MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow Paper • 2503.18968 • Published Mar 21, 2025 • 6
SusGen-GPT: A Data-Centric LLM for Financial NLP and Sustainability Report Generation Paper • 2412.10906 • Published Dec 14, 2024 • 1
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis Paper • 2503.21749 • Published Mar 27, 2025 • 26
MuMu-LLaMA: Multi-modal Music Understanding and Generation via Large Language Models Paper • 2412.06660 • Published Dec 9, 2024
Learning to Animate Images from A Few Videos to Portray Delicate Human Actions Paper • 2503.00276 • Published Mar 1, 2025
ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models Paper • 2501.03410 • Published Jan 6, 2025
SurgRAW: Multi-Agent Workflow with Chain-of-Thought Reasoning for Surgical Intelligence Paper • 2503.10265 • Published Mar 13, 2025
Dynamic Pseudo Label Optimization in Point-Supervised Nuclei Segmentation Paper • 2406.16427 • Published Jun 24, 2024
CoT-Valve: Length-Compressible Chain-of-Thought Tuning Paper • 2502.09601 • Published Feb 13, 2025 • 14
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation Paper • 2312.13108 • Published Dec 20, 2023 • 3
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use Paper • 2411.10323 • Published Nov 15, 2024 • 35