Unified Visual Relationship Detection with Vision and Language Models Paper • 2303.08998 • Published Mar 16, 2023
The iNaturalist Species Classification and Detection Dataset Paper • 1707.06642 • Published Jul 20, 2017
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception Paper • 2305.06324 • Published May 10, 2023
Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset Paper • 2004.12276 • Published Apr 26, 2020
Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation Paper • 2012.07177 • Published Dec 13, 2020
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model Paper • 2306.01736 • Published Jun 2, 2023
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation Paper • 2104.13921 • Published Apr 28, 2021
VideoGLUE: Video General Understanding Evaluation of Foundation Models Paper • 2307.03166 • Published Jul 6, 2023
A Simple Zero-shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models Paper • 2302.06235 • Published Feb 13, 2023
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published Nov 11, 2024
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning Paper • 2503.15558 • Published 6 days ago
Atlas: Multi-Scale Attention Improves Long Context Image Modeling Paper • 2503.12355 • Published 9 days ago
Wolf: Captioning Everything with a World Summarization Framework Paper • 2407.18908 • Published Jul 26, 2024
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation Paper • 2404.19752 • Published Apr 30, 2024
Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping Paper • 2304.08025 • Published Apr 17, 2023
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models Paper • 2305.13655 • Published May 23, 2023