Multimodal Chaptering for Long-Form TV Newscast Video Paper • 2406.17590 • Published Mar 20, 2024 • 2
Towards Retrieval Augmented Generation over Large Video Libraries Paper • 2406.14938 • Published Jun 21, 2024 • 22
Inserting Faces inside Captions: Image Captioning with Attention Guided Merging Paper • 2405.02305 • Published Mar 20, 2024 • 2