gary109's Collections

Music Captions

updated Aug 9, 2024

  • Futga: Towards Fine-grained Music Understanding through Temporally-enhanced Generative Augmentation

    Paper • 2407.20445 • Published Jul 29, 2024 • 23

  • LP-MusicCaps: LLM-Based Pseudo Music Captioning

    Paper • 2307.16372 • Published Jul 31, 2023 • 38

  • The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation

    Paper • 2311.10057 • Published Nov 16, 2023 • 1

  • MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models

    Paper • 2408.01337 • Published Aug 2, 2024 • 12

  • Facing the Music: Tackling Singing Voice Separation in Cinematic Audio Source Separation

    Paper • 2408.03588 • Published Aug 7, 2024 • 7