Update pipeline tag to audio-text-to-text

#2
by nielsr HF Staff - opened

This PR updates the pipeline_tag metadata from audio-to-audio to audio-text-to-text. This change more accurately reflects the model's capabilities, as described in its paper abstract and model card, indicating its multimodal input (audio and text) and text generation capabilities. This ensures better discoverability for users searching for models with this specific functionality on the Hugging Face Hub.
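
For reference, a change like this amounts to a single line in the YAML front matter at the top of the model card. Below is a minimal sketch of that edit, assuming the tag is stored as a top-level `pipeline_tag:` key; the helper name `set_pipeline_tag` and the sample card text are hypothetical and are not taken from this PR's actual diff.

```python
def set_pipeline_tag(card_text: str, new_tag: str) -> str:
    """Return card_text with pipeline_tag set to new_tag in its YAML front matter.

    Hypothetical helper for illustration: it works on plain lines and assumes
    a simple top-level `pipeline_tag:` key rather than parsing full YAML.
    """
    lines = card_text.split("\n")

    # No front matter at all: prepend a new block holding only the tag.
    if lines[0].strip() != "---":
        return "---\npipeline_tag: " + new_tag + "\n---\n" + card_text

    # Locate the closing '---' delimiter of the front matter.
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        raise ValueError("front matter is not closed")

    for i in range(1, end):
        if lines[i].startswith("pipeline_tag:"):
            lines[i] = "pipeline_tag: " + new_tag  # overwrite the existing tag
            break
    else:
        # Key absent: add it just before the closing delimiter.
        lines.insert(end, "pipeline_tag: " + new_tag)

    return "\n".join(lines)


# Sample card text (assumed, for illustration only).
card = "---\nlicense: mit\npipeline_tag: audio-to-audio\n---\n# Model card\n"
updated = set_pipeline_tag(card, "audio-text-to-text")
print(updated)
```

Only the front matter is touched; the body of the card is returned unchanged.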

SLP-RL HUJI org

Hey! I appreciate your suggestion. I deliberated over this when choosing the tag, but I felt it was important to highlight that the model can also generate speech (tokens) as output, unlike some "speech-aware LMs" such as QwenAudio. Ideally, the correct tag would arguably be audio-text-to-audio-text, but to the best of my knowledge no such tag exists. Happy to hear your suggestions.

