Update pipeline tag to audio-text-to-text

by nielsr HF Staff - opened Jul 29, 2025

base: refs/heads/main ← from: refs/pr/2

nielsr

Jul 29, 2025

This PR updates the pipeline_tag metadata from audio-to-audio to audio-text-to-text. This change more accurately reflects the model's capabilities, as described in its paper abstract and model card, indicating its multimodal input (audio and text) and text generation capabilities. This ensures better discoverability for users searching for models with this specific functionality on the Hugging Face Hub.
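For readers unfamiliar with where this tag lives: on the Hub, pipeline_tag is normally a key in the YAML front matter at the top of the model card (README.md). The following is a minimal sketch of such an edit using only the Python standard library. The helper name set_pipeline_tag and the sample card contents (including the license line) are hypothetical and are not taken from this repository.

```python
import re


def set_pipeline_tag(card_text: str, new_tag: str) -> str:
    """Return card_text with pipeline_tag in its YAML front matter set to new_tag.

    Hypothetical helper for illustration; it only handles a simple
    front matter block delimited by '---' lines at the top of the file.
    """
    match = re.match(r"\A---\n(.*?)\n---\n", card_text, flags=re.DOTALL)
    if match is None:
        # No front matter found: prepend a minimal block.
        return f"---\npipeline_tag: {new_tag}\n---\n{card_text}"

    front = match.group(1)
    if re.search(r"^pipeline_tag:", front, flags=re.MULTILINE):
        # Replace the existing value (lambda avoids escape handling in new_tag).
        front = re.sub(
            r"^pipeline_tag:.*$",
            lambda _: f"pipeline_tag: {new_tag}",
            front,
            flags=re.MULTILINE,
        )
    else:
        # Key is absent: append it to the front matter.
        front = f"{front}\npipeline_tag: {new_tag}"

    return f"---\n{front}\n---\n{card_text[match.end():]}"


# Assumed sample card, not the actual model card from this PR.
card = "---\nlicense: mit\npipeline_tag: audio-to-audio\n---\n# Model\n"
updated = set_pipeline_tag(card, "audio-text-to-text")
print(updated)
```

Only the pipeline_tag line changes; the rest of the front matter and the card body are passed through untouched.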

Update pipeline tag to audio-text-to-text (commit 74f9c683)

gallilmaimon

SLP-RL HUJI org Jul 31, 2025

Hey! I appreciate your suggestion. I deliberated over this when choosing the tag, but I felt it was important to highlight that the model can also generate speech (tokens) as output, unlike some "Speech-aware LMs" such as QwenAudio. Ideally, the correct tag would arguably be audio-text-to-audio-text, but to the best of my knowledge no such tag exists. Happy to hear your suggestions.

Ready to merge

This branch is ready to get merged automatically.
