Update pipeline tag to audio-text-to-text
This PR updates the pipeline_tag for the model card from audio-to-audio to audio-text-to-text.
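On the Hub, this kind of change is usually a one-line edit to the model card metadata. The sketch below shows that edit, assuming the card keeps its metadata in a YAML front-matter block (delimited by "---" lines) at the top of README.md, with pipeline_tag as a plain top-level "key: value" line. The helper set_pipeline_tag is hypothetical and stdlib-only; it is not a huggingface_hub API.

```python
import re


def set_pipeline_tag(readme_text: str, new_tag: str) -> str:
    """Return readme_text with its front-matter pipeline_tag set to new_tag.

    Hypothetical helper: assumes the front matter starts on the first line
    with '---', ends at the next '---' line, and that pipeline_tag is a
    simple top-level 'key: value' entry (no full YAML parsing is done).
    """
    lines = readme_text.split("\n")
    if not lines or lines[0].strip() != "---":
        raise ValueError("no YAML front matter found")

    # Locate the closing '---' delimiter of the front matter.
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        raise ValueError("unterminated YAML front matter") from None

    # Replace an existing top-level pipeline_tag entry (no leading indentation).
    for i in range(1, end):
        if re.match(r"^pipeline_tag\s*:", lines[i]):
            lines[i] = f"pipeline_tag: {new_tag}"
            break
    else:
        # No existing key: add one just before the closing delimiter.
        lines.insert(end, f"pipeline_tag: {new_tag}")

    return "\n".join(lines)


if __name__ == "__main__":
    card = "---\nlicense: mit\npipeline_tag: audio-to-audio\n---\n# Model\n"
    print(set_pipeline_tag(card, "audio-text-to-text"))
```

Only the front-matter line changes; the card body is left untouched. A real card with nested or multi-line YAML values would be better handled by a proper YAML parser.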
The model is described as an "Interleaved Speech-Text Language Model" that can generate "speech or text continuations over discrete Hubert tokens given speech-text prompts." This indicates that it processes both speech and text as input and can generate both speech (via vocoding Hubert tokens) and text as output. The audio-text-to-text pipeline tag accurately reflects this multimodal input and output capability, improving the model's discoverability and categorization on the Hugging Face Hub.
Hey, perhaps I am misunderstanding, but audio-text-to-text indicates that the model only outputs text, while in reality it can output speech as well. It would be nice to indicate that it also handles text, but I could not find a suitable tag.