Update pipeline tag to audio-text-to-text

#1
by nielsr HF Staff - opened

This PR updates the pipeline_tag for the model card from audio-to-audio to audio-text-to-text.

The model is described as an "Interleaved Speech-Text Language Model" that can generate "speech or text continuations over discrete Hubert tokens given speech-text prompts." This indicates that it processes both speech and text as input and can generate both speech (via vocoding Hubert tokens) and text as output. The audio-text-to-text pipeline tag accurately reflects this multi-modal input and output capability, improving the model's discoverability and categorization on the Hugging Face Hub.
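For reference, a change like this usually comes down to a one-line edit of the `pipeline_tag` field in the model card's YAML front matter. The following is a minimal sketch under stated assumptions, not the exact diff in this PR: the helper `set_pipeline_tag` and the sample card contents are hypothetical and only illustrate the shape of the edit.

```python
import re


def set_pipeline_tag(card_text: str, new_tag: str) -> str:
    """Return card_text with pipeline_tag in its YAML front matter set to new_tag."""
    # Front matter is the block between the leading '---' line and the next '---' line.
    match = re.match(r"\A---\n(.*?\n)---\n", card_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no YAML front matter found")
    front = match.group(1)
    line = f"pipeline_tag: {new_tag}\n"
    pattern = r"^pipeline_tag:.*\n"
    if re.search(pattern, front, flags=re.MULTILINE):
        # Replace the existing tag line in place.
        new_front = re.sub(pattern, lambda _: line, front, count=1, flags=re.MULTILINE)
    else:
        # No tag yet: append one at the end of the front matter.
        new_front = front + line
    return card_text[: match.start(1)] + new_front + card_text[match.end(1):]


# Hypothetical card contents, for illustration only.
sample_card = (
    "---\n"
    "library_name: transformers\n"
    "pipeline_tag: audio-to-audio\n"
    "language: en\n"
    "---\n"
    "# Model card body\n"
)
updated_card = set_pipeline_tag(sample_card, "audio-text-to-text")
print(updated_card)
```

The sketch handles only the single scalar `pipeline_tag` line and leaves the rest of the card untouched. A real card with more complex metadata would be better served by a proper YAML parser.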

SLP-RL HUJI org

Hey, perhaps I am misunderstanding, but audio-text-to-text indicates that the model only outputs text, while in reality it can output speech as well. It would be nice to indicate that it also handles text, but I could not find a suitable tag.

Ready to merge
This branch is ready to get merged automatically.