Update pipeline tag to audio-text-to-text

by nielsr HF Staff - opened Jul 29, 2025

base: refs/heads/main ← from: refs/pr/1


nielsr

Jul 29, 2025

This PR updates the pipeline_tag for the model card from audio-to-audio to audio-text-to-text.

The model is described as an "Interleaved Speech-Text Language Model" that can generate "speech or text continuations over discrete Hubert tokens given speech-text prompts." This indicates that it processes both speech and text as input and can generate both speech (via vocoding Hubert tokens) and text as output. The audio-text-to-text pipeline tag accurately reflects this multi-modal input and output capability, improving the model's discoverability and categorization on the Hugging Face Hub.

Update pipeline tag to audio-text-to-text (commit 80eabb9c)
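
For readers unfamiliar with how a pipeline tag is stored, here is a minimal sketch of the kind of edit this commit makes. It assumes the model card keeps its metadata in a YAML front matter block at the top of README.md. The helper name set_pipeline_tag and the sample card contents are hypothetical and are not taken from the actual repository.

```python
import re


def set_pipeline_tag(card_text: str, new_tag: str) -> str:
    """Return card_text with pipeline_tag in its YAML front matter set to new_tag.

    Hypothetical helper for illustration only: it handles the simple case of
    a front matter block delimited by '---' lines at the very top of the card.
    """
    match = re.match(r"\A---\n(.*?)\n---\n", card_text, flags=re.DOTALL)
    if match is None:
        raise ValueError("no YAML front matter found")

    front = match.group(1)
    new_line = f"pipeline_tag: {new_tag}"
    pattern = r"^pipeline_tag:.*$"

    if re.search(pattern, front, flags=re.MULTILINE):
        # Replace the existing tag line (a callable avoids escape handling).
        front = re.sub(pattern, lambda _: new_line, front, flags=re.MULTILINE)
    else:
        # No tag present yet: append one to the front matter.
        front = front + "\n" + new_line

    return "---\n" + front + "\n---\n" + card_text[match.end():]


# Made-up sample card; the real card's other metadata fields are not shown
# in this discussion.
sample_card = (
    "---\n"
    "example_field: placeholder\n"
    "pipeline_tag: audio-to-audio\n"
    "---\n"
    "# Model card\n"
)

updated_card = set_pipeline_tag(sample_card, "audio-text-to-text")
print(updated_card)
```

Only the pipeline_tag line changes. The rest of the front matter and the body of the card are left as they were.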

gallilmaimon

SLP-RL HUJI org Jul 31, 2025

Hey, perhaps I am misunderstanding, but audio-text-to-text indicates that the model only outputs text, while in reality it can output speech as well. It would be nice to indicate that it also handles text, but I could not find a suitable tag.


Ready to merge

This branch is ready to get merged automatically.
