metadata
language:
- ar
- be
- bg
- bn
- cs
- cy
- da
- de
- el
- en
- es
- et
- fa
- fi
- fr
- gl
- hi
- hu
- it
- ja
- ka
- lt
- lv
- mk
- mr
- nl
- pl
- pt
- ro
- ru
- sk
- sl
- sr
- sv
- sw
- ta
- th
- tr
- uk
- ur
- vi
- zh
library_name: transformers
license: mit
metrics:
- bleu
pipeline_tag: audio-text-to-text
Test ultravox model. More coming soon... I hope so.
import transformers
import numpy as np
import librosa
pipe = transformers.pipeline(model='AtAndDev/UVOX-5k-Llama-3.2-1B-Instruct', trust_remote_code=True)
path = "voice_input.mp3"
audio, sr = librosa.load(path, sr=16000)
turns = []
pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=100)