AtAndDev's picture
Upload README.md with huggingface_hub
9d01a29 verified
metadata
language:
  - ar
  - be
  - bg
  - bn
  - cs
  - cy
  - da
  - de
  - el
  - en
  - es
  - et
  - fa
  - fi
  - fr
  - gl
  - hi
  - hu
  - it
  - ja
  - ka
  - lt
  - lv
  - mk
  - mr
  - nl
  - pl
  - pt
  - ro
  - ru
  - sk
  - sl
  - sr
  - sv
  - sw
  - ta
  - th
  - tr
  - uk
  - ur
  - vi
  - zh
library_name: transformers
license: mit
metrics:
  - bleu
pipeline_tag: audio-text-to-text

Test ultravox model. More coming soon... I hope so.

import transformers
import numpy as np
import librosa

pipe = transformers.pipeline(model='AtAndDev/UVOX-5k-Llama-3.2-1B-Instruct', trust_remote_code=True)

path = "voice_input.mp3"
audio, sr = librosa.load(path, sr=16000)

turns = []
pipe({'audio': audio, 'turns': turns, 'sampling_rate': sr}, max_new_tokens=100)