Llama-SEA-LION-TH-audio-preview
Llama-SEA-LION-TH-audio-preview is a 🇹🇭 Thai audio-language model designed to natively support both text and audio inputs, with text output. This is a research preview, the result of a collaborative effort between SCB10X and AI Singapore. The model is built on top of aisingapore/Llama-SEA-LION-v3-8B-IT, a powerful instruction-tuned language model for Southeast Asian languages.
Model Description
- Model type: The LLM is based on Llama-SEA-LION-v3-8B-IT, and the audio encoder is based on Whisper's encoder and BEATs.
- Requirement: transformers 4.45.0
- Primary Language(s): Thai 🇹🇭 and English 🇬🇧
- License: Llama 3 Community License
Usage Example
from transformers import AutoModel
import torch
import soundfile as sf
import librosa

# Initialize from the trained model
model = AutoModel.from_pretrained(
    "",
    torch_dtype=torch.float16,
    trust_remote_code=True
)
model.to("cuda")
model.eval()

# Read a wav file (it must be sampled at 16 kHz and clipped to 30 seconds)
audio, sr = sf.read("path_to_your_audio.wav")
if len(audio.shape) == 2:
    audio = audio[:, 0]  # keep only the first channel
if len(audio) > 30 * sr:
    audio = audio[: 30 * sr]  # keep only the first 30 seconds
if sr != 16000:
    audio = librosa.resample(audio, orig_sr=sr, target_sr=16000, res_type="fft")

# Run generation
prompt_pattern="<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n<Speech><SpeechHere></Speech> {}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
response = model.generate(
    audio=audio,
    prompt="transcribe this audio",
    prompt_pattern=prompt_pattern,
    do_sample=False,
    max_new_tokens=512,
    repetition_penalty=1.1,
    num_beams=1,
    # temperature=0.4,
    # top_p=0.9,
)
print(response)
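The prompt_pattern in the example has a `{}` slot for the text instruction and a `<SpeechHere>` placeholder for the audio. As a minimal sketch, assuming the `{}` slot is filled with plain `str.format` (an assumption; the actual handling is in the model's remote code), the text side of the prompt could be assembled as follows. `build_prompt` and `PROMPT_PATTERN` are hypothetical names used for illustration, not part of the model's API.

```python
# Same template string as in the usage example, split across lines for readability.
PROMPT_PATTERN = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
    "<Speech><SpeechHere></Speech> {}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)


def build_prompt(instruction: str, pattern: str = PROMPT_PATTERN) -> str:
    """Hypothetical helper: insert the text instruction into the chat template."""
    if pattern.count("{}") != 1:
        raise ValueError("pattern must contain exactly one '{}' slot")
    # Assumption: the instruction is substituted with str.format.
    return pattern.format(instruction)


full_prompt = build_prompt("transcribe this audio")
print(full_prompt)
```

Printing the assembled string is a quick way to check that the special tokens match the template used during training before passing prompt and prompt_pattern to model.generate().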
Generation Parameters:
- audio -- audio input, e.g., using soundfile.read or librosa.resample to read a wav file like the example above
- prompt (str) -- Text input to the model
- prompt_pattern (str) -- Chat template augmented with special tokens; it must match the one used during training
- max_new_tokens (int, optional, defaults to 1024)
- num_beams (int, optional, defaults to 4)
- do_sample (bool, optional, defaults to True)
- top_p (float, optional, defaults to 0.9)
- repetition_penalty (float, optional, defaults to 1.0)
- length_penalty (float, optional, defaults to 1.0)
- temperature (float, optional, defaults to 1.0)
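The usage example overrides several of these defaults to get deterministic decoding. The sketch below shows one way to keep a greedy and a sampling configuration side by side, using only the parameter names listed above and the values from the usage example. `decoding_kwargs` is a hypothetical helper, not part of the model's API.

```python
def decoding_kwargs(sample: bool = False) -> dict:
    """Hypothetical helper: keyword arguments to pass to model.generate()."""
    kwargs = {
        "max_new_tokens": 512,
        "repetition_penalty": 1.1,
        "num_beams": 1,
        "do_sample": sample,
    }
    if sample:
        # Values taken from the commented-out lines in the usage example.
        kwargs.update(temperature=0.4, top_p=0.9)
    return kwargs


greedy = decoding_kwargs(sample=False)
sampled = decoding_kwargs(sample=True)

# With a loaded model, the dictionaries would be unpacked like this:
# response = model.generate(audio=audio, prompt="transcribe this audio",
#                           prompt_pattern=prompt_pattern, **greedy)
```

Greedy decoding tends to suit transcription, where repeatable output matters. The sampling variant may be more useful for open-ended instructions.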
There is also model.generate_stream() for streaming generation. Please refer to modeling_typhoonaudio.py for this function.
Intended Uses & Limitations
This model is experimental and may not always follow human instructions accurately, making it prone to generating hallucinations. Additionally, the model lacks moderation mechanisms and may produce harmful or inappropriate responses. Developers should carefully assess potential risks based on their specific applications.
Acknowledgements
This work builds upon the foundations laid by scb10x/llama-3-typhoon-v1.5-8b-audio-preview and its accompanying technical report, which we closely followed.