Malaysian-Audio-Qwen2.5-7B-Instruct

A speech model built on top of mesolitica/Malaysian-Qwen2.5-7B-Instruct.

How did we train it?

We trained in 2 stages:

First stage

Speech understanding: this stage introduces speech datasets to the LLM.

  • We use a frozen Whisper Large V3 encoder without any pooling, which means 30 seconds of audio consumes 1500 tokens (see the sketch after this list).
  • The projection, embedding, and LM head layers are trained with full-parameter finetuning.
  • LoRA is applied to the other linear layers with rank 64 and alpha 128 (see the peft sketch at the end of this stage).
  • Training is done with multipacking at 8192 context length.
  • WandB at https://wandb.ai/huseinzol05/lora-embedding-64-audio-qwen2.5-7b-malaysian-8k
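
A minimal sketch of the audio front end described above, assuming standard transformers and torch APIs. The projector here is illustrative, not the exact implementation:

```python
import torch
from torch import nn
from transformers import WhisperModel

# Frozen Whisper Large V3 encoder: no pooling, so every encoder state
# becomes one audio "token" for the LLM.
encoder = WhisperModel.from_pretrained("openai/whisper-large-v3").get_encoder()
for p in encoder.parameters():
    p.requires_grad = False  # the encoder stays frozen in both stages

# Hypothetical projector from Whisper's hidden size (1280) to
# Qwen2.5-7B's hidden size (3584); trained with full-parameter finetuning.
projector = nn.Linear(1280, 3584)

# 30 s of 16 kHz audio -> 3000 mel frames (128 mel bins for Large V3)
# -> 1500 encoder states after the stride-2 convolution.
mel = torch.randn(1, 128, 3000)
with torch.no_grad():
    states = encoder(input_features=mel).last_hidden_state  # (1, 1500, 1280)
audio_embeds = projector(states)  # (1, 1500, 3584): 1500 audio tokens
```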

Dataset

  1. mesolitica/AudioSet-Audio-Instruction, 1 epoch.
  2. mesolitica/Classification-Speech-Instructions, 1 epoch.
  3. mesolitica/Animal-Sound-Instructions, 3 epochs.
  4. mesolitica/Transcription-Instructions, 1 epoch.
  5. mesolitica/Speaker-Diarization-Instructions, 4 epochs.
  6. mesolitica/Speech-Translation-Instructions, 2 epochs.
  7. mesolitica/CoVoST2-Instructions, 1 epoch.
  8. mesolitica/MusicBench-Instructions, 2 epochs.
  9. mesolitica/Classification-Speech-Adversarial-Instructions, 1 epoch.
  10. mesolitica/AudioSet-Audio-Adversarial-Instructions, 1 epoch.
  11. mesolitica/Sampling-Multitask-National-Speech-Corpus-v1, 1 epoch.
  12. mesolitica/Malaysian-Speech-Description-Timestamp-Instructions, 1 epoch.
  13. mesolitica/Cantonese-Radio-Description-Instructions, 1 epoch.
  14. mesolitica/Emilia-Mandarin-Description-Instructions, 1 epoch.
  15. mesolitica/Zeroshot-Audio-Classification-Instructions, 1 epoch.

In total, 6.71B tokens or 25,557.47 hours of audio.
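
To make the trainable-parameter split concrete, here is a hedged sketch with the peft library. The target-module names assume Qwen2's convention; `llm` stands for the Malaysian-Qwen2.5-7B-Instruct backbone with the audio projector attached, and the projector itself would be kept trainable outside this config.

```python
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=64,            # LoRA rank 64
    lora_alpha=128,  # alpha 128
    # LoRA on the other linear layers: Qwen2 attention and MLP projections.
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    # Embedding and LM head are fully finetuned rather than adapted.
    modules_to_save=["embed_tokens", "lm_head"],
)
llm = get_peft_model(llm, lora_config)
llm.print_trainable_parameters()
```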

Second stage

Speech QA: actual conversations related to coding, politics, chat assistance, and general QA.

  • We use a frozen Whisper Large V3 encoder without any pooling, which means 30 seconds of audio consumes 1500 tokens.
  • The projection, embedding, and LM head layers are trained with full-parameter finetuning.
  • Training is done with multipacking at 10240 context length (see the packing sketch after this list).
  • LoRA is applied to the other linear layers with rank 64 and alpha 128.
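
Multipacking concatenates several short samples into one fixed-length sequence so little context is wasted on padding. A minimal greedy sketch at the 10240-token stage-2 length (a real pipeline also needs block-diagonal attention masks and per-sample position-id resets, omitted here):

```python
def multipack(samples, max_len=10240):
    """Greedily pack tokenized samples into sequences of at most max_len tokens.

    `samples` is a list of token-id lists; samples longer than max_len are
    assumed to be truncated upstream.
    """
    packs, current, used = [], [], 0
    for ids in sorted(samples, key=len, reverse=True):
        if used + len(ids) > max_len and current:
            packs.append([t for s in current for t in s])
            current, used = [], 0
        current.append(ids)
        used += len(ids)
    if current:
        packs.append([t for s in current for t in s])
    return packs
```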

Dataset

  1. mesolitica/Malaysian-UltraChat-Speech-Multiturn-Instructions, 1 epoch.
  2. mesolitica/Malaysian-Speech-Instructions, 1 epoch.
  3. mesolitica/Malaysian-Reasoning-Speech-Instructions, 1 epoch.
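
End to end, the pieces above connect by splicing the projected audio states into the LLM's input embeddings in place of an audio placeholder token. The exact chat template and placeholder are not documented in this card, so the glue below is illustrative only, continuing the earlier sketches (`llm`, `audio_embeds`):

```python
import torch

AUDIO_TOKEN_ID = 0  # hypothetical placeholder id, not the real one

# Replace the placeholder's single embedding slot with 1500 audio tokens.
text_embeds = llm.get_input_embeddings()(input_ids)  # (1, T, 3584)
audio_pos = (input_ids[0] == AUDIO_TOKEN_ID).nonzero().item()
inputs_embeds = torch.cat(
    [text_embeds[:, :audio_pos], audio_embeds, text_embeds[:, audio_pos + 1:]],
    dim=1,
)
logits = llm(inputs_embeds=inputs_embeds).logits  # standard LM forward
```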