Malaysian-Audio-Qwen2.5-7B-Instruct
A speech model built on top of mesolitica/Malaysian-Qwen2.5-7B-Instruct.
How did we train it?
We trained in 2 stages:
First stage
Speech understanding: this stage introduces speech datasets to the LLM.
- We use a frozen Whisper Large V3 encoder without any pooling, which means 30 seconds of audio consumes 1500 tokens.
- Projection, Embedding and LM Head layers are trained with full-parameter finetuning.
- LoRA is applied to the other linear layers, with rank 64 and alpha 128.
- Training is done with multipacking at a context length of 8192.
- WandB at https://wandb.ai/huseinzol05/lora-embedding-64-audio-qwen2.5-7b-malaysian-8k
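To make the "30 seconds = 1500 tokens" arithmetic concrete, here is a minimal Python sketch. It assumes the standard Whisper front end: 16 kHz audio, a 160-sample STFT hop (100 log-mel frames per second) and a stride-2 convolution in the encoder, so each encoder output position covers 20 ms. The helper names `encoder_tokens` and `max_audio_seconds` are hypothetical and not part of this repository. Note that the stock Whisper feature extractor pads every clip to 30 seconds; the card does not say whether shorter clips are trimmed before their encoder outputs are passed to the LLM.

```python
# Assumed Whisper front-end constants (standard Whisper values, not taken from this card).
SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio
HOP_LENGTH = 160      # STFT hop of 10 ms -> 100 log-mel frames per second
CONV_STRIDE = 2       # the encoder's second conv layer halves the frame count


def encoder_tokens(seconds: float) -> int:
    """Encoder output positions for `seconds` of audio when no pooling is applied."""
    samples = round(seconds * SAMPLE_RATE)
    mel_frames = samples // HOP_LENGTH
    return mel_frames // CONV_STRIDE


def max_audio_seconds(context_length: int) -> float:
    """Longest audio that fits in a context window filled with nothing but audio tokens."""
    tokens_per_second = SAMPLE_RATE / HOP_LENGTH / CONV_STRIDE  # 50.0
    return context_length / tokens_per_second


print(encoder_tokens(30))       # 1500
print(max_audio_seconds(8192))  # 163.84
```

With these assumptions the encoder emits 50 tokens per second, so an 8192-token packed sequence holds at most about 2.7 minutes of audio before any text tokens are counted.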
Dataset
- mesolitica/AudioSet-Audio-Instruction, 1 epoch.
- mesolitica/Classification-Speech-Instructions, 1 epoch.
- mesolitica/Animal-Sound-Instructions, 3 epochs.
- mesolitica/Transcription-Instructions, 1 epoch.
- mesolitica/Speaker-Diarization-Instructions, 4 epochs.
- mesolitica/Speech-Translation-Instructions, 2 epochs.
- mesolitica/CoVoST2-Instructions, 1 epoch.
- mesolitica/MusicBench-Instructions, 2 epochs.
- mesolitica/Classification-Speech-Adversarial-Instructions, 1 epoch.
- mesolitica/AudioSet-Audio-Adversarial-Instructions, 1 epoch.
- mesolitica/Sampling-Multitask-National-Speech-Corpus-v1, 1 epoch.
- mesolitica/Malaysian-Speech-Description-Timestamp-Instructions, 1 epoch.
- mesolitica/Cantonese-Radio-Description-Instructions, 1 epoch.
- mesolitica/Emilia-Mandarin-Description-Instructions, 1 epoch.
- mesolitica/Zeroshot-Audio-Classification-Instructions, 1 epoch.
In total, this is 6.71B tokens, or 25,557.47 hours of audio.
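The per-dataset epoch counts above can be read as a mixing schedule: a dataset trained for N epochs appears N times in the shuffled training stream. The card does not describe how this was implemented, so the Python sketch below, using a hypothetical helper `build_mixture` and placeholder samples, only illustrates one straightforward way to do it.

```python
import random
from typing import Dict, List, Tuple


def build_mixture(
    datasets: Dict[str, List[str]],
    epochs: Dict[str, int],
    seed: int = 42,
) -> List[Tuple[str, str]]:
    """Repeat each dataset `epochs[name]` times (default 1), then shuffle everything together.

    Hypothetical helper for illustration; not the authors' training code.
    """
    mixture: List[Tuple[str, str]] = []
    for name, samples in datasets.items():
        for _ in range(epochs.get(name, 1)):
            mixture.extend((name, sample) for sample in samples)
    random.Random(seed).shuffle(mixture)  # fixed seed gives a reproducible order
    return mixture


# Placeholder samples standing in for two of the datasets listed above.
toy = {
    "mesolitica/Transcription-Instructions": ["t0", "t1", "t2"],
    "mesolitica/Speaker-Diarization-Instructions": ["d0", "d1"],
}
epochs = {"mesolitica/Speaker-Diarization-Instructions": 4}
mix = build_mixture(toy, epochs)
print(len(mix))  # 11: 3 transcription samples + 4 x 2 diarization samples
```

Repeating a small dataset this way raises its share of the mix without touching the larger datasets.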
Second stage
Speech QA: actual conversations covering coding, politics, chat assistance and general QA.
- We use a frozen Whisper Large V3 encoder without any pooling, which means 30 seconds of audio consumes 1500 tokens.
- Projection, Embedding and LM Head layers are trained with full-parameter finetuning.
- Training is done with multipacking at a context length of 10240.
- LoRA is applied to the other linear layers, with rank 64 and alpha 128.
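Multipacking concatenates several variable-length samples into one training sequence up to the context length, so fewer tokens are wasted on padding. The card does not name the packing algorithm; the Python sketch below assumes greedy first-fit decreasing bin packing with the stage-two limit of 10240 tokens, and the function names `pack_first_fit_decreasing` and `pack_offsets` are hypothetical. The offsets it returns are the kind of per-sample boundary information a variable-length attention kernel needs to keep packed samples from attending to each other; whether the actual training code masks across sample boundaries is also an assumption.

```python
from typing import List


def pack_first_fit_decreasing(lengths: List[int], context_length: int = 10240) -> List[List[int]]:
    """Group sample indices into packs whose total token length is <= context_length."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs: List[List[int]] = []
    remaining: List[int] = []  # free tokens left in each pack
    for i in order:
        n = lengths[i]
        if n > context_length:
            raise ValueError(f"sample {i} has {n} tokens, more than context {context_length}")
        for p, free in enumerate(remaining):
            if n <= free:  # first existing pack with enough room
                packs[p].append(i)
                remaining[p] -= n
                break
        else:  # no pack had room: open a new one
            packs.append([i])
            remaining.append(context_length - n)
    return packs


def pack_offsets(pack: List[int], lengths: List[int]) -> List[int]:
    """Cumulative start offsets of each sample inside a packed sequence."""
    offsets = [0]
    for i in pack:
        offsets.append(offsets[-1] + lengths[i])
    return offsets


lengths = [6000, 5000, 4000, 3000, 1500]  # toy per-sample token counts
packs = pack_first_fit_decreasing(lengths)
print(packs)  # [[0, 2], [1, 3, 4]]
```

Sorting longest-first is a common heuristic because large samples are the hardest to place; the smaller ones then fill the gaps left in earlier packs.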
Dataset
Base model: mesolitica/Malaysian-Qwen2.5-7B-Instruct