MiDashengLM-7B-1021
This repository provides the bfloat16 (bf16) weights of mispeech/midashenglm-7b-1021-fp32. They are recommended for most general-purpose scenarios, including inference and fine-tuning. They deliver quality comparable to FP32 while running significantly faster on modern GPUs (e.g., A100, H100, RTX 4090). The original fp32 model is intended only for strict numerical reproduction of benchmark results.
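The following minimal pure-Python sketch makes the bf16/fp32 trade-off concrete. It shows what the bfloat16 format itself keeps: the sign bit, the full 8-bit exponent (so the dynamic range matches fp32) and only the top 7 mantissa bits, using round-to-nearest-even. It also estimates raw weight storage at 2 versus 4 bytes per parameter.

This is a generic illustration of the number format. It is not the procedure used to produce these weights, which this card does not describe. The helper names `to_bf16` and `weight_memory_gib` are hypothetical, and the nominal 7e9 parameter count is an assumption rather than the exact size of the model.

```python
import struct


def to_bf16(x: float) -> float:
    """Round a float32-representable value to the nearest bfloat16.

    bf16 keeps the upper 16 bits of the fp32 bit pattern: 1 sign bit,
    the full 8-bit exponent (same range as fp32) and 7 mantissa bits.
    """
    if x != x:  # NaN: return unchanged instead of rounding its payload
        return x
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round-to-nearest-even on the 16 low bits that bf16 drops.
    lsb = (bits >> 16) & 1
    rounded = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", rounded))[0]


def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Storage for the weights alone (no activations, KV cache or optimizer state)."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    print(to_bf16(3.14159265))  # 3.140625: only ~2-3 significant decimal digits survive
    print(to_bf16(1e38))        # still finite: bf16 shares fp32's exponent range
    n = 7e9                     # nominal parameter count (assumption)
    print(f"fp32: {weight_memory_gib(n, 4):.1f} GiB, bf16: {weight_memory_gib(n, 2):.1f} GiB")
```

Halving the bytes per parameter halves the weight footprint. The reduced mantissa is the precision that the fp32 release preserves for exact benchmark reproduction.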
```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, AutoTokenizer

model_id = "mispeech/midashenglm-7b-1021-bf16"

model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

user_prompt = "Caption the audio."  # You may try any other prompt

messages = [
    {
        "role": "system",
        "content": [
            {"type": "text", "text": "You are a helpful language and speech assistant."}
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "text", "text": user_prompt},
            {
                "type": "audio",
                "path": "/path/to/example.wav",
                # or "url": "https://example.com/example.wav"
                # or "audio": np.random.randn(16000)
            },
        ],
    },
]

with torch.no_grad():
    # Build text + audio inputs from the chat messages and move them to the model's device/dtype
    model_inputs = processor.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        add_special_tokens=True,
        return_dict=True,
    ).to(device=model.device, dtype=model.dtype)
    generation = model.generate(**model_inputs)
    output = tokenizer.batch_decode(generation, skip_special_tokens=True)  # ["An engine is idling."]
```
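The `"audio"` entry in the user message can carry a waveform array instead of a path or URL. The sketch below produces such an array from a local file using only the standard-library `wave` module and NumPy.

It assumes 16-bit PCM input and a 16 kHz sample rate. The `np.random.randn(16000)` placeholder in the example suggests one second at 16 kHz, but the card does not state the expected rate, so check the processor configuration before relying on it. `load_wav_mono` is a hypothetical helper, not part of the MiDashengLM API, and it does not resample.

```python
import wave

import numpy as np


def load_wav_mono(path: str, expected_rate: int = 16000) -> np.ndarray:
    """Read a 16-bit PCM WAV file into a mono float32 array scaled to [-1, 1)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        if wf.getframerate() != expected_rate:
            # No resampling here; convert the file beforehand if the rates differ.
            raise ValueError(f"expected {expected_rate} Hz, got {wf.getframerate()} Hz")
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    # WAV stores PCM samples as little-endian signed 16-bit integers.
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float32) / 32768.0
    if channels > 1:
        # Frames are interleaved: reshape to (frames, channels) and average to mono.
        samples = samples.reshape(-1, channels).mean(axis=1)
    return samples


# Hypothetical usage with the message format shown above:
# waveform = load_wav_mono("/path/to/example.wav")
# audio_item = {"type": "audio", "audio": waveform}
```

Raising on a sample-rate mismatch, rather than silently passing the audio through, keeps a wrong-rate file from being misinterpreted by the model.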
MiDashengLM is released under the Apache License 2.0, and we encourage its use in both research and commercial applications.
If you find MiDashengLM useful in your research, please consider citing our work:
```bibtex
@techreport{midashenglm7b,
  title       = {MiDashengLM: Efficient Audio Understanding with General Audio Captions},
  author      = {{Horizon Team, MiLM Plus}},
  institution = {Xiaomi Inc.},
  year        = {2025},
  note        = {Contributors: Heinrich Dinkel et al. (listed alphabetically in Appendix B)},
  url         = {https://arxiv.org/abs/2508.03983},
  eprint      = {2508.03983},
}
```
Base model: Qwen/Qwen2.5-Omni-7B