Model Card for BiMaTE-8B

โš ๏ธ This is a temporary repository for our [EMNLP 2025] demo paper submission.
The project is currently hosted here for review and demonstration purposes.
It will be migrated to the official organization repository once it becomes available.

All code, models, and documentation are maintained here until then.

GitHub: LMT

Model Details

Model Description

BiMaTE (Bi-Centric Machine Translation Expert) is a large-scale, LLM-based, Chinese- and English-centric multilingual translation model designed to deliver high-quality translation between Chinese, English, and numerous other languages.

  • Model type: Causal Language Model for Machine Translation
  • Languages: 60
  • Translation directions: 234
  • Base Model: Qwen3-8B-Base
  • Training Strategy:
    1. Monolingual Continual Pretraining (CPT): 30B tokens
    2. Mixed Continual Pretraining (CPT): 60B tokens (a mixture of monolingual and bilingual data)
    3. Supervised Finetuning (SFT): post-training on smaller-scale, high-quality translation data.

Quickstart

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "luoyingfeng/BiMaTE-8B"

# Left padding is required for batched generation with decoder-only models.
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side='left')
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Translate the following text from English into Chinese.\nEnglish: The concept came from China where plum blossoms were the flower of choice.\nChinese: "
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512, num_beams=5, do_sample=False)
# Keep only the newly generated tokens (drop the prompt).
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# output_ids is a single flat list of token ids, so use decode (not batch_decode).
response = tokenizer.decode(output_ids, skip_special_tokens=True)

print("response:", response)
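The prompt-stripping step above generalizes to batched generation: for each sequence, slice off the prompt tokens and keep only the continuation. A minimal sketch with dummy token ids (no model or tokenizer required; the helper name is illustrative, not part of the released code):

```python
def strip_prompt_tokens(generated_ids, input_ids):
    """Keep only the newly generated tokens for each sequence in a batch.

    generated_ids: full output sequences (prompt + continuation)
    input_ids:     the corresponding prompt sequences
    """
    return [full[len(prompt):] for full, prompt in zip(generated_ids, input_ids)]

# Dummy ids standing in for tokenizer/model output:
prompts = [[1, 2, 3], [4, 5]]
outputs = [[1, 2, 3, 10, 11], [4, 5, 20, 21, 22]]
print(strip_prompt_tokens(outputs, prompts))  # [[10, 11], [20, 21, 22]]
```

With real tensors, the same slicing is done per row before `tokenizer.batch_decode`.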

Supported Languages

Resource tier: Languages
High-resource (13): Arabic (ar), English (en), Spanish (es), German (de), French (fr), Italian (it), Japanese (ja), Dutch (nl), Polish (pl), Portuguese (pt), Russian (ru), Turkish (tr), Chinese (zh)
Medium-resource (18): Bulgarian (bg), Bengali (bn), Czech (cs), Danish (da), Modern Greek (el), Persian (fa), Finnish (fi), Hindi (hi), Hungarian (hu), Indonesian (id), Korean (ko), Norwegian (no), Romanian (ro), Slovak (sk), Swedish (sv), Thai (th), Ukrainian (uk), Vietnamese (vi)
Low-resource (29): Amharic (am), Azerbaijani (az), Tibetan (bo), Modern Hebrew (he), Croatian (hr), Armenian (hy), Icelandic (is), Javanese (jv), Georgian (ka), Kazakh (kk), Central Khmer (km), Kirghiz (ky), Lao (lo), Mongolian (mn), Marathi (mr), Malay (ms), Burmese (my), Nepali (ne), Pashto (ps), Sinhala (si), Swahili (sw), Tamil (ta), Telugu (te), Tajik (tg), Tagalog (tl), Uighur (ug), Urdu (ur), Uzbek (uz), Yue Chinese (yue)
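The Quickstart prompt follows a fixed template, "Translate the following text from {source} into {target}.", using the full language names from the table above. A minimal sketch of building such prompts for any supported direction; the helper and the abbreviated language map are illustrative, not part of the released code:

```python
# Abbreviated map from ISO codes to the language names used in the prompt;
# extend with the full 60-language table above for real use.
LANG_NAMES = {
    "en": "English", "zh": "Chinese", "de": "German",
    "ja": "Japanese", "sw": "Swahili",
}

def build_translation_prompt(src: str, tgt: str, text: str) -> str:
    """Build the translation prompt in the format shown in the Quickstart."""
    src_name, tgt_name = LANG_NAMES[src], LANG_NAMES[tgt]
    return (f"Translate the following text from {src_name} into {tgt_name}.\n"
            f"{src_name}: {text}\n{tgt_name}: ")

print(build_translation_prompt("en", "zh", "Hello, world!"))
```

The resulting string is then passed as the user message to `tokenizer.apply_chat_template`, exactly as in the Quickstart.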
Model size: 8.19B parameters (BF16, Safetensors)
