Qwen3-8B-Translator-LoRA

This model is a fine-tuned version of Qwen/Qwen3-8B using LoRA for English to Chinese translation, specifically tailored for audio product terminology.

Fine-tuning Details

  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Dataset: Custom parallel corpus for audio products (English-Chinese)
  • Framework: PyTorch, Hugging Face Transformers, TRL, PEFT, Optimum TPU
  • Hardware: Google Cloud TPU v3-8

Training Procedure

The model was trained using the SFTTrainer from the TRL library.

Training Hyperparameters

  • max_seq_length: 1024
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 10
  • eval_strategy: "steps"
  • eval_steps: 10
  • learning_rate: 5e-5
  • lr_scheduler_type: "cosine"
  • warmup_ratio: 0.05
  • weight_decay: 0.005
  • optim: "adamw_torch_xla"
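The hyperparameters above map onto TRL's `SFTConfig` roughly as follows. This is a sketch, not the author's actual script: the `output_dir` is a hypothetical placeholder, and the field names assume a recent TRL release (where `SFTConfig` subclasses `transformers.TrainingArguments`).

```python
# Sketch of the training configuration, assuming a recent TRL version
# in which SFTConfig subclasses transformers.TrainingArguments.
from trl import SFTConfig

training_args = SFTConfig(
    output_dir="qwen3-8b-translator-lora",  # hypothetical output path
    max_seq_length=1024,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=10,
    eval_strategy="steps",
    eval_steps=10,
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.005,
    optim="adamw_torch_xla",  # XLA-aware AdamW, required for TPU training
)
```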

LoRA Configuration

  • r: 128
  • lora_alpha: 256
  • lora_dropout: 0.05
  • bias: "none"
  • target_modules: ["q_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  • modules_to_save: ["lm_head"]
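The same settings expressed as a PEFT `LoraConfig` would look like the sketch below (the `task_type` value is an assumption, but `CAUSAL_LM` is the standard choice for decoder-only models like Qwen3):

```python
# Sketch of the LoRA adapter configuration (PEFT), matching the values above.
from peft import LoraConfig

peft_config = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",  # assumption: standard for decoder-only LMs
    target_modules=["q_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    modules_to_save=["lm_head"],  # LM head is trained fully alongside the adapters
)
```

Passing `peft_config` to `SFTTrainer` makes the trainer wrap the base model with these adapters before training, so only the low-rank matrices and the LM head receive gradient updates.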

Training Results

Step   Training Loss   Validation Loss
 10        1.070300          0.866071
 20        0.738300          0.653832
 30        0.621100          0.534040
 40        0.433600          0.462612
 50        0.416000          0.423968
 60        0.306600          0.405645
 70        0.308600          0.396484
 80        0.238300          0.380999
 90        0.232400          0.376814
100        0.265600          0.371326
110        0.209000          0.366815
120        0.120100          0.400251
130        0.148400          0.417922
140        0.121600          0.411133

Validation loss reaches its minimum (0.366815) at step 110 and rises afterward while training loss keeps falling, which suggests the model begins to overfit beyond that point; the step-110 checkpoint is likely the best candidate.

Intended Use

This model is intended for translating English text related to audio products into Chinese. It can be used by professionals in the audio industry, technical writers, or anyone needing to translate such content.
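A minimal inference sketch, assuming the adapter is published as nananatsu/Qwen3-8B-Translator-LoRA and that `transformers` and `peft` are installed. The chat-style prompt format is an assumption; it should match whatever template was used during fine-tuning.

```python
# Inference sketch: load the base model, attach the LoRA adapter, translate.
# The prompt below is an assumption; match the template used in training.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-8B"
adapter_id = "nananatsu/Qwen3-8B-Translator-LoRA"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "user",
     "content": "Translate to Chinese: The subwoofer delivers deep, punchy bass."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```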

Limitations and Bias

  • The model's performance is best on text similar to the data it was trained on (audio product domain).
  • It may not generalize well to other domains or highly colloquial language.
  • As with any language model, there's a potential for biases present in the training data to be reflected in the output.
