MuSL

This model is a C-to-CUDA translator fine-tuned from Qwen/Qwen3-0.6B. For more details, see the paper Mutual-Supervised Learning for Sequential-to-Parallel Code Translation (https://arxiv.org/abs/2506.11153).

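A minimal usage sketch with the Transformers library is shown below. The prompt wording and generation settings are assumptions (the card does not document the expected prompt format), so adjust them to match the training data.

```python
# Minimal sketch: load kcxain/translator-Qwen3-0.6B and ask it to translate a
# small C loop to CUDA. The prompt format and generation settings here are
# assumptions; the model card does not document them.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kcxain/translator-Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

c_code = """\
void add(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}
"""

messages = [{"role": "user", "content": f"Translate the following C code to CUDA:\n{c_code}"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
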
Results

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 64
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
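
For reference, a hedged sketch of an equivalent Hugging Face TrainingArguments configuration is given below; the actual training script is not part of this card, and the values are transcribed from the list above.

```python
# Hedged sketch: approximate TrainingArguments matching the hyperparameters
# above (8 GPUs x per-device train batch size 4 = total train batch size 32).
# The real training script is not included in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="translator-Qwen3-0.6B",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
    bf16=True,  # assumption: BF16 training, matching the released BF16 weights
)
```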

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 2.16.0
  • Tokenizers 0.21.1

Citation Information

@misc{ke2025musl,
      title={Mutual-Supervised Learning for Sequential-to-Parallel Code Translation}, 
      author={Changxin Ke and Rui Zhang and Shuo Wang and Li Ding and Guangli Li and Yuanbo Wen and Shuoming Zhang and Ruiyuan Xu and Jin Qin and Jiaming Guo and Chenxi Wang and Ling Li and Qi Guo and Yunji Chen},
      year={2025},
      eprint={2506.11153},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2506.11153}, 
}