MuSL

This model is a C-to-CUDA translator fine-tuned from Qwen/Qwen3-0.6B. For more details, see the paper Mutual-Supervised Learning for Sequential-to-Parallel Code Translation (https://arxiv.org/abs/2506.11153).

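A minimal usage sketch with the Transformers library is shown below. The prompt wording and generation settings are assumptions (the card does not document the expected prompt format), so adjust them to match the training data.

```python
# Minimal sketch: load kcxain/translator-Qwen3-0.6B and ask it to translate a
# small C loop to CUDA. The prompt format and generation settings here are
# assumptions; the model card does not document them.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kcxain/translator-Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="bfloat16")

c_code = """\
void add(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i) c[i] = a[i] + b[i];
}
"""

messages = [{"role": "user", "content": f"Translate the following C code to CUDA:\n{c_code}"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
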
Results

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 64
  • optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3.0
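
For reference, a hedged sketch of an equivalent Hugging Face TrainingArguments configuration is given below; the actual training script is not part of this card, and the values are transcribed from the list above.

```python
# Hedged sketch: approximate TrainingArguments matching the hyperparameters
# above (8 GPUs x per-device train batch size 4 = total train batch size 32).
# The real training script is not included in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="translator-Qwen3-0.6B",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3.0,
    bf16=True,  # assumption: BF16 training, matching the released BF16 weights
)
```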

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 2.16.0
  • Tokenizers 0.21.1

Citation Information

@misc{ke2025musl,
      title={Mutual-Supervised Learning for Sequential-to-Parallel Code Translation}, 
      author={Changxin Ke and Rui Zhang and Shuo Wang and Li Ding and Guangli Li and Yuanbo Wen and Shuoming Zhang and Ruiyuan Xu and Jin Qin and Jiaming Guo and Chenxi Wang and Ling Li and Qi Guo and Yunji Chen},
      year={2025},
      eprint={2506.11153},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2506.11153}, 
}