CudaLLM: A Language Model for High-Performance CUDA Kernel Generation

Model Description

cudaLLM-8B is a language model for generating high-performance, syntactically correct CUDA kernels. It is built on Qwen3-8B and was trained in two stages, supervised fine-tuning (SFT) followed by reinforcement learning (RL), to master the complexities of parallel programming for GPUs.
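
The snippet below is a minimal usage sketch with the Hugging Face transformers library; the prompt wording and sampling parameters are illustrative and are not taken from the official training or evaluation setup.

```python
# Minimal sketch: generating a CUDA kernel with cudaLLM-8B via transformers.
# The prompt text and generation settings are illustrative, not the official recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ByteDance-Seed/cudaLLM-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user",
     "content": "Write an optimized CUDA kernel that computes the element-wise "
                "product of two float32 vectors of length N."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```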

Performance on KernelBench (Bo-N denotes best-of-N sampling):

| KernelBench | Bo1   | Bo2 | Bo4 | Bo8 | Bo16 |
|-------------|-------|-----|-----|-----|------|
| Level-1     | 79.75 | 83  | 84  | 86  | 87   |
| Level-2     | 67.30 | 70  | 71  | 72  | 73   |
| Level-3     | 20.83 | 26  | 30  | 34  | 36   |
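
The Bo-N columns correspond to sampling N candidate kernels per problem and keeping the best one. The sketch below illustrates such a selection loop; the generate, is_correct, and runtime_ms callables are hypothetical placeholders for model sampling, a correctness check, and benchmarking, not functions shipped with this repository.

```python
# Hedged sketch of best-of-N candidate selection.
# The callables are hypothetical placeholders supplied by the caller.
from typing import Callable, Optional

def best_of_n(problem: str, n: int,
              generate: Callable[[str], str],
              is_correct: Callable[[str], bool],
              runtime_ms: Callable[[str], float]) -> Optional[str]:
    """Sample n candidate kernels and keep the fastest correct one."""
    best_kernel, best_time = None, float("inf")
    for _ in range(n):
        candidate = generate(problem)      # one sample from the model
        if not is_correct(candidate):      # discard kernels that fail the check
            continue
        t = runtime_ms(candidate)          # benchmark on the target GPU
        if t < best_time:
            best_kernel, best_time = candidate, t
    return best_kernel                     # None if every candidate failed
```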

Training Procedure

The model was trained with the verl library. The following datasets were used for training and evaluation (a minimal loading sketch follows the list):

  • SFT Dataset: A high-quality dataset of CUDA problem-solution pairs (sft_cuda_llm_r1.parquet), originally generated by DeepSeek R1, DeepSeek Coder-7B, and Qwen2-32B.
  • RL Dataset: A refined dataset (rl_cuda_llm_0424.parquet) used to provide performance-based rewards during the RL stage.
  • Evaluation Dataset: The model's performance was benchmarked against the KernelBench dataset.
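
The SFT and RL data are stored as parquet files; the snippet below is a minimal sketch of inspecting them with pandas, using only the file names listed above and making no assumptions about their column layout.

```python
# Minimal sketch: inspecting the training parquet files with pandas.
# File names come from the dataset list above; the schema is not assumed.
import pandas as pd

sft = pd.read_parquet("sft_cuda_llm_r1.parquet")
rl = pd.read_parquet("rl_cuda_llm_0424.parquet")

print(sft.columns.tolist(), len(sft))   # column names and number of SFT pairs
print(rl.columns.tolist(), len(rl))     # column names and number of RL prompts
```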

Intended Use and Limitations

Intended Use

The primary use of CudaLLM is to assist developers in writing and optimizing high-performance CUDA kernels. It can be used for:

  • Accelerating scientific computing and machine learning workloads.
  • Serving as a co-pilot or productivity tool for HPC and CUDA developers.
  • Research into AI-driven code generation and optimization.

Limitations and Bias

  • Correctness is Not Guaranteed: While trained to produce correct code, the model's output should always be rigorously tested and verified before deployment in production systems (see the verification sketch after this list).
  • Security Risks: The generated code is not guaranteed to be secure. Never run model-generated code from an untrusted source without careful inspection.
  • Performance Variability: Kernel performance can vary significantly depending on the target GPU architecture, input data sizes, and compiler version. The generated code may require further manual tuning.
  • Specialized Domain: This model is highly specialized for CUDA code generation. Its performance on general-purpose programming tasks or natural language conversation will be limited.
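
As one way to act on the correctness caveat above, the sketch below compiles a generated kernel with PyTorch's inline CUDA extension loader and checks it against a native PyTorch reference. The kernel source and function names are illustrative stand-ins for model output, not part of this model card.

```python
# Hedged sketch: compile a (stand-in) generated CUDA kernel with PyTorch's
# inline extension loader and verify it against a PyTorch reference.
import torch
from torch.utils.cpp_extension import load_inline

cuda_src = r"""
#include <torch/extension.h>

__global__ void mul_kernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] * b[i];
}

torch::Tensor elementwise_mul(torch::Tensor a, torch::Tensor b) {
    auto out = torch::empty_like(a);
    int n = a.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    mul_kernel<<<blocks, threads>>>(a.data_ptr<float>(), b.data_ptr<float>(),
                                    out.data_ptr<float>(), n);
    return out;
}
"""

ext = load_inline(
    name="generated_kernel_check",
    cpp_sources="torch::Tensor elementwise_mul(torch::Tensor a, torch::Tensor b);",
    cuda_sources=cuda_src,
    functions=["elementwise_mul"],
)

a = torch.randn(1 << 20, device="cuda", dtype=torch.float32)
b = torch.randn_like(a)
torch.testing.assert_close(ext.elementwise_mul(a, b), a * b)  # raises if outputs differ
print("generated kernel matches the PyTorch reference")
```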