Triton Kernel Code Generation Model

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct specialized for generating Triton GPU kernels.

Model Details

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Fine-tuned on: 6000 examples of Triton kernel code
  • Eval Loss: 0.20
  • Eval Perplexity: 1.22
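
The two evaluation figures agree with each other if perplexity is the exponential of the mean per-token cross-entropy loss (in nats), which is the usual convention for causal language models. The card does not say how the numbers were computed, so that convention is an assumption here. A quick check:

```python
import math

# Eval loss reported above; assumed to be mean per-token cross-entropy in nats.
eval_loss = 0.20

# Under that assumption, perplexity is exp(loss).
perplexity = math.exp(eval_loss)
print(f"{perplexity:.2f}")  # prints 1.22
```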

Usage

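from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned weights and the matching tokenizer from the Hugging Face Hub.
model_id = "cdreetz/kwen2.5-1.5b"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

prompt = "Write a Triton kernel for element-wise addition:"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 512 new tokens with the model's default generation settings.
outputs = model.generate(**inputs, max_new_tokens=512)

# The decoded text contains the prompt followed by the completion.
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)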

Training Details

  • Epochs: 2
  • Batch Size: 2
  • Learning Rate: 1e-5
  • Dataset Size: 6000 examples

Performance

The model generates syntactically correct Triton kernels with proper:

  • @triton.jit decorators
  • Kernel function signatures
  • Launch function implementations
  • Memory access patterns
  • Grid configurations
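
One rough way to check a generation for the components listed above is to parse it with Python's standard `ast` module, without executing it. The sketch below is only illustrative: `SAMPLE` is a hand-written stand-in for a model response (not actual output from this model), `inspect_kernel_source` is a hypothetical helper, and it assumes the kernel imports `triton.language` as `tl`. It needs Python 3.9+ for `ast.unparse`.

```python
import ast

# Hand-written stand-in for a model response; parsed as text, never executed.
SAMPLE = '''
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
'''


def inspect_kernel_source(source):
    """Report which structural pieces of a Triton program appear in `source`."""
    tree = ast.parse(source)  # raises SyntaxError if the text is not valid Python
    funcs = [n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)]

    def is_jit(fn):
        # Accept both `@triton.jit` and `@triton.jit(...)`, plus a bare `@jit`.
        for dec in fn.decorator_list:
            target = dec.func if isinstance(dec, ast.Call) else dec
            if ast.unparse(target) in ("triton.jit", "jit"):
                return True
        return False

    kernels = [f.name for f in funcs if is_jit(f)]

    def launches_kernel(fn):
        # A launch looks like `kernel_name[grid](...)`, i.e. a subscript on a kernel.
        return any(
            isinstance(n, ast.Subscript)
            and isinstance(n.value, ast.Name)
            and n.value.id in kernels
            for n in ast.walk(fn)
        )

    accesses = [
        n for n in ast.walk(tree)
        if isinstance(n, ast.Call) and ast.unparse(n.func) in ("tl.load", "tl.store")
    ]
    kinds = {ast.unparse(n.func) for n in accesses}
    return {
        "jit_kernels": kernels,
        "launch_functions": [
            f.name for f in funcs if not is_jit(f) and launches_kernel(f)
        ],
        "uses_load_and_store": kinds == {"tl.load", "tl.store"},
        "accesses_masked": bool(accesses) and all(
            any(kw.arg == "mask" for kw in n.keywords) for n in accesses
        ),
    }


report = inspect_kernel_source(SAMPLE)
print(report)
```

A check like this only confirms that the expected pieces are present; it says nothing about whether the kernel computes the right values or runs efficiently.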

Limitations

  • Specialized for Triton kernel generation only
  • May require prompt engineering for optimal results
  • Generated kernels should be tested before production use
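
As a sketch of what testing before production use can look like, the example below compares a candidate against a trusted reference over several input sizes, including sizes that are not a multiple of the block size, since missing tail handling is a common kernel bug. Everything here is hypothetical and runs on plain Python lists so it works without a GPU: `blocked_add_no_tail` is a deliberately buggy stand-in for a generated kernel, and `find_failing_sizes` is an illustrative helper. In a real setup the candidate would be the generated launch function, the reference a PyTorch expression, and the comparison something like `torch.testing.assert_close` on GPU tensors.

```python
import math


def blocked_add_no_tail(x, y, block_size=4):
    """Buggy stand-in for a kernel: handles whole blocks only and skips the tail."""
    out = [0.0] * len(x)
    full = (len(x) // block_size) * block_size
    for i in range(full):
        out[i] = x[i] + y[i]
    return out


def reference_add(x, y):
    """Trusted reference implementation of element-wise addition."""
    return [a + b for a, b in zip(x, y)]


def find_failing_sizes(candidate, reference, sizes, rel_tol=1e-6):
    """Return the input sizes for which candidate and reference disagree."""
    failing = []
    for n in sizes:
        x = [float(i) for i in range(n)]
        y = [float(2 * i + 1) for i in range(n)]
        got, want = candidate(x, y), reference(x, y)
        ok = len(got) == len(want) and all(
            math.isclose(g, w, rel_tol=rel_tol, abs_tol=1e-9)
            for g, w in zip(got, want)
        )
        if not ok:
            failing.append(n)
    return failing


# Sizes 4 and 8 are multiples of the block size; 9 and 13 leave a tail.
failing = find_failing_sizes(blocked_add_no_tail, reference_add, sizes=[4, 8, 9, 13])
print(failing)  # prints [9, 13]
```

Testing only sizes 4 and 8 would have let the bug through, which is why the size list mixes aligned and unaligned lengths.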