Regression Language Models for Code (RLMs)
We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy, domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) can simultaneously predict, directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M-parameter RLM initialized from T5Gemma obtains > 0.9 Spearman rank on competitive programming submissions from APPS, and a single unified model achieves > 0.5 average Spearman rank across 17 separate languages from CodeNet. Furthermore, the RLM obtains the highest average Kendall-Tau of 0.46 on five classic NAS design spaces previously dominated by graph neural networks, and simultaneously predicts architecture latencies on numerous hardware platforms.
Link for Code-Regression dataset: https://huggingface.co/datasets/akhauriyash/Code-Regression
Link for Graph-Regression dataset: https://huggingface.co/datasets/akhauriyash/GraphArch-Regression
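As a rough illustration of how a Code-Regression row becomes a model input, the sketch below mirrors the prompt layout used by the evaluation script in this card. The helper names `row_language` and `build_prompt` and the example row are hypothetical; the field names (`space`, `input`, `target`, `metadata`) come from that script, and `metadata` is assumed to be a string holding a Python dict literal.

```python
import ast

def row_language(row):
    """Read the language from `metadata` (assumed: a dict literal stored as a string)."""
    try:
        meta = ast.literal_eval(row["metadata"])
    except (KeyError, TypeError, ValueError, SyntaxError):
        return None
    return meta.get("language") if isinstance(meta, dict) else None

def build_prompt(row):
    """Prefix the code with its space name; CDSS rows also carry a language line."""
    space = row["space"]
    if space == "CDSS":
        return f"# {space}\n# Language: {row_language(row)}\n{row['input']}"
    return f"{space}\n{row['input']}"

# Hypothetical row, made up for illustration only.
example = {
    "space": "CDSS",
    "input": "print(1)",
    "target": "12.5",
    "metadata": "{'language': 'python'}",
}
print(build_prompt(example))
```

Unlike the evaluation script, which skips a CDSS row whose metadata cannot be read, this helper would emit `Language: None`; filter such rows first if that matters.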
Testing Code-Regression with a basic Gemma RLM model
Use the code below as a reference for evaluating a basic RegressLM model (better models, and more of them, to come! :) )
We strongly recommend transformers==4.53.2 for compatibility, though the latest transformers release should work as well.
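If it helps to confirm the environment before running the evaluation, here is a minimal sketch that compares the installed transformers version with the recommended one. The helper names are hypothetical, and the version parsing is deliberately simple: it reads only leading numeric components, so pre-release suffixes are ignored.

```python
from importlib import metadata

RECOMMENDED = "4.53.2"  # version recommended in this card

def parse_version(text):
    """Turn '4.53.2' into (4, 53, 2); stops at the first non-numeric component."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)

def check_transformers(recommended=RECOMMENDED):
    """Return 'missing', 'match', 'newer', or 'older' for the installed package."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "missing"
    have, want = parse_version(installed), parse_version(recommended)
    if have == want:
        return "match"
    return "newer" if have > want else "older"

print(check_transformers())
```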
import ast
import torch
import numpy as np
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from scipy.stats import spearmanr
from tqdm import tqdm

REPO_ID = "akhauriyash/RLM-GemmaS-Code-v0"
DATASET = "akhauriyash/Code-Regression"
dataset = load_dataset(DATASET, split="train")
tok = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModelForSeq2SeqLM.from_pretrained(REPO_ID, trust_remote_code=True).to(device).eval()
MAX_ITEMS, BATCH_SIZE, spaces, results = 512, 16, ["KBSS", "CDSS", "APPS"], {}
language = None  # Specify language for CDSS, e.g. "python"
# Tokens to decode per prediction; fall back to defaults if the config lacks these fields.
n_out_tokens = getattr(model.config, "num_tokens_per_obj", 8) * getattr(model.config, "max_num_objs", 1)

for SPACE in spaces:
    # Collect up to MAX_ITEMS prompts and targets for this space.
    inputs, targets = [], []
    for row in tqdm(dataset, desc=f"Processing {SPACE} till {MAX_ITEMS} items"):
        if row.get("space") == SPACE and "input" in row and "target" in row:
            try:
                # metadata is a dict literal stored as a string; parse it without executing code.
                lang = ast.literal_eval(row["metadata"])["language"] if SPACE == "CDSS" else None
                if SPACE != "CDSS" or language is None or lang == language:
                    targets.append(float(row["target"]))
                    if SPACE == "CDSS":
                        inputs.append(f"# {SPACE}\n# Language: {lang}\n{row['input']}")
                    else:
                        inputs.append(f"{SPACE}\n{row['input']}")
            except Exception:
                continue  # skip rows with malformed metadata or targets
        if len(inputs) >= MAX_ITEMS:
            break

    preds = []
    for i in tqdm(range(0, len(inputs), BATCH_SIZE)):
        enc = tok(inputs[i:i+BATCH_SIZE], return_tensors="pt", truncation=True, padding=True, max_length=2048).to(device)
        batch_preds = []
        # Draw 8 sampled decodes per input, then take the per-input median.
        for _ in range(8):
            out = model.generate(**enc, max_new_tokens=n_out_tokens, min_new_tokens=n_out_tokens, do_sample=True, top_p=0.95, temperature=1.0)
            decoded = [tok.token_ids_to_floats(seq.tolist()) for seq in out]
            decoded = [d[0] if isinstance(d, list) and d else float("nan") for d in decoded]
            batch_preds.append(decoded)
        preds.extend(torch.tensor(batch_preds).median(dim=0).values.tolist())

    spear, _ = spearmanr(np.array(targets), np.array(preds))
    results[SPACE] = spear
    print(f"Spearman ρ for {SPACE}: {spear:.3f}")

print("Spearman ρ | KBSS | CDSS | APPS")
print(f"{REPO_ID} | " + " | ".join(f"{results[s]:.3f}" for s in spaces))
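The script aggregates eight sampled decodes per input with a median. One caveat is that a single failed decode (recorded as NaN) can turn the whole median into NaN, since `torch.median` returns NaN when any reduced value is NaN. The standard-library sketch below shows one possible NaN-tolerant aggregation; the helper names and the toy values are hypothetical. Note that `statistics.median` averages the two middle values for an even count, whereas `torch.median` returns the lower one, so results can differ slightly from the script.

```python
import math
import statistics

def robust_median(samples):
    """Median of the finite values in `samples`; NaN if none are finite."""
    finite = [s for s in samples if math.isfinite(s)]
    return statistics.median(finite) if finite else float("nan")

def aggregate(sample_rounds):
    """sample_rounds[r][i] is the decoded value for input i in sampling round r."""
    per_input = zip(*sample_rounds)  # transpose: one tuple of samples per input
    return [robust_median(values) for values in per_input]

# Toy values: three sampling rounds for two inputs, one failed decode.
rounds = [
    [1.0, 10.0],
    [float("nan"), 12.0],
    [3.0, 11.0],
]
print(aggregate(rounds))  # [2.0, 11.0]
```

Within PyTorch, `torch.nanmedian` offers similar NaN-ignoring behavior and could stand in for `median` in the script.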
We got the following results when testing on a random subset of the Code-Regression dataset.
Model ID | KBSS | CDSS | APPS
akhauriyash/RegressLM-gemma-s-RLM-table3 | 0.527 | 0.787 | 0.926
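For readers unfamiliar with the metric, Spearman ρ is the Pearson correlation of the ranks of the targets and predictions, so it rewards getting the ordering right rather than the exact values. The following is a minimal pure-Python sketch of that definition, not the `scipy.stats.spearmanr` implementation the script uses; the helper names are hypothetical, and dropping non-finite pairs is an assumption made here for illustration.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation; NaN when either side has zero variance."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)

def spearman(targets, preds):
    """Spearman rho over pairs where both values are finite."""
    pairs = [(t, p) for t, p in zip(targets, preds)
             if math.isfinite(t) and math.isfinite(p)]
    if len(pairs) < 2:
        return float("nan")
    ts, ps = zip(*pairs)
    return pearson(average_ranks(ts), average_ranks(ps))

# Two of the four predictions are swapped in order relative to the targets.
print(spearman([1.0, 2.0, 3.0, 4.0], [0.1, 0.4, 0.3, 0.9]))  # 0.8
```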
Citations
If you find this model or the attached datasets useful for your research, please cite us:
@article{akhauri2025regressionlanguagemodelscode,
  title={Regression Language Models for Code},
  author={Yash Akhauri and Xingyou Song and Arissa Wongpanich and Bryan Lewandowski and Mohamed S. Abdelfattah},
  journal={arXiv preprint arXiv:2509.26476},
  year={2025}
}
@article{akhauri2025performance,
  title={Performance Prediction for Large Systems via Text-to-Text Regression},
  author={Akhauri, Yash and Lewandowski, Bryan and Lin, Cheng-Hsi and Reyes, Adrian N and Forbes, Grant C and Wongpanich, Arissa and Yang, Bangding and Abdelfattah, Mohamed S and Perel, Sagi and Song, Xingyou},
  journal={arXiv preprint arXiv:2506.21718},
  year={2025}
}