Regression Language Models for Code (RLMs)

We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy, domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) can simultaneously predict, directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M-parameter RLM initialized from T5Gemma obtains > 0.9 Spearman rank correlation on competitive programming submissions from APPS, and a single unified model achieves > 0.5 average Spearman rank correlation across 17 separate languages from CodeNet. Furthermore, the RLM obtains the highest average Kendall-Tau of 0.46 on five classic NAS design spaces previously dominated by graph neural networks, and simultaneously predicts architecture latencies on numerous hardware platforms.
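For readers unfamiliar with the two rank-correlation metrics quoted above, here is a minimal pure-Python sketch of how they can be computed. The helper names (`average_ranks`, `spearman_rho`, `kendall_tau_a`) are hypothetical and exist only for illustration. The Kendall variant shown is tau-a, which is an assumption: the text above does not say which variant was used. In practice `scipy.stats.spearmanr` and `scipy.stats.kendalltau` are the usual choices.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based position of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_rho(targets, preds):
    """Spearman rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(targets), average_ranks(preds))


def kendall_tau_a(targets, preds):
    """Tau-a: (concordant - discordant) / total pairs; ties count as neither."""
    concordant = discordant = 0
    for i, j in combinations(range(len(targets)), 2):
        s = (targets[i] - targets[j]) * (preds[i] - preds[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(targets) * (len(targets) - 1) // 2
    return (concordant - discordant) / n_pairs


# Toy data (made up): one swapped pair out of six.
targets = [1.0, 2.0, 3.0, 4.0]
preds = [10.0, 30.0, 20.0, 40.0]
print(f"Spearman rho: {spearman_rho(targets, preds):.3f}")   # 0.800
print(f"Kendall tau-a: {kendall_tau_a(targets, preds):.3f}")  # 0.667
```

Both metrics only look at ordering, so a model can score well even if its predictions are off by a constant scale or offset.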

Link for Code-Regression dataset: https://huggingface.co/datasets/akhauriyash/Code-Regression

Link for Graph-Regression dataset: https://huggingface.co/datasets/akhauriyash/GraphArch-Regression

Testing Code-Regression with a basic Gemma RLM model

Use the code below as a reference for evaluating a basic RegressLM model (more and better models to come! :) )

We strongly recommend transformers==4.53.2 for compatibility, though the latest transformers release should work as well.

import ast

import numpy as np
import torch
from datasets import load_dataset
from scipy.stats import spearmanr
from tqdm import tqdm
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

REPO_ID = "akhauriyash/RLM-GemmaS-Code-v0"
DATASET = "akhauriyash/Code-Regression"
dataset = load_dataset(DATASET, split="train")
tok = AutoTokenizer.from_pretrained(REPO_ID, trust_remote_code=True)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = AutoModelForSeq2SeqLM.from_pretrained(REPO_ID, trust_remote_code=True).to(device).eval()
MAX_ITEMS, BATCH_SIZE, spaces, results = 512, 16, ["KBSS", "CDSS", "APPS"], {}
language = None  # Optional language filter for CDSS, e.g. "python"
# Tokens to decode per example; fall back to defaults if the config lacks these fields.
n_out_tokens = getattr(model.config, "num_tokens_per_obj", 8) * getattr(model.config, "max_num_objs", 1)

for SPACE in spaces:
    # Collect up to MAX_ITEMS (prompt, target) pairs for this space.
    inputs, targets = [], []
    for row in tqdm(dataset, desc=f"Processing {SPACE} till {MAX_ITEMS} items"):
        if row.get("space") == SPACE and "input" in row and "target" in row:
            try:
                # metadata is assumed to be a string holding a Python dict literal;
                # literal_eval parses it without executing arbitrary code.
                lang = ast.literal_eval(row["metadata"])["language"] if SPACE == "CDSS" else None
                if SPACE != "CDSS" or language is None or lang == language:
                    target = float(row["target"])
                    if SPACE == "CDSS":
                        prompt = f"# {SPACE}\n# Language: {lang}\n{row['input']}"
                    else:
                        prompt = f"{SPACE}\n{row['input']}"
                    # Append both together so inputs and targets stay aligned.
                    inputs.append(prompt)
                    targets.append(target)
            except Exception:  # skip rows with malformed metadata or targets
                continue
            if len(inputs) >= MAX_ITEMS:
                break
    preds = []
    for i in tqdm(range(0, len(inputs), BATCH_SIZE)):
        enc = tok(inputs[i:i+BATCH_SIZE], return_tensors="pt", truncation=True, padding=True, max_length=2048).to(device)
        batch_preds = []
        for _ in range(8):  # draw 8 sampled decodes per example
            out = model.generate(**enc, max_new_tokens=n_out_tokens, min_new_tokens=n_out_tokens, do_sample=True, top_p=0.95, temperature=1.0)
            decoded = [tok.token_ids_to_floats(seq.tolist()) for seq in out]
            decoded = [d[0] if isinstance(d, list) and d else float("nan") for d in decoded]
            batch_preds.append(decoded)
        # Aggregate the sampled decodes with a per-example median.
        preds.extend(torch.tensor(batch_preds).median(dim=0).values.tolist())
    spear, _ = spearmanr(np.array(targets), np.array(preds))
    results[SPACE] = spear
    print(f"Spearman ρ for {SPACE}: {spear:.3f}")

print("Spearman ρ | KBSS | CDSS | APPS")
print(f"{REPO_ID} | " + " | ".join(f"{results[s]:.3f}" for s in spaces))
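The prompt format and the CDSS language filter used in the script above can be isolated into two small helpers, which makes them easier to test without loading the model. This is only a sketch: `build_prompt` and `keep_row` are hypothetical names, not part of the released code, and the format strings are copied from the script rather than from any separate specification.

```python
def build_prompt(space, code, lang=None):
    """Format one model input the same way the evaluation script does.

    CDSS rows carry a language header; other spaces are prefixed with the
    space name only.
    """
    if space == "CDSS":
        return f"# {space}\n# Language: {lang}\n{code}"
    return f"{space}\n{code}"


def keep_row(space, lang, language_filter=None):
    """Mirror the script's filter: only CDSS rows are filtered by language."""
    return space != "CDSS" or language_filter is None or lang == language_filter


# Toy inputs (made up) to show the two prompt shapes.
print(build_prompt("APPS", "print(1)"))
print(build_prompt("CDSS", "int main() { return 0; }", lang="C++"))
```

Setting `language = "python"` in the script corresponds to calling `keep_row(space, lang, "python")` for every row.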

We got the following results when testing on a random subset of the Code-Regression dataset.

Model ID                                 | KBSS  | CDSS  | APPS
akhauriyash/RegressLM-gemma-s-RLM-table3 | 0.527 | 0.787 | 0.926
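The evaluation script draws 8 sampled decodes per example and reduces them with a median. The sketch below shows that aggregation step in plain Python. It differs from the script in one labeled way: non-finite samples are dropped before taking the median, whereas `torch.median` returns NaN when any sample is NaN. `statistics.median_low` is used because `torch.median` is documented to return the lower of the two middle values for an even number of samples. The function name `aggregate_samples` is hypothetical.

```python
import math
import statistics


def aggregate_samples(samples_per_draw):
    """Reduce sampled decodes to one prediction per example.

    samples_per_draw[d][i] is the decoded float for example i in draw d.
    Non-finite samples are ignored (an assumption, not the script's behavior);
    an example with no finite sample yields NaN.
    """
    n_examples = len(samples_per_draw[0])
    aggregated = []
    for i in range(n_examples):
        finite = [draw[i] for draw in samples_per_draw if math.isfinite(draw[i])]
        aggregated.append(statistics.median_low(finite) if finite else float("nan"))
    return aggregated


# Toy data (made up): 4 draws for 2 examples, with two failed decodes.
nan = float("nan")
draws = [[1.0, nan], [3.0, 5.0], [2.0, 7.0], [4.0, nan]]
print(aggregate_samples(draws))  # [2.0, 5.0]
```

If NaN predictions do reach the metric, `scipy.stats.spearmanr` propagates them by default, so filtering at this stage or passing `nan_policy="omit"` is worth considering.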

Citations

If you find this model or the attached datasets useful for your research, please cite us:

@article{akhauri2025regressionlanguagemodelscode,
      title={Regression Language Models for Code}, 
      author={Yash Akhauri and Xingyou Song and Arissa Wongpanich and Bryan Lewandowski and Mohamed S. Abdelfattah},
      journal={arXiv preprint arXiv:2509.26476},
      year={2025}
}

@article{akhauri2025performance,
  title={Performance Prediction for Large Systems via Text-to-Text Regression},
  author={Akhauri, Yash and Lewandowski, Bryan and Lin, Cheng-Hsi and Reyes, Adrian N and Forbes, Grant C and Wongpanich, Arissa and Yang, Bangding and Abdelfattah, Mohamed S and Perel, Sagi and Song, Xingyou},
  journal={arXiv preprint arXiv:2506.21718},
  year={2025}
}

Model size: 0.2B params (Safetensors, tensor type F32)
