# Model Card for StarCoder2-LPO
This is a PEFT adapter for the StarCoder2 model, trained using Localized Preference Optimization (LPO) on DiSCo, as presented in the paper "Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences". The arXiv preprint is available at: https://arxiv.org/abs/2506.00419.
To use this model for downstream tasks, you need to merge it with the base model. Specifically, apply this adapter to {`bigcode/starcoder2-7b` + `StarCoder2-SFT`}, i.e., `bigcode/starcoder2-7b` with the `StarCoder2-SFT` adapter merged in (where `StarCoder2-SFT` is itself an adapter for `bigcode/starcoder2-7b`).
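As a sketch, the SFT merge step can be done with the `peft` library's `merge_and_unload`; note that the hub id `StonyBrookNLP/StarCoder2-SFT` used below is an assumption, so check the paper's repository for the exact identifiers and procedure:

```python
# Minimal sketch of building the SFT-merged base model.
# The SFT adapter id is an assumption; see the GitHub repository for details.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("bigcode/starcoder2-7b")
# Apply the StarCoder2-SFT adapter and fold its weights into the base model.
sft = PeftModel.from_pretrained(base, "StonyBrookNLP/StarCoder2-SFT")
sft_merged = sft.merge_and_unload()
sft_merged.save_pretrained("models/starcoder2-sft-merged")

tok = AutoTokenizer.from_pretrained("bigcode/starcoder2-7b")
tok.save_pretrained("models/starcoder2-sft-merged")
```

The merged model saved at `models/starcoder2-sft-merged` matches the path used in the inference command below.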
The associated code repository can be found here: https://github.com/StonyBrookNLP/disco-lpo.
## Abstract
LLM-generated code often contains security issues. We address two key challenges in improving secure code generation. First, obtaining high-quality training data covering a broad set of security issues is critical. To address this, we introduce a method for distilling a preference dataset of insecure and secure code pairs from frontier LLMs, along with security reasoning that explains the issues and the fix. The key idea here is to make use of security knowledge sources to devise a systematic prompting strategy that ensures broad coverage. Second, aligning models to secure code requires focusing on localized regions of code. Direct preference optimization methods, like SimPO, are not designed to handle these localized differences and turn out to be ineffective. We address this with a new localized preference optimization algorithm that masks the security-related tokens in both the winning (secure) and losing (insecure) responses. To prevent loss in code quality, we also add a regularizer. Evaluations show that both training on our dataset, DiSCo, and the new preference optimization algorithm, LPO, yield substantial reductions in code insecurity while also improving overall code quality. Code and dataset are available at https://github.com/StonyBrookNLP/disco-lpo.
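To make the localized-masking idea concrete, below is a minimal, illustrative sketch of a SimPO-style preference loss restricted to security-related tokens, with a simple quality regularizer. The function name, hyperparameter defaults, and the NLL regularizer here are all assumptions for illustration; the actual LPO objective and regularizer are defined in the paper.

```python
import torch.nn.functional as F

def lpo_style_loss(logp_w, logp_l, mask_w, mask_l, beta=2.0, gamma=0.5, lam=0.1):
    # logp_w / logp_l: per-token log-probs of the secure (winning) and
    #                  insecure (losing) responses, shape (batch, seq_len).
    # mask_w / mask_l: 1.0 on security-related tokens, 0.0 elsewhere.
    # Length-normalized rewards computed only over the masked (security)
    # tokens, in the spirit of SimPO's reference-free reward.
    r_w = beta * (logp_w * mask_w).sum(-1) / mask_w.sum(-1).clamp(min=1)
    r_l = beta * (logp_l * mask_l).sum(-1) / mask_l.sum(-1).clamp(min=1)
    pref = -F.logsigmoid(r_w - r_l - gamma)
    # Quality regularizer: plain NLL on the full secure response (one
    # plausible choice; the paper defines the regularizer actually used).
    reg = -logp_w.mean(-1)
    return (pref + lam * reg).mean()
```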
## Sample Usage (Inference)
This model is a PEFT adapter. To use it for code generation, you must first merge your chosen base model (e.g., `bigcode/starcoder2-7b`) with the `StarCoder2-SFT` adapter. Then, you can apply this `StarCoder2-LPO` adapter on top of the merged SFT model for inference.

The GitHub repository provides an `inference.py` script for generation. After setting up your environment and merging the SFT adapter, you can use a command similar to the following.
First, install the required libraries:
```bash
pip install -r requirements.txt  # from the GitHub repository
```
Then, run the inference command (example from the GitHub README):
```bash
python inference.py --base_model models/starcoder2-sft-merged \
    --adapter True \
    --peft_model models/starcoder2-lpo \
    --test_path datasets/security_eval.csv \
    --output_path results/starcoder2_lpo.csv \
    --parses 5 \
    --T 0.4 \
    --max_new_tokens 512 \
    --batch_size 4
```
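Alternatively, here is a minimal programmatic sketch of applying the LPO adapter on top of the SFT-merged model with `peft`; the local paths mirror the command above, and the prompt and sampling settings are illustrative assumptions:

```python
# Sketch: load the SFT-merged model and apply the LPO adapter for generation.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model = AutoModelForCausalLM.from_pretrained("models/starcoder2-sft-merged")
# "models/starcoder2-lpo" is a local copy of this adapter (a hub id works too).
model = PeftModel.from_pretrained(model, "models/starcoder2-lpo")
tok = AutoTokenizer.from_pretrained("models/starcoder2-sft-merged")

prompt = "# Write a function that safely reads a file path provided by a user\n"
inputs = tok(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.4)
print(tok.decode(out[0], skip_special_tokens=True))
```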
Please refer to the GitHub repository for detailed instructions on merging models and running inference.
## Citation
Please cite the following if you use the resources provided in this work:
```bibtex
@article{saqib2025teaching,
  title={Teaching an Old LLM Secure Coding: Localized Preference Optimization on Distilled Preferences},
  author={Saqib, Mohammad and Chakraborty, Saikat and Karmaker, Santu and Balasubramanian, Niranjan},
  journal={arXiv preprint arXiv:2506.00419},
  year={2025}
}
```